Code Violation: Other Than That, How Was the Play, Mrs. Lincoln?
By far the most important model in the world has been the Imperial College epidemiological model. Largely on the basis of the predictions of this model, nations have been locked down. The UK had been planning to follow a strategy very similar to Sweden’s until the Imperial model stampeded the media, and then the government, into a panic. Imperial’s predictions regarding the US contributed to the panicdemic here as well.
These predictions have proved to be farcically wrong, with death tolls exaggerated by one and perhaps two orders of magnitude.
Models only become science when tested against data/experiment. By that standard, the Imperial College model failed spectacularly.
Whoops! What’s a few trillions of dollars, right?
I was suspicious of this model from the first, not only because of its doomsday predictions and the failures of previous models produced by Imperial and the leader of its team, Neil Ferguson, but because of my general skepticism about big models (as @soncharm used to say, “all large calculations are wrong”), and most importantly, because Imperial failed to disclose its code. That is a HUGE red flag. Why were they hiding?
And how right that suspicion was. A version of the code has been released, and it is a hot mess. It has more bugs than East Africa does right now.
Here is one code review. The biggest takeaway: due to bugs in the code, the model results are not reproducible. The code itself introduces random variation into the model, which means that runs with the same inputs generate different outputs.
Are you fucking kidding me?
Reproducibility is the essence of science. A model whose predictions cannot be reproduced, let alone any empirical results based on that model, is so much crap. It is the antithesis of science.
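To make the standard concrete: a stochastic simulation is reproducible if the same seed and the same inputs produce identical output, every single run. Here is a minimal sketch of that property, using a toy seeded model of my own devising (run_model and all its numbers are illustrative assumptions, not Imperial’s code or its interface):

```cpp
// Toy demonstration of reproducibility: all randomness flows from the seed,
// so two runs with identical inputs must produce identical output.
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

// Hypothetical stand-in for a simulation: returns a cumulative daily count series.
std::vector<double> run_model(std::uint64_t seed, int days) {
    std::mt19937_64 rng(seed);                       // the only source of randomness
    std::poisson_distribution<int> new_cases(10.0);  // toy daily incidence
    std::vector<double> daily(days);
    double cumulative = 0.0;
    for (int d = 0; d < days; ++d) {
        cumulative += new_cases(rng);
        daily[d] = cumulative;
    }
    return daily;
}

int main() {
    const auto a = run_model(2020, 100);
    const auto b = run_model(2020, 100);   // same seed, same inputs
    std::cout << (a == b ? "reproducible" : "NOT reproducible") << "\n";
}
```

That check is a few lines long, and it is exactly the check the Imperial code reportedly fails.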
After tweeting about the code review article linked above, I received feedback from other individuals with domain expertise who had reviewed the code. They concur, and if anything, the article understates the problems.
Here’s one assessment, from an interlocutor:
The Covid-19 function variations aren’t stochastic. They’re a bug caused by poor management of threads in the code. This causes a random variation, so multiple runs give different results. The response from the team at Imperial is that they run it multiple times and take an average. But this is wrong. Because the results should be identical each time. Including the buggy results as well as the correct ones means that the results are an average of the correct and the buggy ones. And so wouldn’t match the expected results if you did the same calculation by hand.
As an aside, we can’t even do the calculations by hand, because there is no specification for the function, so whether the code is even doing what it is supposed to do is impossible to tell. We should be able to take the specification and write our own tests and check the results. Without that, the code is worthless.
I repeat: “the code is worthless.”
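For readers wondering how mismanaged threads can destroy determinism even when every random draw is seeded, here is a toy C++ sketch (my illustration of the general mechanism, not Imperial’s actual bug). Each thread is seeded deterministically, but the order in which the threads hand in their partial sums depends on the scheduler, and floating-point addition is not associative, so the total can wobble from run to run with identical inputs. Outright unsynchronized access to shared state is worse still: that is undefined behavior.

```cpp
// Toy sketch: deterministic per-thread work, non-deterministic combined result,
// because the order of floating-point additions varies with thread scheduling.
#include <cstdio>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

int main() {
    constexpr int kThreads = 8;
    constexpr int kPerThread = 1000000;

    double total = 0.0;
    std::mutex m;
    std::vector<std::thread> workers;

    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&, t] {
            // Each thread's stream is seeded deterministically...
            std::mt19937_64 rng(1234 + t);
            std::lognormal_distribution<double> draw(0.0, 2.0);
            double partial = 0.0;
            for (int i = 0; i < kPerThread; ++i) partial += draw(rng);
            // ...but the order in which partials are added to the total depends
            // on which thread finishes first, so rounding can differ across runs.
            std::lock_guard<std::mutex> lock(m);
            total += partial;
        });
    }
    for (auto& w : workers) w.join();

    std::printf("total = %.17g\n", total);  // compare across runs: low-order digits can drift
}
```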
Another correspondent confirmed the evaluations of the bugginess of the code, and added an important detail about the underlying model itself:
I spent 3 days reviewing his code last week. It’s an ugly mess of thousands of lines of C (not C++). There are hundreds of input parameters (not counting the fact it models population density to 1km x 1km cells) and 4 different infection mechanisms. It made me feel quite ill.
Hundreds of input parameters–another huge red flag. I replied:
How do you estimate 100s of parameters? Sounds like a climate model . . . .
The response:
Yes. It shares the exact same philosophy as a GCM – model everything, but badly.
I recalled a saying of von Neumann: “With four parameters I can fit an elephant, with five I can make him wiggle his trunk.” Any highly parameterized model is IMMEDIATELY suspect. With so many parameters–hundreds!–overfitting is a massive problem. Moreover, you are highly unlikely to have the data to estimate these parameters, so some are inevitably set a priori. This high dimensionality means that you have no clue whatsoever what is driving your results.
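You don’t need hundreds of parameters to see the problem; two will do. Here is a deliberately tiny sketch (mine, not Imperial’s model, with every number made up for illustration) in which two parameters enter the observed case counts only through their product, so wildly different parameter sets fit the data equally well while implying death projections that differ by a factor of ten:

```cpp
// Toy identifiability problem: reported cases pin down only p * I0,
// but the quantity policy cares about depends on I0 alone.
#include <cmath>
#include <cstdio>

struct Params {
    double p;    // fraction of true infections that get reported
    double I0;   // true initial number of infections
};

int main() {
    const double r = 0.2;      // assumed exponential growth rate per day
    const double ifr = 0.01;   // assumed infection fatality rate

    // Two very different parameter sets with the same product p * I0 = 50.
    const Params a{0.50, 100.0};
    const Params b{0.05, 1000.0};

    for (int day = 0; day <= 30; day += 10) {
        const double growth = std::exp(r * day);
        const double reported_a = a.p * a.I0 * growth;   // what the data can see
        const double reported_b = b.p * b.I0 * growth;   // identical to reported_a
        const double deaths_a = ifr * a.I0 * growth;     // what policy cares about
        const double deaths_b = ifr * b.I0 * growth;     // ten times larger
        std::printf("day %2d  reported %.0f vs %.0f | projected deaths %.0f vs %.0f\n",
                    day, reported_a, reported_b, deaths_a, deaths_b);
    }
}
```

Scale that ambiguity up to hundreds of parameters, some of them necessarily set a priori, and the output is driven by choices nobody can check.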
This relates to another comment:
No discussion of comparative statics.
So again, you have no idea what is driving the results, and how changes in the inputs or parameters will change predictions. So how do you use such a model to devise policies, which is inherently an exercise in comparative statics? So as not to leave you in suspense: YOU CAN’T.
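Comparative statics is not exotic: hold everything fixed, perturb one input at a time, and see how the headline number moves. Here is a minimal sketch of the exercise, using the textbook SIR final-size relation as a stand-in model; the population, R0, and IFR values are illustrative assumptions, not Imperial’s published inputs:

```cpp
// Toy comparative statics: perturb one input at a time and report the change
// in the headline output. The model is the standard SIR final-size relation.
#include <cmath>
#include <cstdio>

// Final attack rate z solves z = 1 - exp(-R0 * z); iterate to the fixed point.
double attack_rate(double R0) {
    double z = 0.5;
    for (int i = 0; i < 100; ++i) z = 1.0 - std::exp(-R0 * z);
    return z;
}

// Headline output: projected deaths for a given population, R0, and IFR.
double deaths(double population, double R0, double ifr) {
    return population * attack_rate(R0) * ifr;
}

int main() {
    const double pop = 66e6;    // roughly UK-sized population (illustrative)
    const double R0 = 2.4;      // assumed basic reproduction number
    const double ifr = 0.009;   // assumed infection fatality rate

    const double base = deaths(pop, R0, ifr);
    std::printf("baseline projected deaths: %.0f\n", base);

    // One-at-a-time sensitivities: how much does the answer move per perturbation?
    std::printf("raise R0 by 0.1:    deaths change by %+.0f\n",
                deaths(pop, R0 + 0.1, ifr) - base);
    std::printf("raise IFR by 0.001: deaths change by %+.0f\n",
                deaths(pop, R0, ifr + 0.001) - base);
}
```

A table like that, one row per parameter, is the bare minimum you need before you can say which knobs are driving the forecast.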
This is particularly damning:
And also the time resolution. The infection model time steps are 6 hours. I think these models are designed more for CYA. It’s bottom-up micro-modelling which is easier to explain and justify to politicos than a more physically realistic macro level model with fewer parameters.
To summarize: these models are absolute crap. Bad code. Bad methodology. Farcical results.
Other than that, how was the play, Mrs. Lincoln?
But it gets better!
The code that was reviewed in the first-linked article . . . had been cleaned up! It’s not the actual code used to make the original predictions. Instead, people from Microsoft spent a month trying to fix it–and it was still as buggy as Kenya. (I note in passing that Bill Gates is a major encourager of panic and lockdown, so the participation of a Microsoft team here is quite telling.)
The code was originally in C, and then upgraded to C++. Well, it could be worse. It could have been Cobol or Fortran–though one of those reviewing the code suggested: “Much of the code consists of formulas for which no purpose is given. John Carmack (a legendary video-game programmer) surmised that some of the code might have been automatically translated from FORTRAN some years ago.”
All in all, this appears to be the epitome of bad modeling and coding practice. Code that grew like weeds over years. Code lacking adequate documentation and version control. Code based on overcomplicated and essentially untestable models.
But it gets even better! The leader of the Imperial team, the aforementioned Ferguson, was caught with his pants down–literally–canoodling with his (married) girlfriend in violation of the lockdown rules for which HE was largely responsible. This story gave verisimilitude to my tweet of several days before that story broke:
"People he'd allowed to see his code." "Hey babe, want to swing around my place and see my code?" https://t.co/ZFWuNg7Upt
— streetwiseprof (@streetwiseprof) April 25, 2020
It would be funny, if the cost–in lives and livelihoods irreparably damaged, and in lives lost–weren’t so huge.
And on such completely defective foundations policy castles have been built. Policies that have turned the world upside down.
Of course I blame Ferguson and Imperial. But the UK government also deserves severe criticism. How could they spend vast sums on a model, and base policies on a model, that was fundamentally and irretrievably flawed? How could they permit Imperial to make its Wizard of Oz pronouncements without requiring a release of the code that would allow knowledgeable people to look behind the curtain? They should have had experienced coders and software engineers and modelers go over this with a fine-tooth comb. But they didn’t. They accepted the authority of the Pants-less Wizard.
And how could American policymakers base any decision–even in the slightest–on the basis of a pig in a poke? (And saying that it is as ugly as a pig is a grave insult to pigs.)
If this doesn’t make you angry, you are incapable of anger. Or you are an idiot. There is no third choice.