Imperial Should Have Called Winston Wolf

In the film Pulp Fiction, moronic hoodlums Jules (Samuel L. Jackson) and Vincent (John Travolta) pick up a guy who had stolen a briefcase from the back of their boss Marcellus Wallace’s car. While driving him away, Vincent accidentally shoots him, leaving the back of the car splattered with blood and brains. In a panic, they drive to friend Jimmy Dimmick’s (Quentin Tarantino’s) house. Dimmick tells them his wife will be home in an hour and they can’t stay. Desperate, they call Wallace, who calls in Winston Wolf. Wolf says: “It’s an hour away. I’ll be there in 10 minutes.” In 9 minutes and 37 seconds, Wolf’s car squeals to a halt in front of Jimmy’s house. Wolf rings the doorbell, and when Jimmy answers, Wolf says: “I’m Winston Wolf. I solve problems.” Within 40 minutes, Wolf solves Jules’ and Vincent’s problem. The car is cleaned up, with the body in the trunk, ready to be driven to the wrecking yard to be crushed.





The Imperial team that relied on Microsoft/GitHub to fix its code should have called Winston Wolf instead, because MS/GitHub left behind some rather messy evidence. “Sue Denim,” who wrote the code analysis I linked to yesterday, has a follow-up describing what Not Winston Wolf left behind:






The hidden history. Someone realised they could unexpectedly recover parts of the deleted history from GitHub, meaning we now have an audit log of changes dating back to April 1st. This is still not exactly the original code Ferguson ran, but it’s significantly closer.





Sadly it shows that Imperial have been making some false statements.





ICL staff claimed the released and original code are “essentially the same functionally”, which is why they “do not think it would be particularly helpful to release a second codebase which is functionally the same”.

In fact the second change in the restored history is a fix for a critical error in the random number generator. Other changes fix data corruption bugs (another one), algorithmic errors, fixing the fact that someone on the team can’t spell household, and whilst this was taking place other Imperial academics continued to add new features related to contact tracing apps.

The released code at the end of this process was not merely reorganised but contained fixes for severe bugs that would corrupt the internal state of the calculations. That is very different from “essentially the same functionally”.

The stated justification for deleting the history was to make “the repository rather easier to download” because “the history squash (erase) merged a number of changes we were making with large data files”. “We do not think there is much benefit in trawling through our internal commit histories”.

The entire repository is less than 100 megabytes. Given they recommend a computer with 20 gigabytes of memory to run the simulation for the UK, the cost of downloading the data files is immaterial. Fetching the additional history only took a few seconds on my home WiFi.

Even if the files had been large, the tools make it easy to not download history if you don’t want it, to solve this exact problem.
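
To illustrate the point in the quoted passage (this sketch is not part of the quoted text): git can fetch a repository without its full history via a shallow clone, `git clone --depth N`. The Python below is a minimal sketch of that idea, not anything Imperial or GitHub used. The helper names, the placeholder URL, and the destination directory are all hypothetical.

```python
import subprocess


def shallow_clone_command(repo_url, dest, depth=1):
    """Build a git command that fetches only the most recent `depth` commits.

    `--depth` is git's standard shallow-clone option; the URL and destination
    passed in are supplied by the caller (hypothetical in this sketch).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", "--depth", str(depth), repo_url, dest]


def shallow_clone(repo_url, dest, depth=1):
    """Run the shallow clone; requires git to be installed and network access."""
    subprocess.run(shallow_clone_command(repo_url, dest, depth), check=True)


if __name__ == "__main__":
    # Placeholder URL for illustration only; substitute a real repository.
    cmd = shallow_clone_command("https://example.com/some/repo.git", "repo")
    print(" ".join(cmd))
```

Someone who wants the complete history simply omits `--depth`, so keeping the history in the repository costs nothing to those who do not want it.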



I don’t quite know what to make of this. Originally I thought these claims were a result of the academics not understanding the tools they’re working with, but the Microsoft employees helping them are actually employees of a recently acquired company: GitHub. GitHub is the service they’re using to distribute the source code and files. To defend this I’d have to argue that GitHub employees don’t understand how to use GitHub, which is implausible.





I don’t think anyone involved here has any ill intent, but it seems via a chain of innocent yet compounding errors – likely trying to avoid exactly the kind of peer review they’re now getting – they have ended up making false claims in public about their work.






My favorite one is “a fix for a critical error in the random number generator.” In 2020? WTF? I remember reading in 1987, in the book Numerical Recipes by William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery, a statement to the effect that libraries could be filled with papers based on faulty random number generation. (I’d give you the exact quote, but the first edition that I used is in my office, which I cannot access right now. Why is that, I wonder?) And they were using a defective RNG 33 years later? Really?
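
For a sense of how a random number generator can be quietly defective, here is a minimal Python sketch of RANDU, a notoriously poor generator from the 1960s. It is chosen purely as an illustration and is an assumption on my part: the post does not say what the Imperial bug was, and nothing here suggests their code used RANDU. The function names are hypothetical. The defect is that every output is an exact linear combination of the two before it (modulo 2^31), so consecutive values are far from independent.

```python
MODULUS = 2**31
MULTIPLIER = 65539  # 2**16 + 3, the RANDU multiplier


def randu(seed, n):
    """Return n successive RANDU outputs starting from an odd seed."""
    x = seed
    out = []
    for _ in range(n):
        x = (MULTIPLIER * x) % MODULUS
        out.append(x)
    return out


def satisfies_randu_relation(xs):
    """Check that x[k+2] == 6*x[k+1] - 9*x[k] (mod 2**31) for every triple."""
    return all(
        (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % MODULUS == 0
        for k in range(len(xs) - 2)
    )


if __name__ == "__main__":
    values = randu(seed=1, n=1000)
    # A sound generator would not obey a fixed linear rule like this.
    print(satisfies_randu_relation(values))
```

The relation follows from (2^16 + 3)^2 = 6(2^16 + 3) - 9 modulo 2^31, which is why a simulation drawing consecutive values from such a generator can be biased without any visible error.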





“Algorithmic errors” is another eye popper. The algorithms weren’t doing what they were supposed to?





Read the rest. And maybe you’ll conclude that this was a mess that even Winston Wolf couldn’t have cleaned up in 40 days, let alone 40 minutes.

Published on May 11, 2020 14:09


Craig Pirrong's Blog
