Brute force beats premature optimization

I made a really common and insidious programming mistake recently. I’m going to explain it in detail because every programmer in the world needs the reminder not to do this, and I hope confessing that even “ESR” falls into such a trap will make the less experienced properly wary of it.


Our sutra for today expounds on the sayings of the masters Donald Knuth and Ken Thompson, who in their wisdom have observed “Premature optimization is the root of all evil” and “When in doubt, use brute force.”



My main side project recently has been SRC, a simple version-control system for small projects. One of the constraints I was thinking about when I designed SRC was that the status command – the one that gives you a display saying which files have been modified since they were checked in – needs to be really fast.


The reason it needs to be fast is that front-ends for CLIs like SRC’s tend to rely on it heavily and call it often. This is, in particular, true of the VC front end in Emacs (which, as it happens, is also my code) – if the status command is slow, VC will be laggy.


It turns out that all file status checks except one only have to look at inodes (existence/nonexistence, file size) and so are as fast as you can get if you’re going to hit the filesystem at all. The exception is modified/unmodified status – whether your workfile has been edited since it was checked out.


My original plan was just to do a stat(2) on both the master and the workfile and compare modification dates. That too would only touch the i-node and be fast, but there are a couple of problems with it.


One is that stat(2) times only have 1-second resolution, so a workfile could look unmodified to “src status” even if it had been changed within a second of checkout. That’s not good.
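To make the failure mode concrete, here is a minimal sketch of that rejected mtime-comparison plan. The function name and signature are my own for illustration, not SRC’s actual code; the point is that two timestamps truncated to the same second compare equal even when the workfile changed:

```python
import os

def looks_modified(master_path, workfile_path):
    """Naive mtime comparison (hypothetical sketch, not SRC's real code).

    Compares last-modification times from the inodes only.  With 1-second
    timestamp resolution, a workfile edited within a second of checkout
    can carry the same mtime as the master and wrongly look unmodified.
    """
    master_mtime = os.stat(master_path).st_mtime
    work_mtime = os.stat(workfile_path).st_mtime
    return work_mtime > master_mtime
```

This is as cheap as a status check can get – two inode reads, no file content touched – which is exactly why it was tempting despite the resolution problem.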


The other problem is that, having gotten used to git and to build systems like waf and scons that evaluate “modified” based on content rather than last-write-time, I’ve gotten to really like that feature and wanted it in SRC.


OK, so the naive brute-force way to do this would be to read out the last stored version from RCS and compare it byte-by-byte to the workfile. That’s easy to do in Python, where one read call can pull the whole file into memory as a string.
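That naive version really is a few lines of Python. This sketch stands in a plain file for the last stored version (in real SRC it would come out of the RCS master via a checkout, which the post discusses below); the function name is mine, not SRC’s:

```python
def content_differs(last_stored_path, workfile_path):
    """Brute-force content comparison (hedged sketch of the naive approach).

    One read() call pulls each whole file into memory; the byte-string
    comparison then answers modified/unmodified directly from content,
    with no timestamps or hashes involved.
    """
    with open(last_stored_path, "rb") as f:
        stored = f.read()
    with open(workfile_path, "rb") as f:
        work = f.read()
    return stored != work
```

Stupid-simple, impossible to get wrong, and for the small projects SRC targets the files comfortably fit in memory.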


I didn’t do that, which was probably my first wrong turn. Because of recent experience with repository conversions, I was mentally fixated on very large test loads – my gut told me to avoid the overhead of reading out the last stored version from RCS every time I wanted to check status, and instead do the obvious trick with a content hash. That is, keep a hash of the last stored version, hash the workfile on each status check, and compare the two.
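The hash trick looks like this in outline – a sketch under my own naming, with the question of *where* the checkin-time hash gets stored deliberately left open, since that storage question is exactly where the trouble started:

```python
import hashlib

def content_hash(path):
    """Hash a file's full content.  (A streaming read would scale better
    to very large files, but whole-file reads keep the sketch simple.)"""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

def modified_by_hash(stored_hash, workfile_path):
    """Status check: rehash the workfile and compare against the hash
    recorded at last checkin.  Avoids reading the stored version out of
    the RCS master on every status call."""
    return content_hash(workfile_path) != stored_hash
```

On each checkin you record `content_hash(workfile)` somewhere persistent; on each status check you pay one file read plus a hash, instead of a full checkout from the master.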


Have you already spotted my error here? I didn’t measure. Before time-optimizing, I should have coded the naive version, then timed a bunch of status checks to see if there was any noticeable wall-time lag.


I say “probably” my first wrong turn because I quickly coded a hash-based implementation that worked without fuss, so I got away with it for a while. But…


…the way I did it was inelegant, and that bothered me. At that time I couldn’t think of a reasonable way to store the hash in the RCS master, so I dropped it outboard in a stamp file. That is, for each RCS master foo,v there was a parallel foo.srcstamp that existed just to hold the content hash from the last checkin.
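The stamp-file scheme is mechanically trivial, which is part of why it worked without fuss. A sketch of the idea (helper names and the one-hash-per-file format are my guesses at an implementation, not SRC’s actual code):

```python
import os

def stamp_path(master_path):
    # foo,v -> foo.srcstamp: a parallel file holding just the checkin hash.
    base = master_path[:-2] if master_path.endswith(",v") else master_path
    return base + ".srcstamp"

def write_stamp(master_path, checkin_hash):
    """Record the content hash of the last checkin beside the master."""
    with open(stamp_path(master_path), "w") as f:
        f.write(checkin_hash + "\n")

def read_stamp(master_path):
    """Return the recorded hash, or None if no stamp file exists yet."""
    try:
        with open(stamp_path(master_path)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None
```

The inelegance isn’t in the code – it’s that every history file now drags a shadow file around with it.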


I didn’t like this. I wanted each unit of history to correspond to one unit of storage, preferably to an eyeball-friendly flat file. If you do not understand why I wanted this, sigh…go slam your noggin against a copy of The Art of UNIX Programming until you achieve enlightenment.


Then I wrote an SCCS back end – and had the kludgy idea that I could recruit the description field in an SCCS master as a key-value store to simulate the named symbols that RCS/SRC has but SCCS doesn’t.


I still haven’t done that. But I wrote trial implementations in the RCS and SCCS back ends that gave me that general key-value store – RCS also has a description field that can be hijacked. Tested them and they worked.
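One way such a description-field key-value store could work is to append `key=value` lines after the human-readable description text. The encoding below is invented for illustration – the post doesn’t show SRC’s actual wire format, and this naive version would misparse a description line that itself contains an `=`:

```python
def encode_properties(props, description=""):
    """Serialize a key-value dict into lines appended to a description
    field (hypothetical format; real SRC's encoding is not shown here)."""
    lines = [description] if description else []
    for key in sorted(props):
        lines.append("%s=%s" % (key, props[key]))
    return "\n".join(lines)

def decode_properties(text):
    """Split a description field back into plain text and properties."""
    props = {}
    plain = []
    for line in text.split("\n"):
        if "=" in line:
            key, _, value = line.partition("=")
            props[key] = value
        else:
            plain.append(line)
    return "\n".join(plain), props
```

With something like this in both back ends, named symbols (and, as it turned out, checkin hashes) can ride along in a field RCS and SCCS already provide.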


The inevitable occurred. I thought: “Hey, why don’t I put the checkin hash in the key-value store, and get rid of those ugly stamp files?”


So I coded it, and it passed my regression-test suite – though the function where all the hash-check magic was taking place, modified(), seemed ugly and unpleasantly complex.


I shipped this as 1.8 – and promptly got a report that status checking was broken. A few minutes of repeating the test sequence I’d been given with increased instrumentation established that the problem was somewhere in the logic of modified().


I stared at modified() for a while, tried to fix it, and failed. Then I had a rush of sense to the head, replaced the guts of modified() with the stupidest possible brute-force content comparison, and shipped that as 1.9.


Do you see what happened here? I got mugged by complexity creep – ended up with code I couldn’t mentally model properly, afflicted with tricky edge cases, and it broke. My own damn fault, because I optimized before I measured and one thing led to another.


Don’t do this. Well, not unless you really want Master Foo to materialize and whack you upside the head with his shippei (you should be so lucky).


Now I’m going to do what I should have done in the first place – not write the fscking optimization until someone tells me the stupid-simple method actually causes a performance problem. And, you know, it may not ever. Seek-induced latency is not the scourge it once was; SSDs are wonderful things that way.


Furthermore, the thing I should have remembered is that RCS optimizes for retrieving recent versions by storing deltas backwards from a whole-text tip version, rather than forwards from the initial version as SCCS does. So under RCS tip checkout is mostly a copy from a span of the master file rather than the longest possible sequence of change-delta integrations, and probably reasonably fast.


The only scar tissue left is that I’ve still got about 40 lines of now-unused key-value store implementation in the code – sitting quietly, doing nothing. I’m sure it’ll come in useful someday, especially if someone turns up with an actual need for named symbols under the SCCS back end.


I shall regard it as a test of my discipline and detachment, not to do anything with it until I actually need to.


Thus, we gain merit by not coding.

Published on February 15, 2016 18:51