Progress towards the extinction of CVS

The Great Beast, designed for converting large CVS repos, is now in full production. It hasn’t killed off any specimens in the wild yet (and I’ll explain why in a bit), but it’s doing spectacularly well on our test repositories.


As a representative large example, the entire Emacs CVS history, 1985-2009, 113309 CVS commits, lifts clean in 37 seconds at a sustained rate of 3K CVS commits a second. Yes, three thousand.


The biggest beast known to us, the NetBSD src repository, converts in 22 minutes. To give some idea of what a speedup this is, the first time I ran a lift on it – on one of Wendell’s Xeon machines – it took a bit under six hours. That’s about a factor of seventeen, there.


Judging by performance on the other project devs’ machines, the Beast is good for a 2x to 3x speedup over a conventionally-balanced PC design (that is, one with worse RAM latency, narrower caches, and more cores but somewhat lower single-thread speed). That’s a big enough advantage to validate the design and to be practically significant on large repositories.



The rest of the speedup is software. I did a lot of work on that two or three weeks back, but more recently Laurence Hygate has gotten the bit in his teeth and delivered some truly amazing improvements. At this point I’d have to say he has probably delivered a bigger cumulative performance delta than I have.


I, meanwhile, have been trying to concentrate on correctness issues. The present code does an excellent job in most cases – and I can now prove that, having built a script wrapper that systematically compares CVS checkouts of tags and branches with their equivalents in a git conversion. But there are three remaining trouble spots.


One is CVS vendor branches. I think I know how the present code goes wrong handling these, but it’s not an easy fix. The symptom is content mismatches between tagged states in CVS and a git conversion.


Another is coping with CVS’s all-too-frequent failures near file deletions. I’m not even certain I completely understand all the problems here yet. This also manifests as content mismatches at tagged states – usually files persisting in the git conversion past the point at which they should have been deleted.


Finally, we’ve seen some repos that produce a fatal internal error complaining about a “branch cycle”. I have almost no understanding of what’s generating this.
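
Two of the three trouble spots above show up as mismatches between a CVS checkout of a tag and the corresponding checkout from the git conversion. For anyone curious what that kind of comparison amounts to, here is a minimal sketch in Python. It is not the actual wrapper script: it assumes both trees have already been checked out to disk, and the names `tree_digests`, `compare_trees` and `IGNORED_DIRS` are hypothetical, chosen only for illustration.

```python
import hashlib
import os

# Metadata directories that exist on only one side and should not count
# as content differences (an assumption of this sketch).
IGNORED_DIRS = {"CVS", ".git"}


def tree_digests(root):
    """Map each file path (relative to root) to a SHA-1 of its content."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare_trees(cvs_root, git_root):
    """Return (only_in_cvs, only_in_git, differing) as sorted path lists."""
    cvs = tree_digests(cvs_root)
    git = tree_digests(git_root)
    only_in_cvs = sorted(set(cvs) - set(git))
    only_in_git = sorted(set(git) - set(cvs))
    differing = sorted(p for p in set(cvs) & set(git) if cvs[p] != git[p])
    return only_in_cvs, only_in_git, differing
```

In this sketch, a file that was deleted in CVS but lingers in the conversion would land in the second list, and a plain content mismatch in the third. A real comparison would also have to cope with details this ignores, such as CVS keyword expansion on checkout.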


I want to solve at least the vendor branch problem before I go hunting big game. I have some small test repositories that replicate it, so that should be doable.


The other big issue is target identification. This is where my blog followers and others who want to stamp out CVS in our lifetime can help. Find us projects still using CVS – best targets are those you’ve sounded out for interest in converting and gotten some positive response from.

Published on December 13, 2014 15:04


Eric S. Raymond's Blog

Eric S. Raymond