Eric S. Raymond's Blog, page 55
December 14, 2012
Refuting “The Mathematical Hacker”
Evan Miller’s essay The Mathematical Hacker is earnest, well-intentioned, and deeply wrong. Its errors begin with a serious misrepresentation of my views and my work, and fan out from there.
Miller characterizes my views as “Mathematics is unnecessary [in programming] except in specialized fields such as 3D graphics or scientific computing.” But this is not even a good paraphrase of the quote he cites, which says hackers “won’t usually need trigonometry, calculus or analysis (there are exceptions to this in a handful of specific application areas like 3-D computer graphics).”
What I am actually asserting in that quote is that continuous, as opposed to discrete, mathematics is not generally useful to programmers. I sharpen the point by continuing thus: “Knowing some formal logic and Boolean algebra is good. Some grounding in finite mathematics (including finite-set theory, combinatorics, and graph theory) can be helpful.”
This (which Miller blithely ignores) is very far from asserting that hackers can or should be indifferent to mathematics. Rather, it is me, as an erstwhile mathematician myself, attempting to point aspiring hackers at those domains of mathematics that are most likely to be useful to them. Much of theoretical computer science builds on these; automata theory and algorithmic complexity theory are among the more obvious examples.
Having gone wrong right at the start, Miller swiftly compounds his error: “If you are a systems and network programmer like Raymond, you can do your job just fine without anything more than multiplication and an occasional modulus.” This is somewhat too narrow a description of what I do, and even if it were accurate Miller would be arguing against his own case by taking far too narrow a view of what mathematics a “systems and network programmer” can use.
In fact, it is routine for systems programmers to have to grapple with problems in which graph theory, set theory, combinatorial enumeration, and statistics could be potent tools if the programmer knew them. If Miller knows this, he is being rhetorically dishonest; if he does not know it, he is far too ignorant about what systems programmers do to be making any claims about what they ought to know.
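To make that concrete with a deliberately trivial example (the package names here are invented for illustration): working out a safe order in which to build or load a set of packages is nothing more than a topological sort over a dependency graph, exactly the kind of tool elementary graph theory hands a systems programmer.

```python
# A minimal sketch: dependency ordering as a topological sort.
# Package names are hypothetical.
from graphlib import TopologicalSorter  # Python 3.9+

# Map each package to the packages it depends on.
deps = {
    "editor":   {"libui", "libregex"},
    "libui":    {"libc"},
    "libregex": {"libc"},
    "libc":     set(),
}

# static_order() yields a build order in which every package
# comes after all of its dependencies.
print(list(TopologicalSorter(deps).static_order()))
# e.g. ['libc', 'libui', 'libregex', 'editor']
```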
Thus, when Miller asserts that I would agree with the claim “from a workaday perspective, math is essentially useless”, he is ludicrously wrong. What he has done is conflate “math” with a particular kind of mathematics centered in calculus and continuous analysis – and as a (former) mathematician myself, I say this is a form of nonsense up with which I do not intend to put.
Miller perpetrates all these errors in his five opening paragraphs. Sadly, it only gets more dimwitted from there. Consider this gem: “One gets the impression reading Raymond, Graham, and Yegge (all self-styled Lisp hackers) that the ultimate goal of programming is to make a program that is more powerful than whatever program preceded it, usually by adding layers of abstraction.”
This is superficially profound-sounding, but nonsense. I admit to not being familiar with Steve Yegge’s work, but Paul Graham is hardly lost in layers of abstraction; he used his Lisp-programming chops to build a company that he sold for $50 million. One of my best-known projects is GPSD, which gets down in the mud required to do data analysis from navigational sensors and is deployed on millions of embedded platforms; another is GIFLIB, which has been throwing pixels on all the world’s display devices since 1989. Neither of us inhabits any la-la-land of pure computational aesthetics; this is Miller misreading us, wilfully ignoring what we actually code and ship.
From here, Miller wanders off into knocking over a succession of straw men, contrasting a “Lisp culture” that exists only in his imagination with a “Fortran culture” that I conjecture is equally fantastical. Not absolutely everything he says is nonsense, but what is true is not original and what is original is not true.
Miller finishes by saying “Lastly, we need the next generation of aspiring hackers to incorporate mathematics into their program of self-study. We need college students to take classes in physics, engineering, linear algebra, statistics, calculus, and numerical computing…”
This is true, but not for the reason Miller wants us to believe it is true. The bee in his bonnet is that continuous mathematics is generally useful to programmers, but that claim remains largely false and has nothing to do with the real utility of most of these fields. Their real utility is that they require their practitioners to think and to engage with the way reality actually works in ways that softer majors outside science/technology/engineering/mathematics seldom do.
That kind of engagement we could certainly use more of. Arguments as bad as Evan Miller’s are unlikely to get us there.
December 11, 2012
Heavy weapons
I should have known it. I really should have expected what happened.
So it’s about week 7 at kuntao training, the tasty exotic mix of south Chinese kung fu and Philippine weapons techniques my wife and I are now studying, and we’re doing fine. The drills are challenging, but we’re up to the challenge. Our one episode of sparring so far went very well, with Cathy and me both defeating our opponents decisively in knife duels. The other students and the instructors have accepted us and show a gratifying degree of confidence in our abilities. Our first test approaches and we are confident of passing.
The only fly in the ointment, the one silly damn thing I’ve had persistent trouble with, is spinning my escrima sticks. Last night I found out why…
Fast wrist spins are a move in several of the drills. The combat application, of course, is that a wrist spin is a way to put kinetic energy into the stick when you don’t have room or time to swing it. Intermediate-level students are expected to be able to do them casually and so fast that the stick is hard to see moving. Cathy, of course, picked this up like it was nothing – which means everybody in our normal classes can do it except me.
I’ve been working at this diligently, but making only slow progress. This is exceptionally difficult for me, and it’s not the first time I’ve had problems this way, either; there’s a similar technique in Western sword called an “inside return”, which you’re supposed to use to dissipate the reaction energy from a sword strike so it doesn’t injure your tendons, and I’ve never been very good at that one either.
A significant part of the problem is just that I have thick, muscular wrists – they looked like swordsman’s wrists before I was a swordsman. Usually this is a good thing, but the extra power comes bundled with minor range-of-motion issues as an unwelcome extra. Spinning escrima sticks is one of the few contexts where this actually matters.
Last night we’re doing prep for the upcoming test, and I’m working with the pair of escrima sticks they issued me when I joined the school. They’re relatively lightweight rattan, a bit less than an inch in diameter, 26in long. I’m having my usual troubles – I can strike with them, but I have the devil’s own time controlling them when I try to spin them. The move just doesn’t feel right, and it doesn’t work right.
We get to a point in the test prep where the instructor tells us we’re going to need a third stick (not for wielding, but to lay on the floor as a marker). I hustle over to the pile of fighting sticks under the target dummy and grab one. It feels…different.
I look at it. It’s thick – easily a half-inch more outside diameter than mine. It’s longer, too, and finished in some kind of glossy lacquer. And it’s heavy, at least half again and maybe twice the weight I’m used to. I heft it experimentally, and think “Hey. This feels pretty good. I wonder if…”
It spins beautifully. All this time, what I’ve needed to make the move work was a heavier weapon.
And I could kick myself. The clues were all there, if I’d put them together. I remember what a miserable time I had trying to control training-weight nunchaku, then what it was like when I picked up a fighting-weight nunchaku and it was easier. I remember that I’ve changed swords twice, each time for a heavier and longer weapon, and my sword-handling improved both times.
Might be the larger diameter is part of it, too. I have to slightly clench my hands to hold the smaller sticks; my grip on the bigger ones is less tense, which makes it easier to move them fluidly.
In a state of happy excitement I go to sifu Yeager, say “Look at this!”, and spin the stick. “Can I get two of these? They’re just what I need!”
He contemplates the thing dubiously and informs me that it’s a master’s stick, far too heavy for a newb student to use in training. Bummer. I see his point – it wouldn’t take a lot of effort to crack someone’s skull with what I’m holding. Well, not a lot of effort for me, anyway, and that’s the problem – he reckons I could easily over-power the thing against a training partner in a moment of inattention. In fact that is highly unlikely – my force control is extremely precise and reliable – but he hasn’t been training me long enough yet that I can reasonably expect him to know that.
But he sees my point, too. The standard training sticks are clearly just too light and skinny for me to handle well. It’s not my technique at fault after all, something about the physics and physiology has been messing me over. He promises me a pair that’s thicker and longer, at least, and then grins and makes a John Holmes joke.
Now I’m looking forward to my new sticks. And wondering how common a problem this is.
December 2, 2012
Beware! The Reposturgeon!
I had said I wasn’t going to do it, but…I experimented, and it turned out to be easier than I thought. Release 2.7 of reposurgeon writes (as well as reading) Subversion repositories. With the untested support for darcs, which should work exactly as well as darcs fast-export and fast-import do, this now brings the set of fully-supported version-control systems to git, hg, bzr, svn, and darcs; reposurgeon can be used for repository surgery and interconversion on any of these.
There are some significant limitations in the write-side Subversion support. For various ugly reasons having to do with the mismatch between Subversion’s ontology and that of git import streams, Subversion repositories won’t usually round-trip exactly through reposurgeon. File content histories will remain the same, but the timing of directory creations and deletions may change. The pathological things known in the Subversion world as “mixed-branch commits” are split apart at Subversion-read time and not reassembled when and if the repo state is written back out in Subversion form. Custom Subversion property settings (basically, everything but svn:ignore, svn:executable, and svn:mergeinfo) are lost on the way through. There are other problems of a similar nature, all documented in the manual.
A particularly unfortunate problem is that mergeinfo properties may be simplified or lost. Mapping between gitspace and Subversion merges is messy because a Subversion merge is more like what gitonauts call a “cherry-pick” than a git-space merge – I don’t have a general algorithm for this (it’s a research-level problem!) and don’t try to handle more than the most obvious branch-merge cases.
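To see why the mapping is messy, look at what Subversion actually records about a merge: an svn:mergeinfo property whose value is nothing but source paths and revision ranges, with no concept of a commit having two parents. A few lines of Python (an illustrative sketch, not reposurgeon’s actual code) make the cherry-pick flavor of that record obvious:

```python
# Sketch only -- not reposurgeon's implementation. An svn:mergeinfo
# property records merges as arbitrary revision ranges per source
# branch (much like a list of cherry-picks), not as a DAG merge with
# two parent commits the way git does.
def parse_mergeinfo(prop):
    merges = {}
    for line in prop.splitlines():
        if not line.strip():
            continue
        path, ranges = line.split(":", 1)
        revs = []
        for r in ranges.split(","):
            r = r.rstrip("*")   # "*" marks non-inheritable ranges
            if "-" in r:
                lo, hi = map(int, r.split("-"))
                revs.append((lo, hi))
            else:
                revs.append((int(r), int(r)))
        merges[path] = revs
    return merges

sample = "/branches/feature:1042-1077,1090\n/trunk:980-1001"
print(parse_mergeinfo(sample))
# {'/branches/feature': [(1042, 1077), (1090, 1090)], '/trunk': [(980, 1001)]}
```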
It could fairly be alleged that the capability to write Subversion repositories is more a cute stunt than anything that’s likely to be useful in a production situation. While I have regression tests for it that show it works on branching and merging commit graphs, I don’t think I’d actually want to trust it, yet, on a repository that wasn’t linear or only simply branching. Arcane combinations of branching, merging, and tagging could reveal subtle bugs without surprising me even slightly.
Still…having it work even as conditionally as it does seems something of an achievement. Not one I was expecting, either. I really only did it because someone on the Subversion dev list asked about write support, I wanted to reply by listing all the reasons it wouldn’t work – and then I found that I couldn’t actually make that list without trying to implement the feature. It was ever thus…
The only unconquered frontier of any significance in open-source VCSes is CVS, really. No way I’ll do write-side support for that (and I mean it this time!) but I’ve sent the maintainer of cvsps a proof-of-concept patch that almost completely implements a fast-export stream dump for CVS repositories. We’ll see where that goes.
November 29, 2012
Don’t overinterpret the Sorrell Doctrine!
I have submitted an essay to the Stanford Law Review for publication. I didn’t tick the box for “exclusive”, so I think I can blog it as well. It’s a reply to Andrew Tutt’s essay on Software Speech.
Andrew Tutt’s essay “Software Speech” rightly points out that the Sorrell and Brown cases set up inconsistent standards for whether software is to be considered “speech” and entitled to First Amendment protection. The logic of his essay goes astray, however, when he projects the consequences of Sorrell; they need not be so sweeping as he might suppose, and in fact a software engineer (someone working in the field) would not expect them to be.
I am a software engineer, not a constitutional attorney. But, as the founding president of the Open Source Initiative (the generally recognized certification authority on what licensing terms can be considered “open source”) I have been frequently required to grapple with questions about law, policy, freedom of speech, and intellectual property. I was also an individual amicus in Reno v. American Civil Liberties Union, the successful 1996 case against the Communications Decency Act. The territory of Mr. Tutt’s essay is not strange to me.
There is a colorable distinction, one obvious to any software engineer, between software considered as an act of speech versus software considered as an instrument of speech. Those of us (including myself) who hold that “software is speech” are insisting that the creation of software is a form of creative and expressive speech act with a result entitled to all the protection against coercive interference that we would extend, say, to a copy of Lady Chatterley’s Lover or Mein Kampf.
The consequences we would draw from that claim are significant. At present it is technically illegal to publish or convey software that constitutes a “circumvention device” under the Digital Millennium Copyright Act of 1998. As a matter of principled civil disobedience in support of First Amendment liberty, I carry a link to such software on the front page of my website even though I have never used it. If Sorrell were applied consistently, that provision of the DMCA would be annulled.
But Mr. Tutt is wildly off the mark in supposing that Sorrell would preclude any regulation of speech in which software functions as instrument rather than itself being the speech act. He worries that “Apple’s wish to exclude disfavored books from the iPad eBook reader, or banish Adobe Flash from its iPhone browser, would simply be Apple’s speech.” There is a case for that position, but it has nothing to do with Sorrell; Sorrell would only protect Apple’s right to publish the eBook and browser software.
I can perhaps make this distinction clearer with a roughly parallel case. The right to publish instructions for building a pipe bomb is constitutionally protected as an expressive act; this does not mean we relinquish any regulation of actually detonating such devices!
Declaring my interest, I’m concerned to rebut Mr. Tutt’s overinterpretation of Sorrell not merely because I think it is fallacious but because such overinterpretation might cause a damaging reaction against it. All expressive speech, in whatever medium, deserves Constitutional protection; Sorrell merely affirms that software is not an exception.
Mr. Tutt’s confusion is understandable, and that the Supreme Court shares this confusion is suggested by the implicit conflict he points out between the Sorrell and Brown decisions. It can be difficult to reason crisply when act and instrument are both intangible and are closely entangled. But software engineers have to do this all the time. The legal academy might benefit, on this and related issues, if it listened a bit more to the engineers and a bit less exclusively to itself.
UPDATE: Don Marti points at an excellent analysis along similar lines, Publishing Software as a Speech Act.
November 27, 2012
English is a Scandinavian language?
Here’s the most interesting adventure in linguistics I’ve run across in a while. Two professors in Norway assert that English is a Scandinavian language, a North Germanic rather than a West Germanic one. More specifically, they claim that Anglo-Saxon (“Old English”) is not the direct ancestor of modern English; rather, our language is more closely related to the dialect of Old Norse spoken in the Danelaw (the Viking-occupied part of England) after about 865.
They bolster their claim by pointing at major grammatical traits which English shares with Old Norse rather than West Germanic languages – notably, consistent SVO (subject-verb-object) word order rather than the SOV (subject-object-verb) or V2 (verb-second) orders that dominate in languages like German, Dutch and Anglo-Saxon. The practical consequence they point out (correctly – I’ve experienced this myself) is that English and Norwegian or Swedish are quite a bit closer in mutual intelligibility than any of this group is with German or Dutch or Anglo-Saxon. I had actually noticed this before and been puzzled by it.
The professors think the reason for this is that rather than evolving into Modern English, Anglo-Saxon actually died out during the two centuries between the invasion of the Great Army in 865 and the defeat of Harold Godwinsson in 1066. They propose that Anglo-Saxon influenced, but was largely replaced by, the Norse dialect of the Anglo-Danish Empire. Which, SVO North Germanic grammar and all, then collided with Norman French and evolved into English as we know it.
This isn’t crazy. It may be wrong, but it isn’t crazy. Two centuries is plenty of time for an invading language to reduce a native one to a low-status argot and even banish it entirely; we’ve seen it happen much faster than that when the invaders are as culturally and politically dominant as the Anglo-Danes were in England at the time of Cnut (1016-1035).
Even in the conventional account of the evolution of English, modern English is supposed to have derived from the Anglo-Saxon spoken in the East Midlands – which, as the professors point out, was the most densely settled part of the Danelaw!
All of this gave me an idea that may go beyond the professors’ hypothesis and explain a few other things…
Previously on this blog my commenters and I have kicked around the idea that English is best understood as the result of a double creolization process – that it evolved from a contact pidgin formed between Anglo-Saxon and Danelaw Norse. The creole from that contact then collided, a century later, with Norman French. Wham, bam, a second contact pidgin forms; English is the creole descended from the language of (as the SF writer H. Beam Piper famously put it) “Norman soldiers attempting to pick up Anglo-Saxon barmaids”.
This is not so different from the professors’ account, actually. They win if the first creole, the barmaids’ milk language, was SVO with largely Norse grammar and some Anglo-Saxon vocabulary. The conventional history of English would have the girls speaking an SOV/V2 language with largely Anglo-Saxon grammar and some Norse vocabulary.
So I’m thinking about this, and about the political-cultural situation in East Anglia at the time historical linguists suppose it to have been the cradle of modern English, and I thought…hey! Diglossia! Basilect and acrolect!
OK, for those of you not up on your linguistic jargon, these are terms used in modern linguistics to describe the behavior of speakers in a creole continuum. Often, in a contact culture where an invading language has partly or wholly displaced a native one, you get a continuum of dialects between the acrolect (the “high” language of the invaders) and basilects (“low” dialects) that preserve more of a native language which may or may not still be alive in its original form.
A type case for this is modern Jamaica, where there’s a dialect continuum between acrolectal standard English and basilectal Jamaican patois with a lot of survivals from West African languages and Arawak. Outsiders tend to oversimplify this kind of situation into diglossia – one population speaking two languages, one “outside” and prestigious, one “inside”, intimate and tied to home and ethno-cultural identity.
But it isn’t that simple in Jamaica. Individuals are often fluent in both acrolectal and basilectal forms and mix usages depending on social situation. Husband and wife might speak acrolectal English on business, a mesolectal light patois among a mixed-race group of friends, but a deep patois with a grammar significantly different than standard English when cooking or making love. (I have a teenage nephew who lives on St. John’s, another Caribbean island, who – though tow-headed and blue-eyed and perfectly capable in American English – sometimes busts out a deep-black island dialect at family gatherings. It’s mischievous and barely intelligible, but it’s affectionate, too.)
I think, now (and this is where I go beyond those professors in Norway) that East Anglia between the invasions of the Great Army and William the Bastard must have been a lot like Jamaica today. Nothing quite as neat as one language dying out, but rather a creole continuum – with Danelaw Norse at the top, a remnant Anglo-Saxon at the bottom, and a whole lotta code-switching going on. There’s your cradle of English! (Well, before the Normans added their special sauce, anyway…)
This would explain much that the conventional Anglo-Saxon-centric account doesn’t, like why I can read a Norwegian newspaper far more readily than a German or Dutch one. It’s more nuanced than the professors’ version, but leads to the same top-line conclusion. English better classified as a Scandinavian rather than a West Germanic language? OK, twice creolized and later heavily infiltrated by Latin and French…but yeah, I’ll buy that description.
November 19, 2012
The wages of secrecy
One of my regulars, contemplating the increasingly pathetic series of clusterfucks that have passed for exciting new products at Microsoft, wonders why a company with all its advantages – more money than $DEITY to hire the best developers, lots of experience, dominant position in a major technology market – can’t seem to release a decent product any more.
The answer is simple and deep. It’s because evil is inefficient.
Ethical behavior and sustainability are connected in both directions; the wages of sin are self-damage.
When you pursue a business model based on secrecy rent and control of your customers, you must become the kind of organization that an obsession with secrecy and control requires. Eventually, this will smother your ability to do decent engineering as surely as water flows downhill and the sun rises in the morning.
This is why Microsoft looks so doomed and desperate. Yes, Steve Ballmer is a colossal fool who has never met a strategic decision he couldn’t bungle, but in an important way that is symptom rather than cause. Dysfunctional leaders arise from dysfunctional cultures; the problem behind Ballmer is that Microsoft’s culture is broken, and the problem behind that is that the monopolistic/authoritarian goals around which Microsoft’s culture was constructed are incompatible with any other kind of excellence.
A more poetic way to put this is Tolkien’s “Oft evil will shall evil mar.” Google’s “Don’t be evil” isn’t mere idealism or posturing, it’s an attempt to sustain the kind of culture in which excellence is possible. (Whether and how long this will be a successful attempt is a different question.)
Apple’s turn is next.
November 17, 2012
A secret of game-fu
Last night I utterly trounced three opponents at the slick new Fantasy Flight reissue of a classic interstellar trade and exploration game, Merchants of Venus. My end score was nearly three times that of the runner-up, and I had acquired so many fame points (which each become 10 victory points at game end) that we ran out of fame tokens.
One of the other players half-humorously protested that I had gotten incredibly lucky. “Nonsense”, I said, “it was planning”. He sputtered that I had frequently had the victory conditions for lucrative missions apparently drop in my lap. Which was true, and he was right to view those individual occurrences as luck. But it was also true that I planned my way to victory.
I made chance work for me. Pay attention, because I am about to reveal why there is a large class of games (notably pick-up-and-carry games like Empire Builder, network-building games like Power Grid, and more generally games with a large variety of paths to the win condition) at which I am extremely difficult to beat. The technique is replicable.
I have a rule: when in doubt, play to maximize the breadth of your option tree. Actually, you should often choose option-maximizing moves over moves with a slightly higher immediate payoff, especially early in the game and most especially if the effect of investing in options is cumulative.
This rule has many consequences. In pick-up-and-carry games, it means that given any choice in the matter you want to start by deploying or moving your train or spaceship or whatever to the center of the board. You minimize your expected distance over the set of all possible randomly-chosen destinations that way. You give yourself the best possible chance to “get lucky” by finding the fattest possible contract or trade opportunity that you can deliver in minimum time.
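If you doubt the arithmetic, it takes a five-minute simulation to check (the grid size here is arbitrary, not any particular game’s map):

```python
# Quick illustrative simulation: average taxicab distance from a
# starting square to uniformly random destinations on an N x N grid,
# comparing a corner start with a center start.
import random

N = 20

def avg_distance(start, trials=100_000):
    sx, sy = start
    total = 0
    for _ in range(trials):
        dx, dy = random.randrange(N), random.randrange(N)
        total += abs(dx - sx) + abs(dy - sy)
    return total / trials

print("from corner:", avg_distance((0, 0)))             # roughly 19
print("from center:", avg_distance((N // 2, N // 2)))   # roughly 10
```

The center start roughly halves your expected travel to wherever chance sends you next, which is exactly the kind of “luck” opponents notice later.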
More generally, in games with multiple paths to victory, open as many of those paths as you can. And heavily favor moves that help you explore the possibilities faster than your opponents. In Empire Builder, buy the faster train as soon as possible. In Merchants of Venus, the first ship upgrade I bought was better engines.
In games with an exploration mechanic, like Merchants of Venus or Eclipse, push it hard in the early game. Again, the payoff here is that you’re generating options for yourself. This effect is particularly strong in Merchants of Venus because on a first-contact planetfall you get to do two buys and sells with the natives rather than the normal one – you have that much better a chance of a trade good you previously bought on spec being highly valuable, or of picking up a spec load that will pay off large at your next first contact. (Of course, when this happens, it looks like luck.)
Look for other ways to broaden your option tree. In the Merchants of Venus game one of my other early purchases was a second mission-card slot. From early in the game to shortly before the end, this meant I had a choice of two missions to work on rather than just the one other players were pursuing. So of course I fulfilled them more often! It looked like I was getting lucky; what I was actually doing was maximizing the number of possible ways I could get lucky.
In network-building games like Power Grid and Empire Builder, bias towards moves that make your network closer to a minimal spanning tree for all destinations of interest – that is, accept somewhat lower immediate payoffs and/or higher costs for building such links. This maximizes your chances of being able to reach anywhere quickly in the later game.
Power Grid is an instructive example of a game with positional, network-building strategy in which maximizing your option tree can also be done in some ways that aren’t at all positional. One relatively obvious one is to buy hybrid plants, which increase your options for both price-taking in the fuel market and (less obviously) manipulating it.
Another one is to be willing to pay what you have to to get a game-ender plant (a 5 or 6) within the first few rounds, even if it means you don’t get to build cities in that turn and your revenue doesn’t go up. The real payoff here is being able to sit out several auction rounds while other players are scrambling for plant capacity to match their city-building. Their options are narrow in each round; yours aren’t – you can pile up money or opportunistically grab only the most efficient plant buys as they go by.
I rely particularly heavily on the latter tactic. I made the national Power Grid finals with it this year.
If you are in a game where opponents can directly mess with you, maximizing your option tree also makes it more difficult for them to correctly predict which countermoves will damage you the most. And even if they close off one tactical path, you’ll have others. More generally, you may overwhelm their capacity to model your behavior, so the game looks to them like constant surprises with you coming at them from every direction at once. Weak players often fail a morale check in this situation and become even weaker.
(This happened last night – one total morale collapse and one partial out of three opponents. Unsurprisingly to me, the third guy, the one with the most sitzfleisch, came in second.)
Afterwards, they think you “got lucky”. This is an illusion they foist on themselves through picking a single path to victory and working it as hard as possible. Because this makes their range of usable lucky breaks smaller and less likely to occur, they overestimate the element of chance in your victory – they judge it by how lucky they would have had to be to win by a similar margin.
And why am I OK with telling you this secret? Because ha ha, Grasshopper, I have other secrets. Perhaps I will share some of them in future posts.
November 4, 2012
reposurgeon 2.0 announcement – the full-orchestra version
I shipped reposurgeon 2.0 a few days ago with the Subversion support feature-complete, and a 2.1 minor bugfix release this morning. My previous release announcement was somewhat rushed, so here is a more detailed one explaining why anybody contemplating moving up from Subversion should care.
To go with this, there is a new version of my DVCS Migration HOWTO.
reposurgeon can now read and analyze Subversion stream dumps, and can translate them to git fast-import streams. This brings with it the ability to export not just to git but to any DVCS that can speak that stream format; reposurgeon currently has direct support for hg and bzr.
Branchy repos are automatically handled correctly, with trunk mapped to master and Subversion branches mapped to gitspace branches.
Subversion tags are automatically mapped to git annotated tags (or to branches with tagged ends if the tag directory was modified after the copy).
Multibranch commits are automatically split into annotated per-branch commits.
Various kinds of meaningless cruft and artifacts generated by older versions of cvs2svn are automatically cleaned up. (But no potentially interesting comments or metadata are ever thrown away.)
Ersatz branch copies consisting of a plain directory copy followed by multiple adds are detected and treated like intentional branch creations.
svn:ignore property settings and clears are automatically translated to equivalent creations and removals of .gitignore files (a sketch of this mapping appears below, after the feature rundown).
There is semi-automated support for lifting CVS and Subversion commit references in change comments to a VCS-independent date!committer format.
svn:special properties are translated to git symlink references.
It is never necessary to hint to reposurgeon or give it branch-rewrite rules to get a clean lift. In some very unusual theoretical cases, post-lift surgery to sort out branches might be required, but no example of this has yet been observed in the wild.
What reposurgeon does is carefully and exhaustively documented, even in the strange edge cases.
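To give a feel for what even the simplest of these translations involves, here is an illustrative sketch (not reposurgeon’s actual code) of the svn:ignore mapping. The key subtlety is that svn:ignore patterns apply only to their own directory, while an unanchored .gitignore pattern matches recursively:

```python
# Illustrative sketch only -- not reposurgeon's implementation.
# svn:ignore holds newline-separated glob patterns that apply just to
# the directory carrying the property; git applies unanchored patterns
# recursively, so each pattern is anchored with a leading "/" to keep
# the non-recursive Subversion semantics.
def svn_ignore_to_gitignore(prop_value):
    lines = []
    for pattern in prop_value.splitlines():
        pattern = pattern.strip()
        if not pattern:
            continue
        lines.append(pattern if pattern.startswith("/") else "/" + pattern)
    return "\n".join(lines) + "\n"

print(svn_ignore_to_gitignore("*.o\nbuild\n*.tmp"), end="")
# /*.o
# /build
# /*.tmp
```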
Most of the pre-existing conversion tools don’t do any of these things properly. reposurgeon does them all, with an extensive regression-test suite to demonstrate correctness. The code has also been field-tested on several large Subversion repositories (notably for the gpsd, Hercules, NUT, and Roundup projects) with good results.
I believe reposurgeon now does almost as good a job of lifting as is possible given the ontological differences between Subversion and git. I say “almost” only because there is still some room for improvement in recognizing Subversion branch-merges-by-copy and translating them as gitspace DAG merges.
Note one important restriction: reposurgeon can read Subversion dumps, but cannot write them – the downconversion from fast-import streams would be too lossy to be safe.
I started working on the Subversion-stream support about a year ago. What took so long was getting the multibranch support to automatically do the right thing in various semi-pathological merge cases.
November 2, 2012
Terror of the Reposturgeon!
I’ve just shipped reposurgeon 2.0, a power tool for editing and interconverting version-control repositories. This is a major release, adding the capability to read Subversion dump files directly.
I’ve blogged about this project before, highlights at 1 2 3 4 5 6 7 8.
Development had gotten stalled for six months because of a really insidious bug in the handling of Subversion dumps with certain odd sorts of multibranch histories. But recently one Greg Hudson sent me a performance-enhancement patch that enabled me to get rid of an O(n**2) lookup function deep in the code – and, as I had suspected since it first popped up, the bug was in that lookup function. (No, I still don’t know exactly where – it was something subtle.)
I’m shipping this now, before my previous target of when the NUT project guys sign off on their repo conversion, because I want people to stop using the older versions. 99 out of 100 times they would be OK, but that 100th time could be nasty.