Eric S. Raymond's Blog, page 28

July 3, 2015

git-weave, a tool for synthesizing repositories from fossil tarballs

Welcome to my first new-project release of the year, git-weave. It’s a polished and documented version of the script I used to reconstruct the early history of INTERCAL five years ago – see Risk, Verification, and the INTERCAL Reconstruction Massacree for the details on that one.


git-weave can be used to explode a git repository into a sequence of per-commit directory trees, accompanied by a metadata file that describes parent-child linkage, records committer, author, timestamp, and comment metadata, and carries tags.


Going in the other direction, it can take the same sequence of trees plus metadata file and reconstruct the live repository. Round-tripping is lossless.


What it’s really useful for is reconstructing a partial but useful ancient history of a project from before it was put under version control. Find its release archives, synthesize a metadata file, apply this tool, and you get a repository that can easily be glued to the modern, more continuous history.


Yes, you only get a commit for each release tree or patch you can dig up, but this is better than nothing and often quite interesting.
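

The first step of that recipe – laying out each release archive as its own tree for the tool to weave – is easy to automate. Here is a minimal Python sketch; the directory naming is purely illustrative, and the accompanying metadata file still has to be written by hand per git-weave's documentation.

    # Unpack a set of release tarballs into sequential per-release trees.
    # Sketch only: the numbered-directory layout is illustrative, not a
    # requirement of git-weave; see its documentation for the metadata format.
    import os
    import tarfile

    def explode_releases(tarballs, outdir):
        """Unpack each tarball, in the order given (oldest first), into outdir/NNN/."""
        os.makedirs(outdir, exist_ok=True)
        for n, tb in enumerate(tarballs, start=1):
            treedir = os.path.join(outdir, "%03d" % n)
            with tarfile.open(tb) as archive:
                archive.extractall(treedir)
            print("unpacked %s -> %s" % (tb, treedir))

    # Usage (hypothetical filenames):
    # explode_releases(["project-0.1.tar.gz", "project-0.2.tar.gz"], "trees")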


Nifty detail: the project logo is the ancient Egyptian hieroglyph for a weaver’s shuttle.


June 23, 2015

How to spot a high-quality repository conversion

In my last post, I inveighed against using git-svn to do whole-repository conversions from Subversion to git (as opposed to its intended use, which is working a Subversion repository live through a git remote).


Now comes the word that hundreds of projects a week seem to be fleeing SourceForge because of its evil we’ll-hijack-your-repo-and-crapwarify-your-installer policy. And they’re moving to GitHub via its automatic importer. Which, sigh, uses git-svn.


I wouldn’t trust that automatic importer (or any other conversion path that uses git-svn) with anything I write, so I don’t know how badly it messes things up.


But as a public service, I follow with a description of how a really well-done repository conversion – the kind I would deliver using reposurgeon – differs from a crappy one.



In evaluating quality, we need to keep in mind why people spelunk into code histories. Typically they’re doing it to trace bugs, understand the history of a feature, or grasp the thinking behind prior design decisions.


These kinds of analyses are hard work demanding attention and cognitive exertion. The last thing anyone doing them needs is to have his or her attention jerked to the fact that back past a certain point of conversion everything was different – commit references in alien and unusable formats, comments in a different style, user IDs missing, ignore patterns not working, etc.


Thus, as a repository translator my goal is for the experience of diving into the past to be as frictionless as possible. Ideally, the converted repository should look as though modern DVCS-like practices had been in use from the beginning of time.


Some of the kinds of glitches I’m going to describe may seem like they ought to be ignorable nits. And individually they often are. But the cumulative effect of all of them is distracting. Unnecessarily distracting.


These are some key things that distinguish a really good conversion, one that’s near-frictionless to use, from a poor one.



1. Subversion/CVS/BitKeeper user IDs are properly mapped to Git-style human-name-plus-email identifications.


Sometimes this is a lot of work – for one conversion I did recently I spent many hours Googling to identify hundreds of contributors going back to 1999.


The immediate reason this is valuable is so we know who was responsible for individual commits, which can be important in bug forensics.


A more social reason is that otherwise OpenHub and sites like it in the future won’t be able to do reputation tracking properly. Contributors deserve their kudos and should have it.
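

The mechanical half of this job is just a lookup table and a rewrite pass; the research to populate the table is the hard half. A minimal Python sketch, with hypothetical logins, names, and addresses:

    # Sketch: rewrite bare VCS login names into git-style "Name <email>" identities.
    # The map below is a hypothetical example; building the real one is the
    # hard, manual part of the job.
    AUTHOR_MAP = {
        "esr": ("Eric S. Raymond", "esr@thyrsus.com"),
        "jrh": ("J. Random Hacker", "jrh@example.com"),
    }

    def map_identity(login):
        """Return a (name, email) pair for a legacy login, or a safe fallback."""
        if login in AUTHOR_MAP:
            return AUTHOR_MAP[login]
        # Fall back to something syntactically valid so nothing is silently lost.
        return (login, login + "@unknown.invalid")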


2. Commit references are mapped to some reasonably VCS-independent way to identify the commits they point at; I generally use either unique prefixes of commit comments or committer/date pairs.


Because ‘r1234’ is useless when you’re not in Subversion-land anymore, Toto. And appending a fossil Subversion ID to every commit comment is heavyweight, ugly, and distracting.
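

As an illustration of the committer/date approach, here is a rough Python pass that rewrites Subversion ‘rNNNN’ references into committer/date pairs. The stamp wording and the revision-to-identity mapping are illustrative assumptions, not any particular tool’s convention:

    # Sketch: rewrite Subversion "r1234" references in a commit comment into
    # committer/date pairs.  The output format here is only illustrative.
    import re

    def rewrite_refs(comment, rev_info):
        """rev_info maps int revision -> (committer, ISO-8601 date)."""
        def replace(match):
            rev = int(match.group(1))
            if rev in rev_info:
                committer, date = rev_info[rev]
                return "[%s, %s]" % (committer, date)
            return match.group(0)      # unknown revision: leave it alone
        return re.sub(r"\br(\d+)\b", replace, comment)

    print(rewrite_refs("Fixes the bug introduced in r1234.",
                       {1234: ("esr", "2015-06-23T05:20:00Z")}))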


3. Comments are reformatted to be in DVCS form – that is, standalone summary line plus (if there’s more) a spacer line plus following paragraphs.


Yes, this means that to do it right you need to eyeball the entire comment history and edit it into what it would have looked like if the committers had been using those conventions from the beginning. Yes, this is a lot of work. Yes, I do it, and so should you.


The reason this step is really important is that without it, tools like gitk and git log can’t do their job properly. That makes it far more difficult for people reading the history to zero in efficiently on what they need to know to get real work done.
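

A script can make a mechanical first pass at this – splitting out a summary line and a spacer – though the point above stands: getting it actually right takes human editing. A sketch:

    # Sketch: mechanically nudge a legacy commit comment toward DVCS form
    # (summary line, blank line, body).  This is only a first pass; doing it
    # right still means editing the result by hand.
    def to_dvcs_form(comment):
        lines = comment.strip().splitlines()
        if not lines:
            return comment
        summary, rest = lines[0].strip(), [l.rstrip() for l in lines[1:]]
        # Drop leading blank lines from the body, then reinsert exactly one spacer.
        while rest and not rest[0]:
            rest.pop(0)
        return summary if not rest else summary + "\n\n" + "\n".join(rest)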


4. Ignore patterns and files should be lifted from the syntax and wildcarding conventions of the old system to the syntax and wildcarding conventions of the new one.


This is one of the many things git-svn simply fluffs. Other batch-mode converters could in theory do a better job, but generally don’t.
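

For the common case the lift is straightforward: svn:ignore patterns apply only to the directory carrying the property, so anchoring each pattern with a leading slash in that directory’s .gitignore preserves the scope. A minimal Python sketch that ignores corner cases such as negations and escaping:

    # Sketch: lift one directory's svn:ignore property into .gitignore syntax.
    # A leading "/" anchors each pattern to the directory holding the
    # .gitignore, matching svn:ignore's per-directory scope.
    def svn_ignore_to_gitignore(svn_ignore_value):
        out = []
        for pattern in svn_ignore_value.splitlines():
            pattern = pattern.strip()
            if pattern:
                out.append("/" + pattern)
        return "\n".join(out) + "\n"

    print(svn_ignore_to_gitignore("*.o\nconfig.log\nbuild"))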


5. The converted repository should not lose valuable metadata – like release tags.


Yes, I’m actually looking at a GitHub conversion that was that bad.


When the tags are missing, users will be unable to identify historical release points or do code diffs against them. It’s a usability crash landing.


May 28, 2015

Don’t do svn-to-git repository conversions with git-svn!

This is a public-service warning.


It has come to my attention that some help pages on the web are still recommending git-svn as a conversion tool for migrating Subversion repositories to git. DO NOT DO THIS. You may damage your history badly if you do.


Reminder: I am speaking as an expert, having done numerous large and messy repository conversions. I’ve probably done more Subversion-to-git lifts than anybody else, I’ve torture-tested all the major tools for this job, and I know their failure modes intimately. Rather more intimately than I want to…



There is a job git-svn is reasonably good at: live gatewaying to a Subversion repo, allowing you to pretend it’s actually in git. Even in that case it has some mysterious bugs, but the problems from these can usually be corrected after the fact.


The problem with git-svn as a full importer is that it is not robust in the presence of repository malformations and edge cases – and these are all too common, both as a result of operator errors and scar tissue left by previous conversions from CVS. If anyone on your project has ever done a plain cp rather than “svn cp” when creating a tag directory, or deleted a branch or tag and then recreated it, or otherwise offended against the gods of the Subversion data model, git-svn will cheerfully, silently seize on that flaw and amplify the hell out of it in your git translation.


The result is likely to be a repository that looks just right enough at the head end to hide damage further back in the history. People often fail to notice this because they don’t actually spend much time looking at old revisions after a repository conversion – but on the rare occasions when history damage bites you it’s going to bite hard.
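

If you want a rough pre-flight check before trusting any converter, one of the telltales just mentioned – a tag created with plain cp rather than “svn cp” – shows up in a Subversion dumpfile as a tags/ directory added without copyfrom information. Here is a crude Python heuristic for spotting that; it is not a real dump parser and it assumes the conventional top-level tags/ layout:

    # Sketch: scan a Subversion dumpfile for tag directories added without
    # copyfrom information -- the "plain cp instead of svn cp" malformation.
    # Crude heuristic only: it reads header-looking lines anywhere, so file
    # content that mimics dump headers can confuse it.
    def suspicious_tags(dumpfile):
        node, flagged = {}, []
        with open(dumpfile, "rb") as fp:
            for raw in fp:
                line = raw.decode("utf-8", errors="replace").rstrip("\n")
                if line.startswith("Node-path: "):
                    node = {"path": line[len("Node-path: "):]}
                elif line.startswith("Node-action: "):
                    node["action"] = line[len("Node-action: "):]
                elif line.startswith("Node-copyfrom-path: "):
                    node["copyfrom"] = line[len("Node-copyfrom-path: "):]
                elif not line and node:
                    # Blank line ends the node's header block; evaluate it.
                    p = node.get("path", "")
                    if (p.startswith("tags/") and p.count("/") == 1
                            and node.get("action") == "add"
                            and "copyfrom" not in node):
                        flagged.append(p)
                    node = {}
        return flagged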


Don’t get screwed. Use git-svn for live gatewaying if you must, remaining aware that it is not the safest tool in the world. But for a full conversion use a dedicated importing tool. There are around a half-dozen of these; beware the ones that are wrappers around git-svn, because while they may add a few features they can’t do that much to address its weaknesses.


Ideally you want an importer with good documentation, a comprehensive end-to-end test suite full of examples of common Subversion-repository malformations, a practice of warning you when it trips over weirdness, and a long track record of success on large, old, and nasty repositories. And if you learn of any tool with all those features other than reposurgeon please let me know about it.


Please share this and link to it so the warning gets as widely distributed as possible.


P.S.: It has been pointed out that I could provide more positive guidance. Here it is: the DVCS Migration HOWTO.


May 18, 2015

Zeno tarpits

There’s a deeply annoying class of phenomena which, if you write code for any length of time, you will inevitably encounter. I have found it to be particularly prevalent in transformations to clean up or canonicalize large, complex data sets; repository export tools hit variants of it all the time, and so does my doclifter program for lifting [nt]roff markup to XML-DocBook.


It goes like this. You write code that handles a large fraction (say, 80%) of the problem space in a week. Then you notice that it’s barfing on the 20% remaining edge cases. These will be ugly to handle and greatly increase the complexity of your program, but it can be done, and you do it.


Once again, you have solved 80% of the remaining cases, and it took about a week. Because your code is more complex than it used to be, testing it and making sure you don’t have regressions is about twice as difficult. But it can be done, at the cost of doubling your code complexity again, and you do it. Congratulations! You now handle 80% of the remaining cases. Then you notice that it’s barfing on 20% of the remaining tricky edge cases…


…lather, rinse, repeat. If the problem space is seriously gnarly you can find yourself in a seemingly never-ending cycle in which you’re expending multiplicatively more effort on each iteration for multiplicatively decreasing returns. This is especially likely if your test range is expanding to include weirder data sets – in my case, older and gnarlier repositories or newer and gnarlier manual pages.
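

To make the compounding concrete, here is a toy calculation. The multipliers – 80% of the remainder handled per cycle, effort doubling with the accumulated complexity – are illustrative numbers, not a law:

    # Toy illustration of a Zeno tarpit: each cycle covers 80% of what's left,
    # while the effort per cycle doubles with the growing code complexity.
    effort_per_cycle, uncovered, total_effort = 1.0, 1.0, 0.0
    for cycle in range(1, 8):
        uncovered *= 0.2               # 80% of the remaining cases handled
        total_effort += effort_per_cycle
        print("cycle %d: coverage %.3f%%, cumulative effort %.0f weeks"
              % (cycle, 100 * (1 - uncovered), total_effort))
        effort_per_cycle *= 2          # the next cycle costs twice as much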


I think this is a common enough hazard of programming to deserve a name.



If this narrative sounds a bit familiar, you may be thinking of the paradox of motion usually attributed to the philosopher Zeno of Elea. From the Internet Encyclopedia of Philosophy:



In his Achilles Paradox, Achilles races to catch a slower runner–for example, a tortoise that is crawling away from him. The tortoise has a head start, so if Achilles hopes to overtake it, he must run at least to the place where the tortoise presently is, but by the time he arrives there, it will have crawled to a new place, so then Achilles must run to this new place, but the tortoise meanwhile will have crawled on, and so forth. Achilles will never catch the tortoise, says Zeno. Therefore, good reasoning shows that fast runners never can catch slow ones.


In honor of Zeno of Elea, and with some reference to the concept of a Turing tarpit, I propose that we label this programming hazard a “Zeno tarpit”.


Once you know this is a thing, you can watch for it and perhaps avoid overinvesting in improvement cycles that pile up code complexity you will regret later. Also – if somebody asks you why your project has run so long over its expected ship date, “It turned into a Zeno tarpit” is often both true and extremely expressive.


May 11, 2015

How to Deny a Question’s Premise in One Easy Invention

Now that the Universe Splitter is out, it might be that a lot more people are going to trip over the word “mu” and wonder about it. Or it might be the word only occurs in the G+ poll about Universe Splitter – I don’t know, I haven’t seen the app (which appears to be a pretty good joke about the many-worlds interpretation of quantum mechanics) itself.


In any case, the most important thing to know about “mu” is that it is usually the correct answer to the question “Have you stopped beating your wife?”. More generally, it is a way of saying “Neither a yes nor a no would be a correct answer, because your question is incorrect.”


But the history of how it got that meaning is also entertaining.



The word “mu” is originally Chinese, and is one of the ways of saying a simple “no” or “nothing” in that language. It got its special meaning in English because it was borrowed by Japanese and appears in translations of a Zen koan titled “Joshu’s Dog” from the collection called the Gateless Gate. To some (but not all) interpreters in the Zen school, the word “mu” in that koan carries the sense of denying the question.


Wikipedia will tell you this much, tracing the special question-denying sense of “mu” in English through Robert Pirsig’s Zen and the Art of Motorcycle Maintenance (1974) and Douglas Hofstadter’s Gödel, Escher, Bach (1979).


However, Wikipedia’s account is incomplete in two respects. First, it doesn’t report something I learned from the Japanese translator of The Cathedral and the Bazaar, which is that even educated speakers of modern Japanese are completely unaware of the question-denying use of “mu”. She reported that she had to learn it from me!


Second, Wikipedia is missing one important vector of transmission: Discordians, for whom “mu” seems to have had its question-denying sense before 1970 (the date of the 4th edition of Principia Discordia) and from whom Pirsig and Hofstadter may have picked up the word. I suspect most contemporary usage traces through the hacker culture to the Discordians, either directly or through Hofstadter.


Regardless, it’s a useful word which deserves more currency. Sacred Chao says “Mu!” and so should you, early and often!


May 5, 2015

Sometimes progress diminishes

It’s not news to long-time followers of this blog that I love listening to virtuoso guitarists. Once, long ago in the 1980s I went to see a guitarist named Michael Hedges who astonished the crap out of me. The guy made sounds come out of a wooden flattop that were like nothing else on Earth.


Hedges died a few years later in a car crash, tragically young, and is no longer very well remembered. But I was on IRC yesterday talking music with a friend who mentioned a harmonica player and a whistler doing Jimi Hendrix in a “laid back, measured, acoustic style”, and I brought up Hedges because I remembered his cover of All Along The Watchtower as an utterly amazing thing.


Afterwards, in a mood of gentle nostalgia, I searched YouTube for a recording of it. Found one, from the Wolf Trap festival in ’86, and got a surprise.



It was undoubtedly very similar to the performance I heard at around the same time, but…it just didn’t sound that interesting. Technically accomplished, yes, but it didn’t produce the feeling of wonder and awe I experienced then. His original Because It’s There followed on the playlist, and held up better, but…huh?


It didn’t take me long to figure this out. It’s because in 2015 I’m surrounded by guitarists doing what Hedges was doing in the late 1980s. It even has a name these days: “percussive fingerstyle”. Andy McKee, Antoine Dufour, Erik Mongrain, Tommy Emmanuel – players like these come up on my Pandora feed a lot, intermixed with the jazz fusion and progressive metal.


Sometimes progress diminishes its pioneers. It can be difficult to remember how bold an artistic innovation was once we’ve become used to its consequences. Especially when the followers exceed the originator; I must concede that Andy McKee, for example, does Hedges’s thing better than Hedges himself did. It may take memories like mine, acting as a kind of time capsule, to remind us how special the moment of creation was.


(And somewhere out there, some people who made it to Jimi Hendrix concerts when they were very young are nodding at this.)


I’m here to speak up for you, Michael Hedges. Hm… I see Wikipedia doesn’t link him to percussive fingerstyle. I think I’ll fix that.


April 24, 2015

Friends of Armed & Dangerous party

It’s Penguicon 2015 at the Westin in Southfield, Michigan, and time for the 2015 Friends of Armed & Dangerous party.


9PM tonight, room 314. Nuclear ghost-pepper brownies will be featured.


April 16, 2015

A belated response to “A Generation Lost in the Bazaar”

Back in 2012, Poul-Henning Kamp wrote a disgruntled article in ACM Queue, A Generation Lost in the Bazaar.


It did not occur to me to respond in public at the time, but someone else’s comment on a G+ thread about the article revived the thread. Rereading my reaction, I think it is still worth sharing for the fundamental point about scaling and chaos.



There are quite a lot of defects in the argument of this piece. One is that Kamp (rightly) complains about autoconf, but then leaps from that to a condemnation of the bazaar model without establishing that one implies the other.


I think, also, that when Kamp elevates control by a single person as a necessary way to get quality he is fooling himself about what is even possible at the scale of operating systems like today’s *BSD or Linux, which are far larger than the successful cathedrals of programming legend.


No single person can be responsible at today’s scale; the planning problem is too hard. It isn’t even really possible to “create architecture” because the attempt would exceed human cognitive capacity; the best we can do is make sure that the components of plannable size are clean, hope we get good emergent behavior from the whole system, and try to nudge it towards good outcomes as it evolves.


What this piece speaks of to me is a kind of nostalgia, and a hankering for the control (or just the illusion of control) that we had when our software systems were orders of magnitude smaller. We don’t have the choice that Kamp wants to take anymore, and it may be we only fooled ourselves into thinking we ever had it.


Our choices are all chaos – either chaos harnessed by a transparent, self-correcting social process, or chaos hidden and denied and eating at the roots of our software.


April 11, 2015

Penguicon 2015!

I’ve been sent my panel schedule for Penguicon 2015.


Building the “Great Beast of Malvern” – Saturday 5:00 pm


One of us needed a new computer. One of us kicked off the campaign to fund it. One of us assembled the massive system. One of us installed the software. We were never all in the same place at the same time. All of us blogged about it, and had a great time with the whole folderol. Come hear how Eric “esr” Raymond got his monster machine, with ‘a little help from his friends’ scattered all over the Internet.


Dark Chocolate Around The World – Sunday 12:00 pm


What makes one chocolate different from others? It’s not just how much cocoa or sugar it contains or how it’s processed. Different varieties of cacao are grown in different parts of the world, and sometimes it’s the type of bean that makes for different flavor qualities. Join Cathy and Eric Raymond for a tasting session designed to show you how to tell West African chocolate from Ecuadorian.


Eric S. Raymond: Ask Me Anything – Sunday 3:00 pm


Ask ESR Anything. What’s he been working on? What’s he shooting? What’s he thinking about? What’s he building in there?


We do also intend to run the annual “Friends of Armed & Dangerous” party, but don’t yet know if we’re in a party-floor room.


“Geeks With Guns” is already scheduled.


April 5, 2015

shipper 1.7 is released

I’ve released shipper 1.7. The main new feature in this release is that it now knows how to play nice with repository collections managed by gitolite and browseable through gitweb, like this one.



What’s new is that shipper (described in detail here shortly before I shipped the 1.0 version) now treats a gitolite/gitweb collection as just another publishing channel. When you call shipper to announce an update on a project in the collection, it updates the ‘description’ and ‘README.html’ files in the repository from the project control file, thus ensuring that the gitweb view of the collection always displays up-to-date metadata.
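

The gitweb half of that is mechanically simple, since gitweb reads a repository’s short description from the ‘description’ file inside the bare repository. Here is the shape of the operation as a Python sketch – the field names and paths are made up, and this is not shipper’s actual code:

    # Sketch: refresh the metadata files gitweb displays for a bare repository.
    # The dict keys and the repository path are hypothetical illustrations.
    import os

    def update_gitweb_metadata(bare_repo_dir, metadata):
        """metadata is assumed to carry 'summary' and 'html_blurb' strings."""
        with open(os.path.join(bare_repo_dir, "description"), "w") as fp:
            fp.write(metadata["summary"].strip() + "\n")
        with open(os.path.join(bare_repo_dir, "README.html"), "w") as fp:
            fp.write(metadata["html_blurb"])

    # Usage (hypothetical path and fields):
    # update_gitweb_metadata("/srv/git/shipper.git",
    #                        {"summary": "Automated shipping of project releases",
    #                         "html_blurb": "<p>See the project page.</p>"})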


This is yet more fallout from the impending Gitorious shutdown. I don’t know if my refugee projects from Gitorious will be hosted on thyrsus.com indefinitely; I’m considering several alternatives. But while they’re there I might as well figure out how to make updates as easy as possible so nobody else has to solve this problem and everyone’s productivity can go up.


Actually, I’m a little surprised that I have received neither bug reports nor feature requests on shipper since issuing the beta in 2013. This hints that either the software is perfect (highly unlikely) or nobody else has the problem it solves – that is, having to ship releases of software so frequently that one must either automate the process details or go mad.


Is that really true? Am I the only hacker with this problem? Or is there something I’m missing here? An enquiring mind wants to know.

