Eric S. Raymond's Blog, page 56

October 28, 2012

The microzen: a unit of enlightenment

Earlier today one of my commenters caused me to realize that it would be entertaining to try to define a unit for the intensity of “aha!” experiences – moments of sudden insight.



In honor of said commenter (who, synchronistically enough, signs himself “Foo”) I define the “microzen” (μz) as follows: the amount of enlightenment achieved when one realizes that “spinward” and “antispinward” are useful terms on planets as well as ringworlds. Because, well, global atmospheric circulation patterns – the context was a discussion of the incidence of cyclonic storms.


(I’d have preferred “microsatori”, but μs is taken.)


Of course, there’s a scaling problem here. Even if you have a good way to estimate relative magnitudes, you need two fixpoints to define a linear scale. (You in the back there just shut up about logarithmic already, I’m having to wave my hands hard enough as it is.) I therefore arbitrarily set 100 μz as the amount of aha required for somebody to write The Cathedral and the Bazaar.


Now, I hear you out there saying “You fool! That’s entirely too ill-defined!” But here’s my clever plan: if people have broadly similar intuitions about relative degrees of aha, we can crowdsource the problem! That is, we ask a bunchaton of people to consider some specific enlightenment experience – like, say, grokking how anonymous lambdas work in a functional-programming language – and rate that relative to our 1μz and 100μz scale pegs.


There you have it. Comments are open; let the crowdsourcing begin.

Published on October 28, 2012 12:45

Storm warning

By now you’ve doubtless heard about Hurricane Sandy; the record-breaking superstorm hype has been pretty hard to miss. Well, I just got a look at the latest NOAA track projection, and it looks like the storm center is going to pass directly over my house sometime Tuesday night. The center track on that map couldn’t hit me more accurately if it had been aimed.



The good news is Sandy will have dissipated over about 60 miles of land by the time it gets here; NOAA is projecting only severe (39-73mph) winds rather than hurricane force. The bad news is…73mph winds and torrential rain aren’t anything to sneeze at. We’re on high ground and won’t be flooded out, but tree-fall damage is a distinct possibility and we’re pretty much expecting a power outage – the main question is whether it will last hours or days.


We’re battening down the hatches. Emergency food and water have been laid in, and we’ve arranged mutual retreat options with friends who live a couple of miles away on a different power subnet. We’re about as prepared as we can be short of boarding up the windows. I keep meaning to install a generator…


Wish us luck. This is probably going to be no more than inconvenient, but the potential for significant physical danger is definitely present. In particular, if the freak synergy with that cold air mass over the Appalachians pulls Sandy over land fast enough, it could still be a true hurricane-force storm when it gets here. That would seriously suck.

Published on October 28, 2012 04:22

October 25, 2012

Announcing autorevision

autorevision extracts metadata about the head version of your repository. This program is meant to be used by project build systems to extract properties that can be used in software version strings. It can create files containing variable and macro definitions suitable for C, C++, sh, Python, Perl, PHP, lua, Javascript, and header files suitable for use with preprocessing Info.plist files.


This was a sort of spinoff from irker, though I’ve decided not to use it there because I want to keep irkerd and irkerhook.py in single self-contained files.


No, I don’t know dak180’s real name. He’s a pretty good collaborator, though.

Published on October 25, 2012 09:08

October 21, 2012

I hate having to be the heavy…

I nearly issued a forking threat a few minutes ago. Only the second time I’ve felt a need to do that and the first was in 1993, so this is not something I do casually. And I drew back from the brink.


But I may have to if the maintainer I’m dealing with doesn’t clean up his act. His library is critical to one of my projects, but his behavior has been increasingly sloppy and erratic lately. He made a serious design mistake which he’s been trying to paper over with kluges; the kluges have made the code unstable and the latest shipped version is actually broken to the point of unusability without a patch.



Some standards have to be maintained, and this guy is breaching most of them. I told him by email “you have set yourself up for serious public embarrassment, which I will (reluctantly) deliver if you don’t resume behaving like a responsible maintainer.”


I hope he gets the message…because I don’t want to threaten him with a hostile fork, but he’s backing me into a position where I think it may be my duty to aim that nuke at him. His library has other users, after all; he’s not just failing me but that whole community.


I’ll do what’s necessary…but I hate having to be the heavy. *Grumble.*

Published on October 21, 2012 03:41

October 12, 2012

End-to-end arguments in software design

My title is, of course, a reference to the 1984 paper End-to-End Arguments in System Design by Saltzer, Reed, and Clark. They enunciated what has since become understood as perhaps the single most central and successful principle of the design of the Internet. If you have not read it, do chase the link; it well deserves its status as a classic.


The authors wrote mostly about the design of communications networks. But the title referred not to “network design” but to system design, and some of the early language in the paper hints that the authors thought they had discovered a design rule with implications beyond networking. I shall argue that indeed they had – there is a version of the end-to-end principle that applies productively and forcefully to the design of (non-networked) software. I shall develop that version, illustrating it with a case study from experience.



To apply the end-to-end principle to software design, we need a way to state it that is general enough to be lifted out of the specific context of network design. The network-design version is this: intelligence belongs at the network endpoints, not in the pipes. Trying to make the pipes “smart” duplicates functions like error correction and receipt acknowledgment that the endpoints are going to have to do anyway to ensure end-to-end integrity. Trying to make the pipes smart also introduces tricky failure modes when the smarts inevitably go wrong.


Now I’m going to tell a story about a software failure. As I write, I have spent most of the last two weeks spinning up a replacement for a widely-used social-computing service that irreparably crashed on us. The replacement involves a service daemon I named “irkerd”, which is expected to relay notification requests submitted as simple JSON objects on a listening socket to Internet Relay Chat channels.
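For concreteness, here is roughly what building such a request could look like. This is a minimal sketch, not irkerd's documented wire format: the key names "to" and "privmsg", the example channel URL, and the helper name make_request are all assumptions made for illustration.

```python
import json


def make_request(channel_url, message):
    """Serialize one notification request as a single line of JSON.

    The keys "to" and "privmsg" are assumed names, chosen for illustration.
    """
    request = {"to": channel_url, "privmsg": message}
    # One JSON object per line keeps message framing on a stream socket simple.
    return json.dumps(request) + "\n"


# Hypothetical channel URL and commit summary.
line = make_request("irc://irc.example.net/#myproject", "r42: fix decoding bug")
```

A client would write that line to the daemon's listening socket; the daemon never needs to know where the text came from.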


To do this, irkerd (which is written in Python) relies on a Python IRC library designed to speak the client end of the message protocol defined by RFC2812. The library is pretty good; without it, the irkerd implementation would have taken much, much longer. In turn, irkerd has stressed the library in ways it hadn’t been stressed before. I’ve been contributing fixes and patches back to the library, and the maintainer has shipped a couple of point releases as a result.


Yesterday, just as the irkerd codebase was stabilizing after some early problems with thread safety, some of my test users on the irker chat channel began reporting a new fatal bug – a Unicode decoding error being thrown from deep inside the IRC library. Investigation and an email query to the maintainer revealed that this was the result of a recent design decision and a consequent change to the library internals in the previous day’s point release.


Previously, the IRC library had made no assumptions about the character encoding of the chat data it received from IRC servers. It simply passed those strings as uninterpreted payloads of events made visible by the library to the calling application (in this case irkerd). Because irkerd is a sending relay rather than an interactive client, it just threw that received chat data away. The only received traffic it cares about is the IRC server’s responses to login and message-transmission commands, which are plain ASCII.


In this point release, the maintainer changed the library to perform UTF-8 decoding early in the processing of the chat strings, so the event payloads would be guaranteed Unicode. Which was fine and dandy until a server shipped irkerd a chat line with a bad continuation byte. At that point the early UTF-8 decode threw an exception from deep inside the library, crashed one of irkerd’s main threads, and hung the daemon.


The library maintainer thought he’d be doing calling applications a favor by performing UTF-8 decoding so they don’t have to. What he did instead was introduce a new fatal failure mode on bad chat data – a particularly annoying one for irkerd, which only sees that data because the protocol requires it to, and really wants to just ignore the chat lines.


Assumptions about character encoding properly belong not in the IRC library but in the calling application. There are several reasons for this, but they all come down to two points: (1) only decoding exceptions raised locally can be handled locally, and (2) how to handle them is a policy decision that the application must make – so the library should not try to pre-empt it.
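The two points above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IRC library's real API: library_deliver, relay_handler, and make_client_handler are hypothetical names. The interior component hands over raw bytes, and each application decides for itself whether and how to decode.

```python
def library_deliver(raw_line, handler):
    """Stand-in for the library: pass the payload along as uninterpreted bytes."""
    handler(raw_line)


def relay_handler(raw_line):
    """A sending relay: chat payloads are ignored, so nothing is ever decoded."""
    return None


def make_client_handler(sink):
    """An interactive client: decoding, and what to do on failure, is its policy."""
    def handler(raw_line):
        try:
            text = raw_line.decode("utf-8")
        except UnicodeDecodeError:
            # Policy choice made locally: substitute U+FFFD for the bad bytes.
            text = raw_line.decode("utf-8", errors="replace")
        sink.append(text)
    return handler


# 0xC3 opens a two-byte sequence but is followed by a space: a bad continuation byte.
bad = b"caf\xc3 au lait"

library_deliver(bad, relay_handler)       # the relay never notices the bad byte
seen = []
library_deliver(bad, make_client_handler(seen))
```

The same bad line that would crash an early decode inside the library is harmless to the relay and recoverable for the client, because the exception can only arise where somebody is prepared to handle it.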


The library maintainer violated a software version of the end-to-end principle. It reads like this: in any software data path, whether networked or not, interior components that can be indifferent to the nature of the data they are handling should remain indifferent. Every assumption introduced while handling data implies new exceptions and failure cases; to minimize your failures, minimize your assumptions.


I will finish by noting that I wrote this essay because web searches on “end-to-end arguments” and “end-to-end principle” suggested that the above software-generalized version of it may not have been written down before – all the references I could find were about network design. I believe that this is one of those folk theorems that every sufficiently experienced software system designer eventually learns without necessarily becoming conscious about it. I hope that by writing it down, so it can be learned more rapidly and explicitly, I will have helped improve the practice of software design.

Published on October 12, 2012 05:13

October 7, 2012

Adventures in kuntao

My regulars will be aware that, since the Mixed Martial Arts program we were in folded up, my wife Cathy and I have been having an interesting learning adventure checking out various schools in our area as possibilities for our next style. We’ve had some more adventures since.



We did indeed visit the Northern Shaolin school I alluded to in my last installment. But we were not impressed. The forms we saw were pretty, but the movements seemed less practical for combat than what they were doing at the first Shaolin studio we visited, in Berwyn. And I saw no evidence of contact sparring there, either.


We’ve reluctantly given up on the Systema guy. We’d love to train with him – we liked both his technique and his teaching style – but he only teaches one night a week and that night would conflict with Cathy’s Borough Council meetings half the time.


However, one of the people at the Northern Shaolin school mentioned the existence of a school of Philippine martial arts in Phoenixville, which is just within reasonable driving distance of us. This caught our interest, because (a) we’ve done a little training in Philippine stick-fighting and enjoyed it, and (b) the Philippine arts have a well-earned reputation for brutal practicality. The Philippines was and still is an extremely violent place, between criminals and pirates and several simmering insurgencies.


What we found, in a drab concrete building in Phoenixville, was most interesting. It’s a style called kuntao (Hokkien Chinese for “way of the fist”) that blends Southern Chinese kung fu with native Filipino blade and stick techniques. Developed by emigre Chinese in the Indonesian and Philippine archipelagos, it’s a rare art in the U.S. – I’d never heard of it before this – with only a handful of schools here.


Five minutes into the weapons drills I whispered to Cathy “These people are serious!” and she nodded definite agreement. Most (though not all) of the students actually moved like fighters, with real intention behind their strikes. I noticed one in particular because the quality of his movement was both forceful and amazingly fluid, almost dancelike in a way I’ve seen before from really advanced Filipino players. Analyzing his movement, I had a sudden realization.


I’ve read a fair amount of theory about Philippine arts, and one of the core concepts in most of them is what’s called “live hand”. The “live hand” is the one without the weapon, and the idea is that it’s actually supposed to be the more dangerous – trapping, blocking, and setting up kills for the weapon hand.


My realization was “This is what ‘live hand’ looks like!” Regardless of which hand he was striking with, both sides of this student’s body were fully involved in every move. The live hand was constantly searching for openings, presenting a threat, or at least moving in opposition to put more power into the “dead” hand’s strikes. When I quietly pointed this out to Cathy, she grinned and informed me that she believed I was looking at the principal instructor’s son. So indeed it proved.


The inventory of techniques we saw wasn’t too surprising given their blend of influences. The empty-hand moves are mainly from wing chun, which we’re somewhat familiar with from previous study. We saw, as expected, kali with both single and double sticks. We also saw quite a bit of knife work. Weapons handy but not lifted on this particular evening included six-foot staff and machete.


I’m by no means a bad hand with a knife, but the kuntao technique I saw made me feel like mine is crude – not ineffective, necessarily, but certainly not at their level of precision and artistry either. Perhaps not surprising since what I was trained in was based on what the military can teach in a 12-week training cycle at U.S. Marine boot camp. It would be good to learn what the kuntao people know if only so we can take it back to our sword school.


The quality of teaching looked high; I saw a lot of initiative and mutual help among the students. My only reservation was about doing stretches on that cold concrete floor…Cathy and I walked out of there with an excellent impression of the place. We’ve been invited to actually do a sample class next week, and we will.


It’s down to either Mr. Stuart’s for Israeli military kickass or this for exotic Oriental deadliness straight out of a Sax Rohmer novel. We’re leaning towards kuntao, if only because we both think stick-fighting is really cool. The final decision will probably be next week.

Published on October 07, 2012 18:50

October 6, 2012

irker is feature-complete

I’ve just shipped irker 1.8, and I think this brings the wild ride I’ve been on for the last eleven days approximately to a close. I consider this release feature-complete; it achieves all the goals I had in mind when the CIA service died and I decided it was up to me to rescue the situation. I expect the development pace to slow down a lot from the almost daily release I’ve been doing.


The last really major feature was irkerhook support for Mercurial repositories. I’d be mildly interested in a bzr extractor class if anyone wanted to contribute one, and that probably wouldn’t be hard – the git and hg extractors are about 70 lines each. But with git, hg, and Subversion covered it’s good enough.


Uptake of irker continues at a pleasingly rapid pace. There’s now a second symbiote application, a poller daemon that watches the log of a specified Subversion repository and uses irkerd to ship notifications from it. This can be useful if you don’t have write access to the repo hooks and thus cannot install irkerhook.


Time for a pause and some reflection on lessons to be learned.



I think the most interesting aspect of the project, after I got the basic irkerd design in working shape, was hardening the implementation against denial-of-service attacks. I got some high-powered volunteer help with this, including from A&D regulars Daniel Franke and Peter Scott. Reasoning out the attack paths and the simplest possible countermeasures was just plain fun, possibly more fun because I’ve never had to do this kind of analysis before.


The willingness of hackers to step up and contribute to a project like this continues to be a wonderful thing about which I hope never to become jaded. Because this project spun up so fast and still doesn’t have a mailing list, I know many of my contributors mainly by IRC nicks. Thank you, AI0867, birkenfeld, KingPin, laurentb, dak180, nenolod, and everybody else who showed up on #irker to help and contribute and critique and report bugs and ask questions. It has been a pleasure working with all of you and watching a healthy micro-community form around this project within hours of my first release.


irker does have some competition for its goal of replacing the CIA notification service. Some Debian people dusted off a project called “KGB”; I gave its principal designer a bit of a hard time when he showed up in A&D’s comments because KGB is heavier and more elaborate than it needs to be, but in truth it’s not actually horrible. CIA’s spaghetti architecture was horrible; KGB is merely somewhat overweight.


KGB is also, judging by the traffic on freenode #commits, losing the adoption race. They got to the party late with software that’s more difficult to grok and get running than irker is. Though, to be fair, irker’s minimalistic design might not have gained an advantage without also having better documentation. I never consider high-quality documentation a mere optional extra on my projects. This is certainly a lesson more developers could stand to learn…


I will admit, however, that one KGB feature I initially scoffed at turned out to be sufficiently useful and lightweight that I added it – that is, support for color-highlighting notification lines. The key to that decision was that I found a way to implement it in a handful of lines of code in irkerhook.py; the feature doesn’t touch irkerd at all.


This is an example of a larger theme in irker’s design – policy/mechanism separation. The irker daemon is pure mechanism; it has no options or control knobs of any kind (other than one to enable debugging messages and another to dump the version and exit). It’s just a message bus. All the policy stuff (choices about what to put in a notification) lives in irkerhook.py. And you damn betcha that I plan to keep it that way!


Because all irkerhook.py does is gather information that it ships as JSON, the addition of one simple option – to dump the JSON to stdout rather than trying to ship to the configured irker instance – makes changes in the policy stuff very easy to test. Which is as unlike the huge, nigh-untestable and thus extremely failure-prone hairball that was CIA as it is possible to be and do an even remotely similar job.


Generally, policy needs to change more often than mechanism. So, when you partition your system into a mechanism part (irkerd) and a policy part (irkerhook.py), the policy part is the part where there is greater need for testability. Or, to put it differently: when you partition your code and you find that the least stable part is most amenable to testing, you’re doing it right.
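To make the shape of that split concrete, here is a minimal sketch of a policy-side hook with a dump-to-output switch. It is not irkerhook.py's actual code: build_notification, emit, the dry_run flag, the JSON key names, and the message layout are all assumptions made for illustration.

```python
import io
import json
import sys


def build_notification(project, author, summary, channel):
    """Policy: decide what a notification line says (hypothetical layout)."""
    text = "%s: %s * %s" % (project, author, summary)
    return {"to": channel, "privmsg": text}


def emit(notification, dry_run, send=None, out=sys.stdout):
    """Ship the JSON via send(), or with dry_run just write it out for inspection."""
    payload = json.dumps(notification)
    if dry_run:
        out.write(payload + "\n")
    else:
        send(payload)


# Exercising the policy code without any daemon or network in the loop.
buf = io.StringIO()
note = build_notification("myproject", "alice", "Fix typo",
                          "irc://irc.example.net/#myproject")
emit(note, dry_run=True, out=buf)
```

Because the policy side only ever produces a JSON string, a test can capture that string and compare it with what was expected; the mechanism side never has to be running.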


Yes, this is the old-time Unix gospel of minimalism and design for testability I’m preaching here, brothers and sisters, and I’m doin’ it for a reason. The total line count of irkerd and irkerhook.py code is 689 LOC (measured by sloccount), compared to 1957 LOC for KGB and probably tens of thousands of lines for CIA (I haven’t measured that). When you pay proper attention to separation of function and separation of policy from mechanism, your code gets not larger but smaller. With virtuous consequences, the most important of which is fewer failure modes.


Comprehensibility helps deployment, too. It’s not that I think every project administrator who has adopted irker has actually read the daemon code, but I’m certain its smallness and lightness – and the fact that you can read through irkerhook.py and grok what it’s doing in just a couple of minutes – has made it an easier sale to people who are chronically short of time and attention to get all their tasks done.

Published on October 06, 2012 19:33

October 3, 2012

How not to engage me

Considering the extent to which I’m still a public figure, it is perhaps surprising how seldom I get email that deserves a thorough, up-one-side-and-down-the-other flaming. I got one today which I shall reproduce here as a perfect example of how not to engage me.




I just finished skimming through your guide: “How To Ask Questions The Smart Way”.


I am sorry to say I am alarmed by the kind of immature mentality that runs through it.


I will not begin to analyze how many of the assertions you make are simply illogical and unjust, because it upsets me to think about them.


I would like to ask you to consider the reasons why you are taking this stance. If it’s out of sympathy for the superior caste of ‘hackers’, an idiotic and misapplied term if ever there was one, do you realize that you are describing real people with serious psychological problems, and that you are feeding their pain by your writings?


I hope you can balance the effort you have put behind this document with some social responsibility,


I swear to all of you I did not invent or modify even a word of this. I wouldn’t have been capable; I can’t simulate galloping stupidity that well. Here was my response:



Thank you, this email is easily the most unintentionally risible thing I’ve read in the last week. I could say I’m “alarmed” by your appalling ignorance and more-concerned-than-thou condescension, but I’m laughing too hard to be alarmed.


“Upsets me to think about them”, eh? Poor, poor, fluffy, *precious* you. I’m positively vibrating with sympathy. Not. And, oh look, you used the magic cant phrase “social responsibility” – a sign infallible that the speaker is either a tender-minded idiot or a manipulative thug. In your case my money is definitely on tender-minded idiot.


Actually, if you had labored for weeks with the conscious intention of writing something that would earn my derision and contempt, you could hardly have done better than this.


I might have to post this outpouring of yours on my blog as a perfect example of Not Having A Clue. But I won’t attach your name to it; I’m not cruel enough to expose you to the public mockery that would ensue.


In case the lesson isn’t clear, my automatic response to attempts at moral bullying is “Fuck you and the pretensions you rode in on.” If you want to get my attention in any but the most negative way, don’t even try it.

Published on October 03, 2012 20:08

September 29, 2012

irker takes off like a rocket

It was just three days ago that I shipped irker 1.0, but the project is already a huge hit out there in hackerland. It’s clear from traffic on the freenode #commits channel that irker installations are springing up everywhere. There’s already one symbiote, a proxy that takes XML-RPC requests in the CIA format and passes them to an irker instance (you have to supply your own mapping of projects to IRC channels for it to use). And at least one custom hook already written and in production – by the Python development list, as it happens.


I’m a bit boggled, actually. I don’t think I’ve ever had a project go from launch to all over the freakin’ landscape this fast before. Guess that’ll happen when you step up with a clean replacement for a service that lots of people were habituated to and have suddenly lost.


There’s more work to be done, of course. (There’s a public repository, and an #irker IRC channel, for people interested in following development.)



A&D regular Daniel Franke did a really good security-vulnerability analysis, which I’ve expanded on and which is now in the repository. Launching from that I just added some DoS prevention to the code.


The repo hook component needs work as well (this is the Python script you make your repository’s post-commit hook call in order to generate a notification). At present it has support for git and Subversion; it could support Mercurial, CVS, and other version-control systems as well. An important feature of the code is that the VCS-dependent stuff lives in extractor classes well separated from the generic stuff; thus, adding support for more VCSes will be easy when someone steps up to do it.
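The extractor-class arrangement described above might look something like the following. This is a sketch under stated assumptions rather than the hook's real code: the class names, the commit_fields method, the JSON keys, and the message layout are all hypothetical. Only the git invocation uses real git format placeholders (%an for author name, %n for newline, %s for subject).

```python
import subprocess


class Extractor:
    """Generic part: turns extracted commit fields into a notification dict."""

    def __init__(self, project, channel):
        self.project = project
        self.channel = channel

    def commit_fields(self, commit_id):
        """VCS-specific subclasses return an (author, summary) pair."""
        raise NotImplementedError

    def notification(self, commit_id):
        author, summary = self.commit_fields(commit_id)
        text = "%s: %s %s * %s" % (self.project, author, commit_id, summary)
        return {"to": self.channel, "privmsg": text}


class GitExtractor(Extractor):
    """All git-specific knowledge is confined to this one method."""

    def commit_fields(self, commit_id):
        out = subprocess.check_output(
            ["git", "log", "-1", "--format=%an%n%s", commit_id], text=True)
        author, summary = out.split("\n", 1)
        return author, summary.strip()


class StaticExtractor(Extractor):
    """Test double: serves canned commits, showing how a new VCS would plug in."""

    def __init__(self, project, channel, commits):
        super().__init__(project, channel)
        self.commits = commits

    def commit_fields(self, commit_id):
        return self.commits[commit_id]


demo = StaticExtractor("myproject", "irc://irc.example.net/#myproject",
                       {"abc123": ("alice", "Fix typo")})
note = demo.notification("abc123")
```

Supporting another version-control system then means writing one more small subclass; the generic notification code is untouched.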


I’m also a little worried about the multithreading – a technique I normally fight shy of because it’s so prone to subtle race conditions, but it had to be done here. It seems to work, but…if any of you reading have experience at reviewing this kind of design, please critique mercilessly.


We’ve had one annoying deployment issue. Some people have been reporting irker crashes on session disconnect due to a bug in the stale version of irclib up on SourceForge. The actual project home has moved to PyPI, the Python Package Index, but the maintainer hasn’t gotten around to updating his documentation and web pages yet. If you want to run this code, get the PyPI version of the IRC library to do it with.


But, these relatively minor issues aside…three cheers for classic Unix minimalism! Total LOC of the codebase is just 503 lines exclusive of comments. I chased St. Exupéry’s definition of perfection (“…when there is nothing left to take away.”) pretty hard this time, and it seems to have worked out well.


UPDATE: I just shipped 1.2.

Published on September 29, 2012 23:45

September 27, 2012

CIA and the perils of overengineering

The CIA commit-notification service abruptly died two days ago, a development that surprised nobody who has been paying attention to the recent history of the codebase and its one public server site. A screwup at the cloud service hosting the CIA virtual machine irretrievably destroyed the instance data; please don’t ask me for details, I don’t know how it happened and don’t care. The CIA codebase is so screwed up that even reconstituting a virgin instance would be way too much work – and that I will talk about a bit later in this post.


Fortunately, I saw this coming and had started work on a CIA replacement in late August. I had been holding off releasing it because there was some effort going on to salvage the code, but that possibility effectively vanished when the only instance was erased. I shipped my replacement just a few minutes ago, and expect to spend much of the next week helping forge-site operators install it so we can have our notification service back.


The remainder of this post is a finished version of a design analysis of CIA I started a couple of weeks ago when the death of the service was still only a theoretical possibility. Since that theory has become actuality, the message should be heard loudly and clearly: this was a truly classic case of over-engineering, code bloat, excessive centralization, and bad practice. Read on for the cautionary tale.



I always liked the idea of CIA, the “version control informant” that relays commit notifications to IRC channels. I’ve maintained the (now obsolete) git hook scripts that talk to CIA for several years now. But recently I have been looking more closely at the design of CIA and how it’s implemented, and have concluded that it was a pretty horrible example of how not to do things.


First, a review of what CIA did and how it did it. What you saw from the outside when a CIA setup was working for a project was simple: whenever a developer committed code to the project’s public repository, the commit summary was shipped to an IRC channel associated with the project. It became part of that conversational stream, and was also echoed to a special channel (#commits on freenode) where you could watch all the open-source world’s commits flow by like a river.


A notification service like this is a very useful aid to collaboration. It makes IRC conversations among a development group more productive. It also does something unquantifiable but good to the coherence of the development groups that use it, and the coherence of the open-source community as a whole – when the service was live it was hard to watch #commits for any length of time without being impressed and encouraged.


Looking a little deeper, here’s what happened when a commit was made. The repository’s checkin procedure fires a “commit hook” – a small program, usually written in shell or Perl or Python – that is passed various metadata such as the commit’s ID, its list of files modified, and its change comment. The hook assembled an XML message in a particular format containing this information. It then used XML-RPC to call a central CIA server at cia.vc and ship it the notification.


The CIA server was then responsible for turning the XML notification into a text line that got shipped to the project channel and to #commits. It also updated a bunch of statistical summaries that could be browsed at the CIA website (now defunct).


Unfortunately, as in the old proverb about law and sausage, those who loved CIA notifications were best advised not to look too much more closely than this at how they were made. The service was notoriously subject to random outages and stalls; but that, bad as it is, is only symptomatic. Underlying this were several layers of unfortunate history, poor design decisions and shoddy implementation.


CIA hadn’t been actively maintained in several years before its collapse – the originator, one Micah Dowty, disappeared around 2007. One Karsten Behrmann, aka “BearPerson”, stepped in around 2008 but was unable to solve the problems with the software. The sole running public instance was hosted by a third party prone to loudly complaining on the #cia channel that the host box was an insecure hairball full of flaky and obsolete software that he couldn’t fix because the CIA code had dependencies on now-obsolete software versions.


That running instance is what’s now vanished. If you examine the repo of the CIA software, you’ll discover that it’s a mixture of parts in mostly Python but some Erlang, using (a) a custom web framework, (b) some Twisted, and (c) some Django. I’m told by people who have examined all this more closely than me that the individual subsystems (such as the Django code that generates most of the visible web pages in the site) aren’t too bad, but the interactions among them are messy and leaky.


The more experienced software engineers in my audience will already be getting a clue to what went wrong here, if not yet quite why. This is what software that has undergone a collapse into rubble under the weight of its own complexity looks like, complete with maintainers who have run away from their own inability to manage the resulting mess.


But the indictment wouldn’t be complete without noticing that their development practices sucked, too. My hair stood on end when BearPerson let drop on the #cia channel that the code in the running instance didn’t match the head state of the project’s CIA repository on GoogleCode – he admitted he’d “been lazy” and patched things on the site without propagating the changes back to the repo. I nudged him into fixing this. Or, at least, claiming to have fixed it, but the historical record didn’t do a lot to reassure me on that score. And it’s why those close to the problem have given up on attempting to resuscitate CIA without a running instance to look at.


Yes, before the VM was wiped there was a crew on the #cia channel trying to salvage the codebase. I helped a bit on this, but my estimate of their odds was never very optimistic. It is notoriously difficult to un-collapse a rubble pile, especially when the author and one previous rescue attempt (BearPerson’s) have already manifestly failed. Thus, I directed most of the limited energy I could spend on this problem into a different strategy.


That strategy began with asking why CIA suffered a complexity collapse, and whether much simpler code could do the job its users expect of it.


There are several aspects of the design that seemed rather iffy. Why one centralized server? Why the elaboration of XML-RPC? Why do you have to register your project on cia.vc to use the notification, with the mapping from your project to IRC channels lurking in an opaque database on a distant server, rather than being simply declared in the (arguments to your) repository hook?


The answer seems to be that the original designer fell in love with the idea of data-mining and filtering the notification stream. It is quite visible on the CIA site how much of the code is concerned with automatically massaging the commit stream into pretty reports. I’m told there is a complicated and clever feature involving XML rewrite rules that allows one to filter commit reports from any number of projects by the file subtrees they touch, then aggregate the result into a synthetic notification channel distinct from any of the ones those projects declared themselves.


Bletch! Bloat, feature creep, and overkill! With chrome like this piled on top of the original simple concept of a notification relay, the resulting complexity collapse should no longer be any surprise. Additionally, this is a near-perfect case study in how to make your service scale up poorly and be maximally vulnerable to single-point failures – if that one database gets lost or corrupted, everybody’s notifications will go haywire. The design would have been over-centralized even if the implementation weren’t broken.


Of course the way to prove this kind of indictment is to do better. But once I got this far in my thinking, I realized that wouldn’t be difficult. And started to write code. The result is irkerd, a simple service daemon. One end of it listens on a socket for JSON requests that specify a server/channel pair and a message string. The other end behaves like a specialized IRC client that maintains concurrent session state for any number of IRC-server instances. All irkerd is, really, is a message bus that routes notification requests to the right servers. (And is multithreaded so it won’t block on a server stall, and times out inactive sessions.)
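To make the message-bus idea concrete, here is a minimal sketch of the routing step only. It is not irkerd's code: the JSON field names ("server", "channel", "message") and the MessageBus class are assumptions made up for illustration, and the real daemon's threading, session timeouts, and IRC protocol handling are left out entirely.

```python
import json
from collections import defaultdict


class MessageBus:
    """Toy routing core: keeps one outbound queue per IRC server."""

    def __init__(self):
        # server hostname -> list of (channel, message) awaiting delivery
        self.queues = defaultdict(list)

    def handle(self, line):
        """Decode one JSON request and queue it for the right server."""
        request = json.loads(line)
        # Field names are hypothetical, not irkerd's actual wire format.
        server = request["server"]
        self.queues[server].append((request["channel"], request["message"]))
        return server


bus = MessageBus()
bus.handle('{"server": "irc.example.net", "channel": "#commits", '
           '"message": "esr: fixed timeout"}')
bus.handle('{"server": "irc.example.org", "channel": "#dev", '
           '"message": "build green"}')
print(sorted(bus.queues))
```

In a real daemon each queue would presumably be drained by its own thread holding the session to that server, which is what keeps one stalled server from blocking the rest.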


That’s it. Less than 400 lines of Python replaces CIA’s core notification service. The code for a repo hook to talk to it is simpler than any existing CIA hook. And it doesn’t require a centralized server. The right way to deploy this thing will be to host multiple instances of irker on repository sites, not publicly visible (because otherwise they could too easily be used to spam IRC channels) but available to the repository’s hooks running inside the site firewall.
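For a sense of how little a repo hook has to do, here is a hedged sketch of the sending side. Everything specific in it is an assumption for illustration: the JSON field names, the choice of a UDP datagram as transport, and the host and port arguments are hypothetical, not taken from the actual hook shipped with irker.

```python
import json
import socket


def build_request(server, channel, message):
    """Encode one notification as JSON bytes (hypothetical field names)."""
    request = {"server": server, "channel": channel, "message": message}
    return json.dumps(request).encode("utf-8")


def notify(host, port, server, channel, message):
    """Send one request to a relay daemon inside the site firewall.

    Assumes the daemon accepts JSON datagrams over UDP; the real
    transport and port may differ.
    """
    data = build_request(server, channel, message)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (host, port))
    return data


payload = build_request("irc.example.net", "#commits", "esr: fixed timeout")
```

Note that the mapping from project to channel lives in the arguments the hook passes, not in a registration database on somebody else's server.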


Filtering? Aggregation? As previously noted, they don’t need to be in the transmission path. One or more IRC bots could be watching #commits, generating reports visible on the web, and aggregating synthetic feeds. The only agreement needed to make this happen is minimal regularity in the commit message formats that the hooks ship to IRC, which is really no more onerous than the current requirement to gin up an XML-RPC blob in a documented format.


I must note one drawback to this way of partitioning things. Because IRC has a message length limit, naively shipping commits with very long metadata (due to, for example, large lists of modified files) would make only a truncated version available on IRC (and thus, to an IRC watcher bot gathering statistics).


It might be that this was the original motivation for using an XML-RPC transport on CIA’s input end. Indeed, when I first recognized the problem I started sketching a design for an auxiliary daemon that would do nothing but accept XML-RPC requests in something very close to CIA’s preferred format, then forward short digests of them to an irker instance for shipping to IRC. This auxiliary could collect statistics based on the un-truncated metadata…


Fortunately, I experienced a rush of good sense before I actually started coding this thing. It would have hugely complicated deployment and testing to handle an unusual case – observably from #commits, most commit messages are short and touch few files. We get a much simpler system if we accept two reduction rules:


1. If a commit notification would be longer than 510 bytes, we omit the filenames list. An empty filenames list is to be interpreted by filtering software as “may touch any file in the project”.


2. Then…we just ship it. If the IRC server truncates it at 510 bytes, so be it. Humans watching the commit stream won’t need more than that to put the commit in context (especially not for projects which use git’s first-line-is-a-summary convention) and the hypothetical statistics-gathering bots won’t understand natural language well enough to care that it’s truncated.
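The two reduction rules are small enough to sketch in a few lines. This is an illustration under stated assumptions, not the hook's actual code: the one-line layout "project: author [files] summary" and the function names are made up here, and only the 510-byte threshold and the "empty list means any file" convention come from the rules themselves.

```python
IRC_LIMIT = 510  # byte threshold named in rule 1


def format_notification(project, author, files, summary):
    """Render a one-line commit notice, dropping the file list if too long."""
    def render(file_list):
        # Hypothetical layout; any regular format would serve.
        return "%s: %s [%s] %s" % (project, author, " ".join(file_list), summary)

    text = render(files)
    if len(text.encode("utf-8")) > IRC_LIMIT:
        text = render([])  # rule 1: omit the filenames list
    return text            # rule 2: ship it, even if the server truncates


def may_touch(files, subtree):
    """Filter-side reading: an empty list may touch any file in the project."""
    if not files:
        return True
    return any(name.startswith(subtree) for name in files)


short = format_notification("irker", "esr", ["irkerd"], "Fix timeout")
many = ["src/file%d.c" % i for i in range(200)]
trimmed = format_notification("irker", "esr", many, "Big change")
```

The length check counts encoded bytes rather than characters, since the limit applies to what goes over the wire.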


This is how you keep things simple. And that is how you prevent your projects from collapsing under complexity.


I wrote irkerd to accomplish two things: (1) light a fire under the CIA salvage crew, attempting to speed up their success, and (2) provide a viable alternative in case they didn’t succeed. To this I now add (3) illustrate what healthy minimalism in software design looks like. Antoine de Saint-Exupéry said it best: Perfection (in the design of software, as well as his airplanes) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.


Accordingly, note the nonexistence of irkerd configuration options and the complete absence of anything resembling a control dotfile. I even, quite deliberately, omitted the usual option to change the port that irker listens on. Because if you think you need an option like that, you actually have a problem you need to solve at your firewall.


But releasing irkerd, of course, is not the end of the story. For it to do any good, instances of the daemon and its repo hook will need to be running and documented at sites like SourceForge, GitHub, Gitorious, Gna, and Savannah. As I noted at the beginning of this essay, I expect pushing the deployment along will eat up a lot of my time in the near future – probably more time than it took to write and test the code. These forge sites are all chronically understaffed and have long issue backlogs.


Still, at least we now have a simple and robust design, and working code. And – this can’t be emphasized enough – single-site outages will no longer be fatal. If there’s one thing the history of the Internet should have taught us, it’s that you get robust and scalable services not by centralizing but by distributing them. It’s too bad the designer of CIA never internalized that lesson, and there can be no better finish to this tale of failure than reinforcing it.

Published on September 27, 2012 15:37

Eric S. Raymond's Blog

Eric S. Raymond