Eric S. Raymond's Blog, page 10
June 25, 2018
How to make The Breakfast
I wouldn’t have posted this if the comment thread on “The sad truth about toasters” hadn’t extended to an almost ridiculous length, but…
I dearly love classic American breakfast food. I delight in the kind of cheap hot breakfast you get at humble roadside diners. I think it’s one of the glories of our folk cuisine and will cheerfully eat it any time of the day or night.
I posted a fancy breakfast-for-two recipe a while back (Eggs a la ESR). What follows is the slightly plainer breakfast I make for myself almost every morning. It’s the stable result of a decades-long optimization process – I haven’t found a way to improve it in years.
Ingredients:
* Two large eggs. Ideally from chickens fed with bugs, not grain; makes a difference.
* Four strips of lean, thick-cut bacon.
* A slice or three of onion (variation in amount doesn’t matter much).
* Half of a small lime.
* Tuscan garlic boule. Get it at your local Wegman’s; it’s a respectable supermarket imitation of Italian artisan sourdough bread. The garlic adds savor. If you can’t get this exact bread, make it some sort of sourdough.
* About an ounce of dark chocolate.
* Butter/olive-oil spread blend. Land-O-Lakes is a good one; there are others.
Tools:
* One heavy 12.5-inch diameter non-stick skillet. (I’d use a real iron skillet, but cleaning and maintenance on those is a PITA so I reserve that for special occasions.)
* One non-crappy toaster set to golden-brown level.
* One 22-ounce pub mug.
* Generic cutlery (bread knife, chef’s knife).
* Wooden cutting board.
Procedure:
The reason I specify the step order I do is that this optimizes prep time. The whole prep and cleanup takes less than 6 minutes. But linger over the breakfast itself, it’s worth it.
Pull the buttery spread out of the fridge so it’ll have time to soften while you cook.
Lay the bacon in the skillet; start cooking it on low heat. I use one of those mesh grease-shield things over the skillet to contain spatter. It may also reflect enough heat to improve the bacon slightly.
Cut a generous slice of boule, cut it in half, and drop the half-slices in the toaster. Don’t start the toaster yet.
Haggle-cut the onions. Do this a little roughly so you get variation in texture; this gives a more interesting result than mathematically-precise dicing. Do this after cutting the bread, not before, as you don’t want moisture from the onions on the uncut part of the boule you’re going to put away.
While the bacon is cooking, fill the pub mug with cold filtered water and squeeze the half-lime into it. Don’t drop the lime in, otherwise you’ll get bitter oils from the lime in your water. A bit of this would be OK, but if you let the lime sit in the water too long the effect can be quite unpleasant.
Allow the bacon to cook until chewy and slightly crisp around the edges. Set it on a paper-towel-covered plate to drain. Either roll up the paper towel with the bacon inside it or put an inverted plate over it to slow down heat loss.
Put the onions in the skillet. Stir them around a bit so they pick up some bacon grease. Turn up the heat slightly. Push the onions aside to make room for the eggs.
Crack the eggs into the skillet and break the yolks. I like to keep them from running together to make later flipping easier.
Note that a bit of delay between adding onions and eggs is no bad thing when you’re not in a hurry. Giving the onions another 10 seconds on their own increases the odds that you’ll get some caramelization going.
Allow the eggs to cook until their top sides start to bubble just a bit. Flip them over; turn off the heat, the rest of the cooking is courtesy of the pan’s thermal inertia.
Now start the toast. Your objective is for it to pop just after you get the rest of the food on the table.
When the eggs look cooked, plate them and the onions and the bacon. If you timed things right, the onions have a bit of browning/caramelization around the edges.
Lightly salt the onions. Add hot-sauce of the day to the eggs. Most days this is plain old Tabasco, but I do vary it.
Your toast should pop about now. Butter and lightly salt it.
Bon appetit! Finish with the chocolate and the last of the limewater.
Notes:
By the time you’re done eating the skillet will have cooled enough to make it easy to rinse out. I’m not fanatical about fully washing it; if the bacon-grease buildup is light and there isn’t obvious carbon, that’s just flavor for the next day.
I used to do fresh mushrooms rather than onions and would still prefer that, but gave it up because they left heavy carbon deposits even on a non-stick pan.
The chocolate isn’t just there because I like it; it’s a neuroprotective, a pleasant way to decrease my stroke risk. I think this more than offsets the risk from the sugar. A good winter alternative is to have a cup of hot chocolate with The Breakfast – I favor Godiva Dark.
Cheaping out on the bacon probably hurts this composition more than anything else you could do to it while leaving it recognizable. Don’t do that. Good meaty bacon anchors the whole thing.
On the other hand, using bottled lime juice is not a sin against this recipe. Fresh-squeezed is better but not the really dramatic improvement it would be with respect to orange or grapefruit juice; it might be that most people wouldn’t taste a difference.
The Breakfast is a sort of anchor ritual for my day. I frequently deal with a lot of challenge and novelty, by choice. It’s good to have a fixed point in my routine, and this is it.
If you are very lucky, consumption of The Breakfast will be regularly accompanied by leg stropping and purrs from a friendly orange Coon-cat who considers this an important part of his morning ritual.
June 24, 2018
The sad truth about toasters
I bought a toaster today.
I didn’t want to buy a toaster today. About ten years ago I paid $60 for what appeared to be a rather high-end Krups model in accordance with my normal strategy of “pay for quality so you won’t have to replace for a good long time”, an upper-middle-class heuristic that I learned at my mother’s knee to apply to goods even as mundane as light kitchen appliances.
I had reason for hope that I would get a well-extended service life for my money. I recalled the toasters of my childhood, chrome and Bakelite battleships one just assumed would last forever, being passed down generations. “Luke, this toaster belonged to your father…an elegant weapon from a more civilized age.”
Alas, it was not to be.
The Krups, though it appeared well-constructed, had a fatal design flaw I had not seen in any toaster of my previous acquaintance. It was a misfeature of the dingus I thought of as “the slider”, a pair of aluminum slats with protruding little flaps on which the bread actually sits inside the slots. This is the interior part attached to the spring-loaded thingummy you push down to begin the toasting operation.
The slider was (a) made of excessively thin aluminum or mild steel, and (b) further misdesigned so that it would sometimes jam in the guide channel on the rear end of each toaster slot (intended to guide it straight up and down) and bend slightly. After a couple of these regrettable incidents, the tongue on the rear end of the slider would pop out of the guide channel and jam against the rear wall of the slot, with the slider compressed into an unpleasing serpentine shape.
I soon lost count of the non-enjoyable hours I spent fishing inside the slots with a pair of long-nosed pliers, forcing the slider tab back into its guide channel and then trying to hand-straighten the slider, cursing sulphurously the while. I would always succeed to a degree, but it was never possible to completely de-serpentinize the sliders; the result would remain ever so slightly more hinky than when I started. I could always see the day coming when the sliders would become so spavined that further intervention would avail me not.
That day was today; specifically during my wife Cathy’s attempt to fix her breakfast. Mine, a half-hour earlier while she slept in, had gone fine, but… “Love, would you please fix the toaster?” I got all armed up to be the hero again, but struggle as I might, I couldn’t unjam the device. Der Tag had finally arrived.
So we hied ourselves to the local Bed, Bath & Beyond. And now unfolds the burden of my tale. For there I was again faced with the choice of where to land on the price curve. There was an array of toasters before me priced from $19.99 to over $100. Go high, hoping to buy longevity, or go low and treat the thing as a throwaway?
And then, and then, I had a rush of engineer to the brain and realized that what I ought to be doing is inspecting the part that failed – the sliders – on all of them, and choosing for the most rugged-looking one not mounted in a toaster with glaringly obvious defects. So, I narrowly inspected the sliders on, I think, seven toasters – several major brands and one obvious house brand.
How did I know what I was looking at? Why, a quick look at the Consumer Reports website on my smartphone. Which told me the assemblage before me included at least two models with high ratings.
My friends, the sliders were all effectively identical, bar the one pair that was slightly longer to fit a toaster with extra-long slots. Same nondescript thin white metal, same shape and spacing of flaps. They looked like they all might have been designed on the same CAD program and built by the same no-name appliance OEM (“Welcome to the Malaysian assembly plant of Oh Shit Housewares, HQ in Taiwan”).
Suspicions aroused, I flipped over several toasters and peered at the end consumers don’t usually look at. A lot of the design details on the plastic bases were pretty similar, and the fasteners looked pretty much alike.
The sad truth about toasters: they’re completely genericized, built strictly for lowest Bill of Materials to standard designs that are differentiated by surface details that don’t have a lot to do with their actual function. I’m now convinced that I would have seen little more variation anywhere I went unless it was to a specialty store catering to the restaurant and luxury trade at $$$ high prices.
It’s no great mystery why this happened. Efficiency is a harsh mistress – I mean look at the automobiles one sees on the road these days. Within any given form factor they’ve become mostly nigh-indistinguishable except by paint job and some trim details. As with toasters, there was more variation in product when I was a kid.
So, if I were an idiot, I might launch into a rant about how capitalism flattens everything into cheap popular mediocrity. I’m not an idiot; that process is why nominally poor people in the U.S. can have a lot of nice things. It’s a good trend, as long as specialty providers for the discriminating still exist. And that’s a viable market niche, too.
I shrugged and bought the cheapest possible toaster – $16.45 with discount. Took it home. Checked that it would deal with thick slices of the Tuscan garlic boule I eat with breakfast. It works fine.
Now I’ll go research where to buy a really durable one when this throwaway craps out. Suggestions welcome.
June 23, 2018
Respect can be hard
I had a good sword class today. There was much sparring with many different weapons.
At one point, Sensei Varady and I faced off, him with paired shortsword simulants, me with a longsword simulant. It went pretty well for me; sensei is bigger, faster, at least 20 years younger, and more skilled than I am (he runs the school), but he kept letting his guard fall just a _little_ bit too low and I got in three or four kill shots on the left neck-and-shoulder pocket – a favorite target of mine at any blade length.
Understandably this adrenalizes him some, and the next time my block is not quite fast enough he fetches me a whack on the ribs that he immediately realizes was way overpowered. Starts apologizing, grins, and says something like “You’re a good, tough fighter, I tend to power up automatically to deal with it.” (Sense exact, words not.)
I said “I am very happy that you treat me with that respect.”
And absolutely meant it. Given a choice between taking a bruise occasionally because the instructor has to play hard to beat me and being ineffectual enough that he always has control of the fight…I’ll take the bruises, thanks. And enjoy how I got them. A lot.
June 17, 2018
The critical fraction
I’ve seen analyses of the long odds the U.S. government would face if it ever attempted to confiscate civilian firearms before. The Mathematics of Countering Tyranny seems like a particularly well done example.
The authors compute that under very generous assumptions there are about 83000 doorknockers available to perform confiscation raids. Dividing that into the estimated number of semiautomatic rifles in the U.S., and assuming that each raid would net three rifles confiscated (which I think is optimistic in the raiders’ favor), each doorknocker would have to execute and survive 864 raids in order for the entire stock of rifles to be seized.
Notice that we’re not even addressing the far larger stock of handguns and other weapons yet. But I’m willing to tilt the conditions of the argument in the confiscators’ favor, because that makes the conclusion more difficult for them to rebut.
There’s a different way to slice these numbers. Applying the 3:1 force ratio military planners like to assume, this means the number of violently resistant gun owners – people willing to shoot a doorknocker rather than watch their country sink into tyranny – needs to be about 249000.
Is this a plausible number?
The NRA has about 5.2 million members. That’s about 1 in 20 NRA members.
According to the General Social Survey in 2013, about 1 in 4 Americans owned guns. That’s 79 million gun owners, and probably an undercount because gun owners are chronically suspicious of the intention behind such questions. But we’ll go with it as an assumption that’s best-case for the doorknockers.
That means that in order to stop attempted gun confiscations dead on a purely force-on-force level, only one in 317 American gun owners needs to remember that our first American Revolution began as spontaneous popular resistance to a gun-confiscation order. Only one in 317 American gun owners needs to remember their duty under the U.S. Constitution as members of the unorganized militia – “the body of the people in arms”. Only one in 317 American gun owners needs to shoot back.
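For concreteness, here is that back-of-the-envelope arithmetic spelled out as a few lines of Python. Every input is just one of the estimates quoted above, not new data.

```python
# The arithmetic above, spelled out. All inputs are the estimates quoted
# in this post, not independent data.

doorknockers = 83_000        # personnel assumed available for confiscation raids
force_ratio  = 3             # the 3:1 planning ratio, applied as above
gun_owners   = 79_000_000    # ~1 in 4 Americans, per the 2013 GSS
nra_members  = 5_200_000

resisters = doorknockers * force_ratio
print(f"resisters needed: {resisters:,}")                          # 249,000
print(f"about 1 in {round(nra_members / resisters)} NRA members")  # ~1 in 21
print(f"about 1 in {round(gun_owners / resisters)} gun owners")    # ~1 in 317
```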
Is that a plausible fraction? Yes. Yes, I think it is. Count me as one of them.
Why am I publishing these numbers? To persuade the would-be confiscators that their enterprise is doomed to fail in fire and blood, so freedom-loving people never actually have to take on the moral burden of killing them. The fact that we’re ready to do so if we have to does not mean we want that terrible day to arrive.
But eternal vigilance is not the only price of liberty. Eternal deterrence against would-be tyrants – including the threat and in extremis the use of revolutionary violence – is part of that price too. The Founding Fathers understood this. The question is whether a critical fraction of American gun owners today know our duty and would do it.
Here is why I am optimistic on that score: every estimate in this back-of-the-envelope calculation has been pushed to the end of the plausible range that favors the confiscators. In fact, the stock of weapons that would need to be confiscated is much larger. The number of gun owners is pretty certainly underestimated. Even getting full compliance with confiscation orders from the agents and local police is unlikely, reducing the effective number of doorknockers.
Correspondingly, the critical fraction of American gun owners that would have to be hard-core enough to resist confiscation with lethal violence in order to stop the attempt is lower than 1 in 317. Probably much lower.
Especially if we responded by killing not merely the doorknockers but the bureaucrats and politicians who gave them their orders. Which would be more efficient, more just, and certain to follow.
May 30, 2018
Defect attractors
There’s a phrase I’ve used on this blog more than once that I had reason to Google just now and found that (to my surprise) the top hits are mostly my writings. It is “defect attractor”.
In this post I’m going to explain why I think this is an important concept that needs to be in the toolkit of every software engineer, and talk about the practice it implies.
The first thing to know about “defect attractor” is that I didn’t coin it myself. The earliest use Google turns up, and the first I know about, is by Les Hatton. In his work on the error statistics of large C++ programs, Hatton described class inheritance as a defect attractor.
I’m a fan of Hatton’s work. He has breadth of perspective. He asks interesting questions, finds sharp answers, and writes about them lucidly. I understood instantly what he meant by that phrase and seized upon it with glee.
A “defect attractor” of a program, language, API, or any other kind of software construct is a feature which, while possibly not bad in itself, spawns defects in the design or code near it.
The concept unifies a large class of things experienced software engineers know are problems. Portability shims are defect attractors. Special, corner, and edge cases are defect attractors – in fact, when we complain about something being a special case we are usually expressing unease about it being a defect attractor. “Special case” is the what of our complaint; “defect attractor” is the why.
Hatton was quite right, class inheritance is a defect attractor. This is true on the level of language design where it spawns questions like how to handle diamond inheritance that don’t have one good answer but multiple bad ones. It’s also true in OO codebases, where Hatton noticed that defects cluster noticeably around code using inheritance. Language designers have reacted sensibly by moving to trait- and interface-based object systems that don’t have inheritance – Go is a notable recent example.
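To make the diamond problem concrete, here is a tiny Python sketch (my own illustration, not anything from Hatton’s data): two parents override the same method, and which one the grandchild inherits is settled by the language’s linearization rule rather than by anything visible at the call site.

```python
# Minimal diamond-inheritance sketch (illustrative only).
# Two parents override the same method; which one D inherits is decided by
# Python's C3 linearization (the MRO), not by anything at the call site.

class A:
    def greet(self) -> str:
        return "A"

class B(A):
    def greet(self) -> str:
        return "B"

class C(A):
    def greet(self) -> str:
        return "C"

class D(B, C):   # the "diamond": D inherits from B and C, both from A
    pass

if __name__ == "__main__":
    print(D().greet())                          # "B", because B precedes C in the MRO
    print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
```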
More defect attractors: endianness-sensitive data representations, binary wire and file formats in general, and floating point. These are things where program after program using them makes the same dumb mistakes. Experience doesn’t help as much as it should.
In C: pointer arithmetic. Casts. Unions and type punning. And of course the Godzilla of defect attractors, manual management of dynamic memory allocation. Experienced programmers know these are going to bite them on the ass and that much of the labor of C programming is not the expression of algorithms but mitigation attempts to blunt the attractors’ teeth.
In any language, the goto statement is a famous defect attractor. So are text-substitution macros in languages that have anything resembling a preprocessing stage.
Once you’ve grasped what a defect attractor is, it’s a short step to good practice: stay the hell away from them! And when you see a known defect attractor in code you’re auditing, go to high alert.
Slowly we’ve learned this about some individual defect attractors like gotos. The consciousness I’m trying to raise here is that, as engineers, we should be more generally aware of what kinds of features and techniques defects cluster around; we should know to avoid them and to be suspicious when we encounter them.
Clarity of language promotes clarity of thought. I want this phrase to spread because it’s clear. Thank you, Les Hatton.
I’m sure my commenters will have a good time pointing out defect-attractor classes I’ve missed. Just try to keep in mind that “things I don’t like” and “things that cause defect clusters” aren’t identical, eh?
May 28, 2018
The new science of Indo-European origins
I’ve had a strong amateur interest in historical linguistics since my teens in the early 1970s.
Then, as today, a lot of the energy in that field was focused on the origins and taxonomy of the Indo-European family – the one that includes English and the Latin-derived and Germanic languages and Greek and also a large group of languages in northern India and Persia. This is not only because most linguists are Europeans; it’s because there’s a massively larger volume of ancient literature in this family than can be found anywhere else in the world – there’s more to go on.
People have been trying to pin down the origin of the Indo-European language family and identify the people who spoke its root language for literally centuries. Speculations that turn out not to have been far wrong go back to the 1600s(!) and serious work on the problem, some of which is still considered relevant, began in the late 1700s.
However, until very recently theory about Indo-European origins really had to be classed as plausible guesses rather than anything one could call well-confirmed. There were actually several contending theories, because linguistic reconstruction of the root PIE (Proto-Indo-European) language was sort of floating in midair without solid enough connections to archaeological and genetic evidence to be grounded.
This has changed – dramatically – in the last five years. But there isn’t yet any one place you can go to read about all the lines of evidence; nobody has written that book as of mid-2018. This post is intended to point readers at a couple of sources for the new science, simply because I find it fascinating and I think my audience will too.
Why hasn’t the one big book of IE origins been written yet? Basically because the science needed to pull it together is paleogenetics – the study of fossil human DNA – but the linguists and the archaeologists and the paleogeneticists don’t talk to each other very well.
Until the end of the Cold War a lot of very relevant work done by archaeologists in Russia could not become available in English. Now, at least, we have one source that draws together the linguistics and that archaeology – The Horse, The Wheel, and Language by David Anthony (2010).
This is a really, really excellent book. You can read it free on-line. Among other virtues, it includes the best explanation for non-specialists I’ve ever seen of just how you go about reconstructing a language for which, like PIE, there are no written sources. The exhaustive parsing of the archeological evidence in the second half can be heavy going, but persevere; there are interesting insights shot all through it and rewards at the end.
But there’s a piece missing. Anthony knew nothing of paleogenetics, because that field was just barely getting off the ground when he was writing. But it turns out that comparative analysis of human fossil DNA (and the DNA of living humans, too!) can reveal a surprising amount about population movements and expansions before recorded history.
The paleogenetic record largely confirms the story Anthony extracts from his evidence. Where it doesn’t, well, that gets interesting too. The best discussion of this stuff I’ve found is on a blog called West Hunter by a brilliant and ornery population geneticist named Greg Cochran. He and a deceased partner wrote a really thought-provoking book, The Ten Thousand Year Explosion (2008) showing that (contrary to a popular assumption) human evolution didn’t stop with the rise of civilization but has actually sped up during the last 10,000 years.
In particular, after Anthony’s book you need to read Who We Are: #9 Europe, Cochran’s gloss on part of a book called Who We Are by David Reich that is a tour through the evidence from current paleogenetics. Reich’s book is almost certainly worth reading in itself (I haven’t yet) but for the PIE-origins question Cochran’s discussion of it is good enough.
Cochran is sometimes acidulously funny about Reich and Anthony and their critics, as one should well be when a significant barrier to understanding is various peoples’ political and ideological hobbyhorses. Cochran also has the great virtue that he corrects himself in public on the infrequent occasions he turns out to have been wrong. Quite separately from the PIE-origins thing, his extended review of Jared Diamond’s brilliant but flawed Guns, Germs, and Steel is worth seeking out.
OK, I’ve pointed you at the sources. Go read them. To whet your appetite, there follows a summary with some observations about how various people got the Indo-Europeans wrong. The history of this field is nearly as interesting, in some ways, as the question it examines.
What we can now say pretty much for sure: Proto-Indo-European was first spoken on the Pontic Steppes around 4000 BCE. That’s the grasslands north of the Black Sea and west of the Urals; today, it’s the Ukraine and parts of European Russia. The original PIE speakers (which we can now confidently identify with what archaeologists call the Yamnaya culture) were the first humans to domesticate horses.
And – well, basically, they were the first and most successful horse barbarians. They invaded Europe via the Danube Valley and contributed about half the genetic ancestry of modern Europeans – a bit more in the north, where they almost wiped out the indigenes; a bit less in the south where they mixed more with a population of farmers who had previously migrated in on foot from somewhere in Anatolia.
The broad outline isn’t a new idea. 400 years ago the very first speculations about a possible IE root language fingered the Scythians, Pontic-Steppe descendants in historical times of the original PIE speakers – with a similar horse-barbarian lifestyle. It was actually a remarkably good guess, considering. The first version of the “modern” steppe-origin hypothesis – warlike bronze-age PIE speakers domesticate the horse and overrun Europe at sword- and spear-point – goes back to 1926.
But since then various flavors of nationalist and nutty racial theorist have tried to relocate the PIE urheimat all over the map – usually in the nut’s home country. The Nazis wanted to believe it was somewhere in their Greater Germany, of course. There’s still a crew of fringe scientists trying to pin it to northern India, but the paleogenetic evidence craps all over that theory (as Cochran explains rather gleefully – he does enjoy calling bullshit on bullshit).
Then there have been the non-nutty proposals. There was a scientist named Colin Renfrew who for many years (quite respectably) pushed the theory that IE speakers walked into Europe from Anatolia along with farming technology, instead of storming in off the steppes in primitive war-wagons, brandishing weapons like some badass tribe in a Robert E. Howard novel.
Alas, Renfrew was wrong. It now looks like there was such a migration, but those people spoke a non-IE language (most likely something archaically Semitic) and got overrun by the PIE speakers a few thousand years later. Cochran calls these people “EEF” (Early European Farmers) and they’re most of the non-IE half of modern European ancestry. Basque is the only living language that survives from EEF times; Otzi the Iceman was EEF, and you can still find people with genes a lot like his in the remotest hills of Sardinia.
Even David Anthony, good as he is about much else, seems rather embarrassed and denialist about the fire-and-sword stuff. Late in his book he spins a lot of hopeful guff about IE speakers expanding up the Danube peacefully by recruiting the locals into their culture.
Um, nope. The genetic evidence is merciless (though, to be fair, Anthony can’t have known this). There’s a particular pattern of Y-chromosome and mitochondrial DNA variation that you only get in descendant population C when it’s a mix produced because aggressor population A killed most or all of population B’s men and took their women. Modern Europeans (C) have that pattern, the maternal line stuff (B) is EEF, and the paternal-line stuff (A) is straight outta steppe-land; the Yamnaya invaders were not gentle.
How un-gentle were they? Well…this paragraph is me filling in from some sources that aren’t Anthony or Cochran, now. While Europeans still have EEF genes, almost nothing of EEF culture survived in later Europe beyond the plants they domesticated, the names of some rivers, and (possibly) a murky substratum in some European mythologies.
The PIE speakers themselves seem to have formed, genetically, when an earlier population called the Ancient North Eurasians did a fire-and-sword number (entirely on foot, that time) on a group of early farmers from the Fertile Crescent. Cochran sometimes calls the ANEs “Hyperboreans” or “Cimmerians”, which is pretty funny if you’ve read your Howard.
For the rest, go read the book and the blog. There’s lots more, including the remarkably detailed picture of IE culture (Anthony is at his best there) that you can get from indexing the reconstructed vocabulary against the archaeology.
Part of the surprise is how unsurprising it is. The PIE way of life is not strange to us; strong traces of it have transmitted through Greco-Roman, Norse, and Celtic mythology, flavoring our folklore and our fantasies and the oldest poetic epics of our languages. They truly were our cultural as well as our genetic ancestors.
They even looked like us – that is, like modern Europeans. We couldn’t actually know this until the new paleogenetic evidence came in. Yes, ancient historians had described the Pontic Greeks as light-skinned, even blond, which should have been a clue; but in the 20th century there was an understandable reaction against Nazi “Aryan” theorizing and everybody speculating about what early PIE-speakers looked like ran hard in the other direction.
This isn’t in Cochran or Anthony, either…but the pale and distinctly non-Asian complexion of the people who left the earliest Tarim Basin mummies around 1800 BCE isn’t a mystery any more. Their ancestors migrated east from the Pontic Steppes rather than west; they were Indo-Europeans, too, and looked it.
May 25, 2018
A touch in the night
Occasionally I have dreams that seem to be trying to assemble into the plot of an SF novel – weird and fractured as dreams are, but with something like a long-form narrative struggling to develop.
Occasionally I have nightmares. I don’t know how it goes for anybody else – and one reason I’m posting this is to collect anecdotal data in the comments – but if I wake up from a nightmare and then fall asleep shortly afterwards, it may grab hold of me again.
Yesterday morning I woke up about 5AM remembering one I’d just had. This is how it went, and how it ended…
It’s the near future. Driverless cars and AI-driven robots are ubiquitous. They’re not androids, though; they tend to have wheeled, boxy chassis with manipulator arms like a carnival claw machine.
I’m a troubleshooter who’s been called into a scientific research campus to try to work out the reason for some odd anomalies in experimental data. My backstory is clear – I’m a hotshot with a reputation for cracking intractable problems. I’m not me, not a programmer, but something more like a forensic physicist. There’s a lot of money behind whatever research was going on, government or military money, maybe black budget, and the people who have sent me in want answers.
The research team is being surprisingly open and un-defensive, considering that from their point of view I could be a hatchet-man sent to kick butt and take names. I’m not. I’m really a problem-solver. They seem to get this, for which I am thankful.
But I can’t get any clarity about exactly what the data anomalies were. The team scientists and I can understand each other perfectly well most of the time, everybody is speaking English, but when they try to explain their problem they speak words I hear but can’t parse – meaningless blobs of sound. It’s not like a vocabulary failure, but like either they or I have developed some weirdly topic-specific aphasia.
The research has something to do with war robots. There’s a vivid scene where scaled-down models of weapon-equipped bots are fighting each other. One, looking rather like a skinny tall dalek, fires a blue-green laser. It shouldn’t be visible, there isn’t enough dust or water vapor in the air, but it is. It leaves a small scorch-mark on a wall.
The war robots aren’t behaving quite right. Actually, that’s happening all over the labs. Utility robots are malfing – going catatonic, or falling into logic loops that cause perseveration at useless behaviors. I discover a trick to snap them out of whatever cybernetic funk they’re in, but it only works temporarily. They fall back into fugue at random times.
Whatever derangement affects the robots is spreading. I’m getting very uneasy; this is beginning to feel like enemy action and I still can’t get any sense out of the scientists about the problem I came here to solve.
In the dead of a very dark night, a small group of us go out to examine some glitching sensor equipment near the campus perimeter. Large groups of crickets are chirping…and behind that sound there’s something else. A soft mechanical clattering that stops dead whenever the crickets go silent.
Something is hiding out there in the night. Waiting to touch us.
That’s when I woke up. Remembered the dream. Realized it had been building up to some kind of really dark SF/horror scenario. Hostile aliens? Malevolent AI? I don’t know.
What I do know is that if I fall asleep again too soon I’m likely to find out. This happens sometimes when I wake up from any kind of dream – I can feel it lurking below the edge of consciousness, waiting to pull me back in. If the dream was pleasant I may welcome drifting back into it, but not this time. I don’t like horror movies and want no part of this one.
I contemplate going to my office and working for a bit. A bit of hacking and an early breakfast will, I am sure, change my state enough that I can sleep again without falling into the same dream. I’ve applied this fix before.
That’s when I find out my wife Cathy is awake, behind my back as I’m lying there on my left side, because she puts her hand on my side and caresses it gently. And oh what a difference that makes. The nightmare blows away like tatters of fog.
Sometimes a touch in the night is a wonderful thing. I am left wondering, not for the first time…did that dream want to be a novel?
May 22, 2018
Review: The Fractal Man
The Fractal Man (written by J. Neil Schulman, soon to be available on Amazon) is a very, very funny book – if you share enough subcultural history with the author to get the in-jokes.
If you don’t – and in particular if you never met Samuel Edward Konkin – the man known as “SEKIII” to a generation of libertarians and SF fans before his tragically early death in 2004 – it will still be a whirligig of a cross-timeline edisonade, but some bits might leave you wondering how the author invented such improbabilities. But I knew SEKIII, and if there was ever a man who could make light of having a 50KT nuclear warhead stashed for safekeeping in his apartment, it was him.
David Albaugh is a pretty good violinist, a science-fiction fan, and an anarchist with a bunch of odd and interesting associates. None of this prepares him to receive a matter-of-fact phone call from Simon Albert Konrad III, a close friend who he remembers as having been dead for the previous nine years.
His day only gets weirder from there, as SAKIII and he (stout SF fans that they are) deduce that David has somehow been asported to a timeline not his own. But what became of the “local” Albaugh? Before the two have time to ruminate on that, they are both timeshifted to a history in which human beings (including them) can casually levitate, but there is no music.
Before they can quite recover from that, they’ve been recruited into a war between two cross-time conspiracies during which they meet multiples of their own fractals – alternate versions of themselves, so named because there are hints that the cosmos itself has undergone a kind of shattering that may have been recent in what passes for time (an accident at the Large Hadron Collider might have been involved). One of Albaugh’s fractals is J. Neil Schulman.
It speeds up to a dizzying pace; scenes of war, espionage, time manipulations, and a kiss-me/kill-me romance between Albaugh and an enemy agent (who also happens to be Ayn Rand’s granddaughter), all wired into several just-when-you-thought-it-couldn’t-go-further-over-the-top plot inversions.
I don’t know that the natural audience for this book is large, exactly, but if you’re in it, you will enjoy it a lot. Schulman plays fair; even the weirdest puzzles have explanations and all the balls are kept deftly in the air until the conclusion.
Assuming you know what “space opera” is, this is “timeline opera” done with the exuberance of a Doc Smith novel. Don’t be too surprised if some of it sails over your head; I’m not sure I caught all the references. Lots of stuff blows up satisfactorily – though, not, as it happens, that living-room nuke.
May 13, 2018
I saw Brand X live a few hours ago
I saw Brand X live a few hours ago. Great fusion-jazz band from the 1970s, still playing like genius maniacs after all these years. Dropped $200 on tickets, dinner for me and my wife, and a Brand X cap. Worth. Every. Penny.
Yeah, they saved “Nuclear Burn” for the encore…only during the last regular number some idiot managed to spill water on Goodsall’s guitar pedals, making it unsafe for him to play. So that blistering guitar line that hypnotized me as a college student in 1976 had to be played by their current keyboardist, Scott Weinberger. And damned if he didn’t pull it off!
Brilliant performance all round. Lots of favorites, some new music including a track called “Violent But Fair” in which this band of arcane jazzmen demonstrated conclusively that they can out-metal any headbanger band on the planet when they have a mind to. And, of all things, a whimsical cover of Booker T and the MGs’ “Green Onions”.
The audience loved the band – nobody was there by accident, it was a houseful of serious prog and fusion fans like me. The band loved us right back, cracking jokes and goofing on stage and doing a meet-and-greet after the show.
I got to shake Goodsall’s hand and tell him that he had rocked my world when I first heard him play in ’76 and that it pleases me beyond all measure that 40 years later he’s still got it. You should have seen his smile.
Gonna wear that Brand X cap next time I go to the pistol range and watch for double-takes.
May 12, 2018
Draining the manual-page swamp
One of my long-term projects is cleaning up the Unix manual-page corpus so it will render nicely in HTML.
The world is divided into two kinds of people. One kind hears that, just nods and says “That’s nice,” having no idea what it entails. The other kind sputters coffee onto his or her monitor and says something semantically equivalent to “How the holy jumping fsck do you think you’re ever going to pull that off?”
The second kind has a clue. The Unix man page corpus is scattered across tens of thousands of software projects. It’s written in a markup – troff plus man macros – that is a tag soup notoriously resistant to parsing. The markup is underspecified and poorly documented, so people come up with astoundingly perverse ways of abusing it that just happen to work because of quirks in the major implementation but confuse the crap out of analysis tools. And the markup is quite presentation-oriented; much of it is visual rather than structural and thus difficult to translate well to the web – where you don’t even know the “paper” size of your reader’s viewer, let alone what fonts and graphics capabilities it has.
Nevertheless, I’ve been working this problem for seventeen years and believe I’m closing in on success in, maybe, another five or so. In the rest of this post I’ll describe what I’m doing and why, so I have an explanation to point to and don’t have to repeat it.
First, how we got where we are. Unix documentation predates even video terminals. When the first manual page was written in the very early 1970s, the way you displayed stuff was to print it – either on your teletype or – slightly later – a phototypesetter.
Crucially, while the phototypesetter could do fairly high-quality typesetting with multiple fonts and kerning, the teletype was limited to a single fixed-width font. Thus, from nearly the beginning, the Unix documentation toolchain was adapted to two different output modes, one assuming only very limited capability from its output device.
At the center of the “Unix documentation toolchain” were troff (for phototypesetters) and its close variant nroff (for ttys). Both interpreted a common typesetting language. The language is very low-level and visually oriented, with commands like “insert line break” and “change to specified font”. Its distinguishing feature is that (most) troff requests are control words starting with a dot at the beginning of a line; thus, “insert line break” is “.br”. But some requests are “escapes”, begun with a backslash and placed inline; thus, “\fI” means “change to italic font”.
Manual pages were never written directly in troff. Instead, they were (and are) written mostly in macros expanded to sequences of troff requests by a preprocessor. Instead of being purely visual, many of these macros are structural; they say things like “start new paragraph” or “item in bulleted list”. I say “mostly” because manual pages still contain low-level requests like font changes.
Text-substitution macro languages have become notorious for encouraging all manner of ingenious but ugly and hard-to-understand hackery. The troff language helped them get that reputation. Users could define their own macros, and sometimes did. The design encouraged visual microtweaking of pages to get the appearance just right – provided you know things like your paper size and the font capabilities of your output device exactly. In the hands of an expert troff could produce spare, elegant typesetting that still looks good decades later.
By 1980 there was already a large corpus – thousands, at least – of manual pages written in troff markup. The way it was rendered was changing, however.
First, ttys were displaced by tube terminals – this was in the late 1970s, around the time I started programming. nroff was quickly adapted to produce output for these, which is why we still use the “man” command in terminal emulators today. That’s nroff behind it turning man-page markup into fixed-width characters on your screen.
Not long after that, people almost completely stopped printing manual pages. The payoff from cute troff tricks declined because tube terminals were such a limited rendering device. This encouraged a change in the way people wrote them – simpler, with less purely visual markup, more structural. Today there’s a noticeable gradient in markup complexity by age of the page – newer ones tend to be simpler and you almost never see the really old-school style of elaborate troff tricks outside of the documentation of GNU troff itself.
Second, in the early 1980s, laser printers and Postscript happened. Unix man pages themselves changed very little in response because nroff-to-terminals had already become so important, but the entire rest of the range of troff’s use cases simplified to “generate Postscript” over the next decade. Occasionally people still ask it to emit HP’s printer language; that’s about the only exception left. The other back-end typesetting languages troff used to emit are all dead.
But the really big disruption was the World Wide Web.
By about 1997 it was becoming obvious that in the future most documentation would move to the web; the advantages of the hyperlink were just too obvious to ignore. The new wave in documentation markup languages, typified by DocBook, was designed for a Web-centric world in which – as with nroff on terminals – your markup can’t safely make a lot of assumptions about display size or fonts.
To deal with this, the new document markup languages were completely structural. But this created a huge problem. How were we going to get the huge pile of grubby, visually-marked-up Unix man pages into purely structural markup?
Yes, you can translate a straight visual markup into a sort of pidgin HTML. That’s what tools like man2html and troff2html do. But this produces poor, ugly HTML that doesn’t exploit the medium well. One major thing you lose is tables. The man pages of these tools are full of caveats and limitations. Basically, they suck.
Trying to jawbone every project maintainer in the world into moving their masters to something else web-friendly by hand seemed doomed. What we really needed was mechanical translation from structural man macros (including table markup) to a structural markup.
When I started thinking about this problem just after Y2K, the general view among experts was that it was impossible, or at least unfeasibly hard barring strong AI. Trying to turn all that messy, frequently malformed visual tag soup into clean structure seemed like a job only a human could handle, involving recognition of high-level patterns and a lot of subtle domain and context knowledge.
Ah, but then there was (in his best Miss Piggy voice) moi.
I have a background in AI and compiler technology. I’m used to the idea that pattern-recognition problems that seem intractable can often be reduced to large collections of chained recognition and production rules. I’ve forgotten more about writing parsers for messy input languages than most programmers ever learn. And I’m not afraid of large problems.
The path forward I chose was to lift manual pages to DocBook-XML, a well-established markup used for long-form technical manuals. “Why that way?” is a reasonable question. The answer is something a few experiments showed me: the indirect path – man markup to DocBook to HTML – produces much better-quality HTML than the rather weak direct-conversion tools.
But lifting to DocBook-XML is a hard problem, because the markup used in man pages has a number of unfortunate properties even beyond those I’ve already mentioned. One is that the native parser for it doesn’t, in general, throw errors on ill-formed or invalid markup. Usually such problems are simply ignored. Sometimes they aren’t ignored but instead produce defects that are hard for a human reader scanning quickly to notice.
The result is that manual pages often have hidden cruft in them. That is, they may render OK but they do so essentially by accident. Markup malformations that would throw errors in a stricter parser pass unnoticed.
This kind of cruft accumulates as man pages are modified and expanded, like deleterious mutations in a genome. The people who modify them are seldom experts in roff markup; what they tend to do is monkey-copy the usage they see in place, including the mistakes. Thus defect counts tend to be proportional to age and size, with the largest and oldest pages being the cruftiest.
This becomes a real problem when you’re trying to translate the markup to something like DocBook-XML. It’s not enough to be able to lift clean markup that makes structural sense; you have to deal with the accumulated cruft too.
Another big one, of course, is that (as previously noted) roff markup is presentational rather than semantic. Thus, for example, command names are often marked by a font change, but there’s no uniformity about whether the change is to italic, bold, or fixed width.
XML-DocBook wants to do structured tagging based on the intended semantics of text. If you’re starting from presentation markup, you have to back out the intended semantics based on a combination of cliche recognition and context rules. My favorite tutorial example is: string marked by a font change and containing “/” is wrapped by a DocBook filename tag if the name of the enclosing section is “FILES”.
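As a sketch only, and nothing like doclifter’s actual code, that tutorial rule might look like this; the function name and the simplified inputs are mine.

```python
# Hypothetical sketch of the FILES-section cliche rule described above.
# Not doclifter's real code; the inputs are simplified for illustration.

def lift_filename_cliche(section_name: str, text: str, font_changed: bool) -> str:
    """Wrap a font-changed string in <filename> if it looks like a path
    and we are inside a FILES section."""
    if font_changed and "/" in text and section_name.upper() == "FILES":
        return f"<filename>{text}</filename>"
    return text

# Example: a FILES entry such as ".I /etc/passwd" (an italic font change)
print(lift_filename_cliche("FILES", "/etc/passwd", font_changed=True))
# -> <filename>/etc/passwd</filename>
```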
But different people chose different cliches. Sometimes you get the same cliche used for different semantic purpose by different authors. Sometimes multiple cliches could pattern-match to the same section of text.
A really nasty problem is that roff markup is not consistent (he said, understating wildly) about whether or not its constructions have end-of-scope markers. Sometimes it does – the .RS/.RE macro pair for changing relative indent. More often, as for example in font changes, it doesn’t. It’s common to see markup like “first we’re in \fBbold,\fIthen italic\fR.”
Again, this is a serious difficulty when you’re trying to lift to a more structured XML-based markup with scope enders for everything. Figuring out where the scope ends should go in your translation is far from trivial even for perfectly clean markup.
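Here is a toy illustration of that scope-inference step, again my own sketch rather than doclifter’s real logic: it closes whatever span the previous font escape opened before starting the next one.

```python
import re

# Toy illustration of inferring scope ends for roff font escapes.
# My sketch, not doclifter's real logic: it handles only \fB, \fI and \fR,
# mapping them to DocBook <emphasis> spans and closing whatever span the
# previous escape opened.

OPENERS = {r"\fB": '<emphasis role="bold">', r"\fI": "<emphasis>"}

def lift_fonts(line: str) -> str:
    out, span_open = [], False
    for chunk in re.split(r"(\\f[BIR])", line):
        if chunk in (r"\fB", r"\fI", r"\fR"):
            if span_open:                # close the previous escape's span
                out.append("</emphasis>")
                span_open = False
            if chunk in OPENERS:         # \fR just returns to roman
                out.append(OPENERS[chunk])
                span_open = True
        else:
            out.append(chunk)
    if span_open:                        # close a span left dangling at end of line
        out.append("</emphasis>")
    return "".join(out)

print(lift_fonts(r"first we're in \fBbold,\fIthen italic\fR."))
# first we're in <emphasis role="bold">bold,</emphasis><emphasis>then italic</emphasis>.
```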
Now think about how all the other problems interact with the cruft. Random undetected cruft can be lying in wait to confuse your cliche recognition and trip up your scope analyzer. In truth, until you start feeling nauseous or terrified you have not grasped the depth of the problem.
The way you tackle this kind of thing is: Bite off a piece you understand by writing a transformation rule for it. Look at the residuals for another pattern that could be antecedent to another transformation. Lather, rinse, repeat. Accept that as the residuals get smaller, they get more irregular and harder to process. You won’t get to perfection, but if you can get to 95% you may be able to declare victory.
A major challenge is keeping the code structure from becoming just as grubby as the pile of transformation rules – because if you let that slide it will become an unmaintainable horror. To achieve that, you have to be constantly looking for opportunities to generalize and make your engine table-driven rather than writing a lot of ad-hoc logic.
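In sketch form (mine, not doclifter’s), the table-driven shape is a list of (pattern, transformer) pairs applied in order; whatever no rule claims becomes the residual for the next round of rule-writing.

```python
import re

# Sketch of the table-driven shape described above, mine rather than
# doclifter's real engine. Each rule is a (pattern, transformer) pair tried
# in order; any line no rule claims goes into the residual.

RULES = [
    (re.compile(r"^\.SH\s+(.+)$"), lambda m: f"<section><title>{m.group(1)}</title>"),
    (re.compile(r"^\.B\s+(.+)$"),  lambda m: f'<emphasis role="bold">{m.group(1)}</emphasis>'),
]

def lift(lines):
    output, residual = [], []
    for line in lines:
        for pattern, transform in RULES:
            m = pattern.match(line)
            if m:
                output.append(transform(m))
                break
        else:                            # no rule claimed this line
            output.append(line)
            residual.append(line)
    return output, residual

out, leftovers = lift([".SH NAME", ".B doclifter", ".ti +4n"])
print(leftovers)                         # ['.ti +4n'] -- what the next rule has to handle
```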
It took me a year of effort to get to doclifter 1.0. It could do a clean lift on 95% of the 5548 man pages in a full Red Hat 7.3 workstation install to DocBook. (That’s a bit less than half the volume of the man pages on a stock Ubuntu installation in 2018.) The reaction of topic experts at the time was rather incredulous. People who understood the problem had trouble believing doclifter actually worked, and no blame for that – I’m good, but it was not a given that the problem was actually tractable. In truth even I was a little surprised at getting that good a coverage rate without hitting a wall.
Those of you a bit familiar with natural-language processing will be unsurprised to learn that at every iteration 20% of the remaining it-no-work pages gave me 80% of the problems, or that progress slowed in an inverse-geometric way as I got closer to 1.0.
In retrospect I was helped by the great simplification in man markup style that began when tube terminals made nroff the renderer for 99% of all man page views. In effect, this pre-adapted man page markup for the web, tending to select out the most complex and intractable troff features in favor of simple structure that would actually render on a tube terminal.
Just because I could, I also taught doclifter to handle the whole rest of the range of troff markups – ms, mm, me and so forth. This wasn’t actually very difficult once I had the framework code for man processing. I have no real idea how much this capability has actually been used.
With doclifter production-ready I had the tool required to drain the swamp. But that didn’t mean I was done. Oh no. That was the easy part. To get to the point where Linux and *BSD distributions could flip a switch and expect to webify everything I knew I’d have to push the failure rate of automated translation another factor of five lower, to the point where the volume of exceptions could be reasonably handled by humans on tight deadlines.
There were two paths forward to doing that. One was to jawbone project maintainers into moving to new-school, web-friendly master formats like DocBook and asciidoc. Which I did; as a result, the percentage of man pages written that way has gone from about 2% to about 6%.
But I knew most projects wouldn’t move, or wouldn’t move quickly. The alternative was to prod that remnant 5%, one by one, into fixing their crappy markup. Which I have now been doing for fifteen years, since 2003.
Every year or two I do a sweep through every manual page in sight of me, which means everything on a stock install of the currently dominant Linux distro, plus a boatload of additional pages for development tools and other things I use. I run doclifter on every single one, make patches to fix broken or otherwise untranslatable markup, and mail them off to maintainers. You can look at my current patch set and notes here.
I’ve had 579 patches accepted so far, so I am getting cooperation. But the cycle time is slow; there wouldn’t be much point in sampling the corpus faster than the refresh interval of my Linux distribution, which is about six months.
In a typical round, about 80 patches from my previous round have landed and I have to write maybe two dozen new ones. Once I’ve fixed a page it mostly stays fixed. The most common exception to that is people modifying command-option syntax and forgetting to close a “]” group; I catch a lot of those. Botched font changes are also common; it’s easy to write one of those \-escapes incorrectly and not notice it.
There are a few categories of error that, at this point, cause me the most problems. A big one is botched syntax in descriptions of command-line options, the simplest of which is unbalanced [ or ] in option groups. But there are other things that can go wrong; there are people, for example, who don’t know that you’re supposed to wrap mandatory switches and arguments in { } and use something else instead, often plain parentheses. It doesn’t help that there is no formal standard for this syntax, just tradition – but some tools will break if you flout it.
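The unbalanced-group case, at least, is easy to check mechanically; here is a hypothetical few lines (not doclifter’s actual validation code) showing the shape of the test.

```python
# Hypothetical check for unbalanced option-group brackets in a SYNOPSIS line.
# Not doclifter's actual validation code.

def groups_balanced(synopsis: str) -> bool:
    depth = 0
    for ch in synopsis:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:          # a ']' with no matching '['
                return False
    return depth == 0              # any unclosed '[' also fails

print(groups_balanced("foo [-v] [-o file]"))   # True
print(groups_balanced("foo [-v [-o file]"))    # False -- the error described above
```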
A related one is that some people intersperse explanatory text sections in their command synopses, or follow a command synopsis with a summary paragraph. The proper boundary to such trailing paragraphs is fiendishly difficult to parse because distinguishing fragments of natural language from command syntax is hard, and DocBook markup can’t express the interspersed text at all. This is one of the very few cases in which I have to impose a usage restriction in order to get pages to lift. If you maintain a manual page, don’t do these things! If doclifter tells you “warning – dubious content in Synopsis”, please fix until it doesn’t.
Another bane of my life has been botched list syntax, especially from misuse of the .TP macro. This used to be very common, but I’ve almost succeeded in stamping it out; only around 20 instances turned up in my latest pass. The worst defects come from writing a “bodiless .TP”, an instance with a tag but no following text before another .TP or a section macro. This is the most common cause of pages that lift to ill-formed XML, and it can’t be fixed by trickier parsing. Believe me, I’ve tried…
Another big category of problems is people using very low-level troff requests that can’t be parsed into structure, like .in and .ce and especially .ti requests. And yet another is abuse of the .SS macro to bolden and outdent text that isn’t really a section heading.
But over time I have actually been succeeding in banishing a lot of this crap. Counting pages that have moved to web-friendly master formats, the percentage of man-page content that can go automatically to really high-quality HTML (with tables and figures and formulas properly carried along) is now over 99%.
And yes, I do think what I see in a mainstream Linux distro is a sufficiently large and representative sample for me to say that with confidence. Because I notice that my remaining 75 or so awkward cases are now heavily concentrated around a handful of crotchety GNU projects; groff itself being a big one.
I’ll probably never get it perfect. Some fraction of manual pages will always be malformed enough to terminally confuse my parser. Strengthening doclifter enough to not barf on more of them follows a pattern I’ve called a “Zeno tarpit” – geometrically increasing effort for geometrically decreasing returns.
Even if I could bulletproof the parser, perfection on the output side is hard to even define. It depends on your metric of quality and how many different rendering targets you really want to support. There are three major possibles: HTML, PostScript, and plain text. DocBook can render to any of them.
There’s an inescapable tradeoff where if you optimize for one rendering target you degrade rendered quality for the others. Man-page markup is an amphibian – part structural, part visual. If you use it visually and tune it carefully you will indeed get the best output possible on any individual rendering target, but the others will look pretty terrible.
This is not a new problem. You could always get especially pretty typography in man if you decide you care most about the PostScript target and use troff as a purely visual markup, because that’s what it was designed for. But you take a serious hit in quality of generated HTML because in order to do *that* right you need *structural* markup and a good stylesheet. You take a lesser hit in plain-text rendering, especially near figures and tables.
On the other hand, if you avoid purely visual markup (like .br, .ce, \h and \v), emphasizing structural tags instead, you can make a roff master that will render good HTML, good plain text, and something acceptable if mediocre on Postscript/PDF. But you get the best results not by naive translation but by running a cliche recognizer on the markup and lifting it to structure, then rendering that to whatever you want via stylesheets. That’s what doclifter does.
Lifting to DocBook makes sense under the assumption that you want to optimize the output to HTML. This pessimizes the Postscript output, but the situation is not quite symmetrical among the three major targets; what’s good for HTML tends to be coupled to what’s good for plain text. My big cleanup of the man page corpus, now more than 97% complete after seventeen years of plugging at it, is based on the assumption that Postscript is no longer an important target, because who prints manual pages anymore?
Thus, the changes I ship upstream try to move everyone towards structural markup, which is (a) less fragile, (b) less likely to break on alien viewers, and (c) renders better to HTML.