Eric S. Raymond's Blog, page 8

December 24, 2018

Pessimism about parallelism

Massive concurrency and hardware parallelism are sexy topics in the 21st century. There are a couple of good reasons for this and one rather unfortunate one.


Two good reasons are the combination of eye-catching uses of Graphics Processing Units (GPUs) in games and their unexpected secondary uses in deep-learning AI – these exploit massive hardware parallelism internally. The unfortunate reason is that single-processor execution speeds hit a physics wall in about 2006. Current leakage and thermal runaway issues now sharply limit increases in clock frequency, and the classic way out of that bind – lowering voltage – is now bumping up against serious quantum-noise issues.


Hardware manufacturers competing for attention have elected to do it by putting ever more processing cores in each chip they ship and touting the theoretical total throughput of the device. But there have also been rapidly increasing amounts of effort put into pipelining and speculative execution techniques that use concurrency under the hood in attempts to make the serial single processors that programmers can see crank instructions more rapidly.


The awkward truth is that many of our less glamorous computing job loads just can’t use visible concurrency very well. There are different reasons for this that have differing consequences for the working programmer, and a lot of confusion abroad among those reasons. In this episode I’m going to draw some distinctions that I hope will help all of us think more clearly.


First, we need to be clear about where harnessing hardware parallelism is easy and why that seems to be the case. We look at computing for graphics, neural nets, signal processing, and Bitcoin mining, and we see a pattern: parallelizing algorithms work best on hardware that is (a) specifically designed to execute them, and (b) can’t do anything else!


We also see that the inputs to the most successful parallel algorithms (sorting, string matching, fast-Fourier transform, matrix operations, image reverse quantization, and the like) all look rather alike. They tend to have a metric structure and an implied distinction between “near” and “far” in the data that allows it to be carved into patches such that coupling between elements far from each other is negligible.


In the terms of an earlier post on semantic locality, parallel methods seem to be applicable mainly when the data has good locality. And they run best on hardware which – like the systolic-array processors at the heart of GPUs – is designed to support only “near” communication, between close-by elements.
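

To make the locality point concrete, here is a minimal Go sketch (purely illustrative, not taken from any production code): each worker owns one contiguous patch of the data and never has to communicate with a worker holding a distant patch, which is exactly the shape of problem that parallelizes without pain.

package main

import (
	"fmt"
	"sync"
)

// sumChunks exploits good locality: each goroutine owns one contiguous
// patch of the data and never talks to workers holding "far" elements,
// so the patches can be processed independently.
func sumChunks(data []float64, nworkers int) float64 {
	partial := make([]float64, nworkers)
	var wg sync.WaitGroup
	chunk := (len(data) + nworkers - 1) / nworkers
	for w := 0; w < nworkers; w++ {
		lo := w * chunk
		hi := lo + chunk
		if lo >= len(data) {
			break
		}
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			for _, v := range data[lo:hi] {
				partial[w] += v // touches only this worker's patch
			}
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0.0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]float64, 1000000)
	for i := range data {
		data[i] = 1.0
	}
	fmt.Println(sumChunks(data, 4)) // 1e+06
}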


By contrast, writing software that does effective divide-and-conquer for input with bad locality on a collection of general-purpose (Von Neumann architecture) computers is notoriously difficult.


We can sum this up with a heuristic: Your odds of being able to apply parallel-computing techniques to a problem are inversely proportional to the degree of irreducible semantic nonlocality in your input data.


Another limit on parallel computing is that some important algorithms can’t be parallelized at all – provably so. In the blog post where I first explored this territory I coined the term “SICK algorithm”, with the SICK expanded to “Serial, Intrinsically – Cope, Kiddo!” Important examples include but are not limited to: Dijkstra’s n-least-paths algorithm; cycle detection in directed graphs (with implications for 3-SAT solvers); depth-first search; computing the nth term in a cryptographic hash chain; network-flow optimization.


Bad locality in the input data is implicated here, too, especially in graph- and tree-structure contexts. Cryptographic hash chains can’t be parallelized because their entries have to be computed in strict time order – a strictness which is actually important for validating the chain against tampering.
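

Here is a minimal Go sketch of that serial dependency (illustrative only; assume SHA-256 as the link function): link n is the hash of link n-1, so no amount of extra hardware lets you start computing link n early.

package main

import (
	"crypto/sha256"
	"fmt"
)

// hashChain computes the nth link of a hash chain starting from seed.
// Each link is the SHA-256 of the previous link, so the loop is
// irreducibly serial: link i cannot begin until link i-1 exists.
func hashChain(seed []byte, n int) []byte {
	link := seed
	for i := 0; i < n; i++ {
		sum := sha256.Sum256(link)
		link = sum[:]
	}
	return link
}

func main() {
	fmt.Printf("%x\n", hashChain([]byte("genesis"), 1000000))
}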


There’s a blocking rule here: You can’t parallelize if a SICK algorithm is in the way.


We’re not done. There are at least two other classes of blocker that you will frequently hit.


One is not having the right tools. Most languages don’t support anything but mutex-and-mailbox, which has the advantage that the primitives are easy to implement but the disadvantage that it induces horrible complexity explosions and is nigh-impossible to model accurately in your head at scales over about four interacting locks.


If you are lucky you may get some use out of a more tractable primitive set like Go channels (aka Communicating Sequential Processes) or the ownership/send/sync system in Rust. But the truth is, we don’t really know what the “right” language primitives are for parallelism on von Neumann-architecture computers. And there may not even be one right set of primitives; there might be two, three, or more different sets of primitives appropriate for different problem domains but as incommensurable as one and the square root of two. At the present state of the art in 2018 nobody actually knows.
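

For readers who haven’t met Go channels, here is a minimal sketch of the CSP style (illustrative, not a claim about the one right design): instead of threads mutating a shared counter behind a mutex, workers send values over a channel and a single goroutine owns the accumulating state.

package main

import "fmt"

// Instead of guarding shared state with mutexes, CSP-style code passes
// values over channels; only one goroutine ever owns the running total.
func main() {
	results := make(chan int)
	done := make(chan int)

	// The single owner of the accumulator.
	go func() {
		total := 0
		for v := range results {
			total += v
		}
		done <- total
	}()

	// Workers communicate by sending, never by touching shared memory.
	const nworkers = 4
	finished := make(chan struct{})
	for w := 0; w < nworkers; w++ {
		go func(w int) {
			results <- w * w
			finished <- struct{}{}
		}(w)
	}
	for i := 0; i < nworkers; i++ {
		<-finished
	}
	close(results)

	fmt.Println(<-done) // 0 + 1 + 4 + 9 = 14
}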


Last but not least, the limitations of human wetware. Even given a tractable algorithm, a data representation with good locality, and sharp tools, parallel programming seems to be just plain difficult for human beings even when the algorithm being applied is quite simple. Our brains are not all that good at modelling the simpler state spaces of purely serial programs, and much less so at parallel ones.


We know this because there is plenty of real-world evidence that debugging implementations of parallelizing code is worse than merely _difficult_ for humans. Race conditions, deadlocks, livelocks, and insidious data corruption due to subtly unsafe orders of operation plague all such attempts.


Having a grasp on these limits has, I think, been growing steadily more important since the collapse of Dennard scaling. Due to all of these bottlenecks in the supply of code that can use multiple cores effectively, some percentage of the multicore hardware out there must be running software that will never saturate its cores; or, to look at it from the other end, the hardware is overbuilt for its job load. How much money and effort are we wasting this way?


Processor vendors would love you to overestimate the functional gain from snazzy new silicon with ever larger multi-core counts; how else will they extract enough of your money to cover the eye-watering cost of their chip fabs and still make a profit? So there’s a lot of marketing push out there that aims to distract capacity planners from ever wondering when those gains are real.


And, to be fair, some places they are. The kind of servers that live in rack mounts and handle hundreds of thousands of concurrent transactions per second probably have their core count matched to their job load fairly well. Smartphones or embedded systems, too – in both these extreme cases a lot of effort goes into minimizing build costs and power budgets, and that’s going to exert selective pressure against overprovisioning.


But for typical desktop and laptop users? I have dark suspicions. It’s hard to know, because we’ve been collecting real performance gains due to other technology changes like the shift from spinning-rust to solid-state mass storage. Gains like that are easy to mistake for an effect of more CPU throughput unless you’re profiling carefully.


But here’s the shape of my suspicion:


1. For most desktop/laptop users the only seriously parallel computing that ever takes place on their computers is in their graphics chips.


2. More than two processor cores is usually just wasteful hotrodding. Operating systems may be able to parcel out applications between them, but the general run of application software is unable to exploit parallelism and it is rare for most users to run enough different processor-hungry applications simultaneously to saturate their hardware that way.


3. Consequently, most of the processing units now deployed in 4-core-and-up machines are doing nothing most of the time but generating waste heat.


My regulars include a lot of people who are likely to be able to comment intelligently on this suspicion. It will be interesting to see what they have to say.


UPDATE: A commenter on G+ points out that one interesting use case for multicores is compiling code really quickly. Source for a language like C has good locality – it can be compiled in well-separated units (source files) into object files that are later joined by a linker.

Published on December 24, 2018 03:15

December 16, 2018

The blues about the blues

Some kinds of music travel well – they propagate out of their native cultures very readily. American rock music and European classical music are obvious examples; they have huge followings and expert practitioners pretty much everywhere on earth that’s in contact with civilization.


Some…don’t travel well at all. Attempts to imitate them by people who aren’t native to their home culture seldom succeed – they fall afoul of subtleties that a home-country connoisseur can hear but not explain well, or at all. The attempts may be earnestly polished and well meant, but in some ineffable way they lack soul. American blues music and to a lesser but significant extent jazz are like this, which is all the more interesting because they’re close historical and genetic kin to rock.


Why am I thinking about this? Because one of the things that YouTube’s recommender algorithms make easy (and almost inevitable) is listening to strings of musical pieces that fit within what the algorithms recognize as a genre. I’ve noticed that the places where its genre recognition is most likely to break down are correlated with whether the genre travels well. So whatever I’m noticing about that distinction is not just difficult for humans but for machine learning as well, at least at the current state of the art.


Most attempts at blues by non-Americans are laughable – unintentional parodies by people trying for the real thing. Not all; there was an older generation of British and Irish musicians who immersed in the form in the early Sixties and grokked it well enough to bring it back to the U.S., completely transforming American rock in the process. There are, for some reason, a small handful of decent blues players in Holland. But elsewhere, generative understanding of the heart of the blues is so rare that I was utterly gobsmacked when I found it in Greece.


I don’t know for sure, not being a home-country connoisseur, but I strongly suspect that Portuguese fado is like this. I have a pretty good ear and readily synchronize myself to different musical styles; I can even handle exotica like Indian microtones decently. But I wouldn’t go near fado; I sense a grave risk that if I tried, any actual Portuguese fado fan would be politely suppressing a head-shaking he-really-don’t-get-it reaction, the same way I usually have to when I listen to Eurojazz.


And Eurojazz players have a better frequency of not ludicrously failing than Euro blues players! Why? I don’t know. I can only guess that the recognition features of “real” jazz are less subtle than for “real” blues, and imitators are thus less likely to slide into unintentional parody. But since I can’t enumerate those recognition features this remains a guess. I do know timing is part of it, and there are uses of silence that are important. Eurojazz tends to be too busy, too slick.


If it’s any consolation to my non-American readers, Americans don’t automatically get it either. My own beloved wife, despite being musically talented, doesn’t have the ear – blues doesn’t speak to her, and if she were unwise enough to try to imitate it she would doubtless fail badly.


One reason I’m posting this is that I hope my commenters might be able to identify other musical genres that travel very poorly – I want to look for patterns. Are there foreign genres that Americans try to imitate and don’t know they’re botching?


And now a different kind of blues about the blues…


There’s an unacknowledged and rather painful truth about the blues, which is that the primitive Delta versions blues fans are expected to revere are in many ways not as interesting as what came later, out of Chicago in particular. Monotonous, repetitive lyrics, primitive arrangements…but there’s a taboo against noticing this so strong that it took me over forty years to even notice it was there, and I might still not have if I hadn’t spent two days immersed in the rootsiest examples I could find on YouTube.


I found that roots blues is surrounded by a haze of retrospective glorification that (to my own shock!) it too often fails to deserve. And of course the obvious question is “Why?”. I think I’ve figured it out, and the answer is deeply sad.


It’s because, if you notice that later, more evolved and syncretized versions of the blues tend to be more interesting, and you say so, you risk making comparisons that will be interpreted as “white people do it better than its black originators”. And nobody wants that risk.


This came to me as I was listening to a collection of blues solos by Gary Moore, a now-deceased Irishman who played blues with both real heart and a pyrotechnic brilliance you won’t find in Robert Johnson or (one of my own roots favorites) John Lee Hooker. And found myself flinching from the comparison; took me an act of will to name those names just now, even after I’d been steeling myself to it.


Of course this is not a white > black thing; it’s an early vs. late thing. Recent blues players (more likely to be white) have the history of the genre itself to draw on. They have better instruments – Gary Moore’s playing wouldn’t be possible without Gary Moore’s instrument, you can get more tone colors and dynamic range out of a modern electric guitar than you could out of a wooden flattop with no pickups. Gary Moore grew up listening to a range of musical styles not accessible to an illiterate black sharecropper in 1930 and that enriched his playing.


But white blues players may be at an unfair disadvantage in the reputational sweepstakes forever simply because nobody wants to take the blues away from black people. That would be a particularly cruel and wrong thing to do given that the blues originated as a black response to poverty and oppression largely (though not entirely) perpetrated by white people.


Yes, the blues belongs to all of us now – it’s become not just black roots music but American roots music; I’ve jammed onstage with black bluesmen and nobody thought that was odd. Still, the shadow of race distorts our perceptions of it, and perhaps always will.

Published on December 16, 2018 04:35

December 4, 2018

The curious case of the missing accents

I have long been a fan of Mark Twain. One of the characteristics of his writing is the use of “eye dialect” – spellings and punctuation intended to phoneticize the speech of his characters. Many years ago I noticed a curious thing about Twain’s eye dialect – that is, he rendered few or no speech differences between Northern and Southern characters. His Northerners all sounded a bit Southern by modern standards, and his Southerners didn’t sound very Southern.


The most obvious possible reason for this could have been that Twain, born and raised in Missouri before the Civil War, projected his own border-state dialect on all his characters. Against this theory I could set the observation that Twain was otherwise a meticulously careful writer with an excellent ear for language, making that an unlikely sort of mistake for him. My verdict was: insufficient data. And I didn’t think the question would ever be resolvable, Twain having died when sound recording was in its infancy.


Then I stumbled over some fascinating recordings of Civil War veterans on YouTube. There’s Confederate “General” Julius Howell Recalls the 1860s from 1947. And 1928-1934: Recollections of the US Civil War. And here’s what jumped out at me…



…those veterans, Northern and Southern, both spoke as if they might be pronouncing Twain’s relatively uniform eye-dialect! Some Northern-Southern split was present, certainly, but it was very subtle compared to the regional differences one would expect today – I can spot it myself, but I think many moderns would find it imperceptible.


Mark Twain’s ear is vindicated. Fascinating…and this came together with a video I’d watched a couple of years ago on the survival of Appalachian English. It may have been this one. I remember thinking that the eye dialect I’d read in old novels (not particularly Twain’s, though including his) suggested that speech features we would now mark “Appalachian” – or, somewhat disparagingly, “hillbilly talk” – used to be widely distributed not just in the rural South but in the rural North as well.


The second thing I noticed about the recorded speech of Civil War vets, after the absence of really pronounced North-South differences, is that it seemed to me to retain more phonetic features we would now think of as Appalachian than either modern Northern or Southern dialects do. I couldn’t quite pin it down, because I’m only an amateur phonologist, not a really trained one – but there was something about the vowels…


The third thing I noticed was pretty funny. I’m listening to these Civil War vets talk and something tickles my awareness. I think “I’ve heard their accent before. Where have I heard their accent before?” It took me a few minutes, but I finally twigged: it’s the “old-timer” accent from movie Westerns of the 1930s and 1940s! Not the more modern Northern and Southern accents of the audience-identification characters, but the speech of the grizzled old prospectors and mule-skinners and other bit characters supposed to be from a previous generation.


That is, Hollywood actors of that time, portraying men who would have been in their fifties through seventies during the cinematic “Old West” era (1865-1895 or so), gave them the speech pattern I was recognizing in Civil War veterans who survived long enough for the actors to use them as models of pre-Civil-War dialect. It’s a nicety that probably would not have been lost on the movies’ first-run audiences.


More and more interesting. Now, if you look up the history of the Southern accent you’ll find that dialectologists know quite well that the Southern accent we know today is a post-Civil-War development. Wikipedia: “Older Southern American English was a set of American English dialects of the Southern United States, primarily spoken by White Southerners up until the American Civil War, moving towards a state of decline by the turn of the nineteenth century, further accelerated after World War II and again, finally, by the Civil Rights Movement. These dialects have since largely given way, on a larger regional level, to a more unified and younger Southern American English, notably recognized today by a unique vowel shift and certain other vocabulary and accent characteristics.”


What doesn’t seem to have made the standard account is that the differences between Old Southern and Old Northern used to be quite a bit less obvious – in fact I think there might be room for doubt that those actually constituted high-level groupings at all before the Civil War. It might be there were just a lot of smaller local dialect clades, mostly much less divergent from the border-state accent of Mark Twain (or, for a more modern example, Johnny Cash) than corresponding regional accents are today.


That’s certainly what the sound recordings are telling me. It’s what I thought I originally saw in Twain’s eye dialect. And in the broad sweep of history it wouldn’t be surprising. Linguistic uniformity over large areas has normally only happened as a result of recent invasion and conquest; settled humans rapidly develop geographically fine-grained and increasingly sharp dialect differences.


Or did, anyway, until cheap travel and modern mass communications. Those exert a counter-tendency for dialect distinctions to flatten. I briefly lived in Great Britain in the late 1960s; I can certify that a typical modern speaker of British English such as The Mighty Jingles sounds a great deal more “American” than his counterpart would have in 1968-1969. Listening to British comedy makes it clear that even today’s Brits consider the 1969 version of British Received Pronunciation stuffy and old-fashioned. There is no doubt in my mind that TV and movies did that.


The New Southern accent seems to have developed in two phases. First, early, post-Civil-War local differentiation from the border-state-like old-timer dialect they had formerly shared with much of the North, up to about World War One. Then, a convergence phase (well documented by linguists) in which speech became less regionally differentiated across the South as a whole. Again, you can get a chronological read on the second process by listening to movies from different decades set in the American South and comparing those to the live speech of Southerners today.


Americans, if they think about such things at all, tend to assume that rural Southern speech is archaic because the South still has an image as a backward-looking part of the country. But these videos I’ve been watching seem like evidence that New Southern has diverged more from “old-timer” pre-Civil-War American English than Northern speech has. Wikipedia hints at this when it speaks of a “unique vowel shift” in New Southern. They’re implying that New Southern isn’t archaic at all, it’s actually innovative relative to Northern dialects.


I have to think that this reflects some sort of reaction to the disaster of Reconstruction, Southerners grasping at a common linguistic identity that would differentiate them from bluebellies, carpetbaggers and scalawags. (For my readers outside the U.S., it is a fact that residual Southern bitterness about the postwar Reconstruction period of military government tends to eclipse resentments about the Civil War itself.)


What probably happened is that Southerners adopted the most archaic and divergent features of the dialects in their region and generalized them in an innovative way. One of the pioneering studies in sociolinguistics documents a similar process on the island of Nantucket as year-round residents sought to differentiate themselves from a rising tide of tourists and summerbirds. Nantucket permanent residents today sound more like crusty Down-Easter fishermen than they used to.


Now here’s where it gets even more interesting. I’m pretty sure I know how the New Southern dialect features got propagated and uniformized: through country music! The documented emergence of New Southern from the early 1900s seems to track the rising commercialization of country & Western music exactly. Badge of regional identity, check. Plausible widely-disseminated speech models, check. I think we have a winner!


I don’t know if the sociolinguists have figured this out yet. I have not seen any evidence that they have.


All of this turns many of the assumptions most Americans would casually make about our dialect history on their heads. And it means that Twain, had he not died in 1910, would have found New Southern as it evolved increasingly alien from the speech of his childhood in 1840s Missouri. But Twain didn’t live to see the country-music drawl and twang take over the South. Just lucky, I guess.


UPDATE: I found the closest thing that exists to a recording of Twain speaking. It was an imitation of Twain done by a gifted mimic who had been an intimate friend of Twain. It has the tempo and something like the cadence of modern Southern, but the vowel shifts we now associate with Southern are absent or at best only very weakly expressed. I think that supports my other observations.

Published on December 04, 2018 14:48

November 27, 2018

SRC, four years later

Four years ago, I wrote an entire version-control system in a 14-hour burst of inspiration. It’s a small, lightweight tool designed for solo single-file projects that allows several histories to coexist in a single directory – good for /etc files, HOWTOs, or that script collection in your ~/bin directory.


I wasn’t certain, at the time, that the concept would prove out as a production tool for anyone but me. But it did. Here are some statistics: Over 4 years, 21 point releases, 644 commits, 11 committers. Six issues filed by five different users, 20 merge requests. I know of about half a dozen users who’ve raised their hands on IRC or in blog comments. Code has about quintupled in size from the first alpha release (0.1, 513 lines) to 2757 lines today.


That is the statistical profile of a modest success – in fact the developer roster is larger than I realized before I went back through the logs. The main thing looking at the history reveals is that there’s a user community out there that has been sending a steady trickle of minor bug reports and enhancement requests over the whole life of the project. This is a lot more encouraging than dead air would be.


Of course I don’t know how many total users SRC has. But we can base a guess on fanout patterns observed when other projects (usually much larger ones) have done polls to try to measure userbase size. A sound extrapolation would be somewhere between one and two orders of magnitude more than have made themselves visible – so, somewhere between about 200 and 2000.


(There seems to be something like an exponential scaling law at work here. For random open source project X old enough to have passed the sudden-infant-death filter, if there’s an identifiable core dev group in the single-digit range you can generally expect the casual contributors to be about 10x more and the userbase to be at least 100x more.)


SRC has held up pretty well as a design exercise, too. I’ve had complaints about minor bugs in the UI, but nobody bitching about the UI itself. Credit to the Subversion developers I swiped most of the UI design from; their data model may be obsolete, but nobody in VCS-land has done better at UI and I was at least smart enough not to try.


2.7KLOC is nicely compact for an entire version-control system supporting both RCS and SCCS back ends. I don’t expect it to get much larger; there are only two minor items left on the to-do list, neither of which should add significant lines of code.


Today I’m shipping 1.21. With gratitude to everyone that helped improve it.

Published on November 27, 2018 12:13

November 22, 2018

Contemplating the cute brick

Some years ago I predicted that eventually the core of your desktop PC would morph into a physically tiny compute engine that would merge with your smartphone, talking through standard ports and cables to full-sized peripherals like a keyboard and a (too large to be portable) flatscreen.


More recently I examined the way that compute bricks – small-form-factor fanless PCs running low-power chips – have been encroaching on the territory of traditional tower PCs. Players in this space include Jetway, Logic Supply, Partaker, and Shuttle. Poke a search engine with “fanless PC” to get good hits.


I have a Jetway running production in my basement; it’s my Internet-facing mail- and web-server. There’s a second one I have set up with Devuan that I haven’t assigned a role to yet; I may use it as a backup host.


These compute bricks are a station on the way to my original prediction, because they get consumers used to thinking of their utility machines as small compute nodes attached to human-sized peripheral hardware that may have a longer lifetime than the compute node itself.


At the lowest end of the compute-brick class are little engines like the Raspberry Pi. And right above it is something slightly different – bricks with a fan, active cooling enabling them to run the same chips used in tower PCs.


Of course the first machine in this class was the Apple Mac Mini, but it dead-ended years ago for reasons that aren’t Apple’s fault. It was designed before SSDs were really a thing and has spinning-rust-centric design assumptions in its DNA; thus, it’s larger, noisier, and waaay more expensive than a Jetway-class brick. Apple must never have sold very many of them; we can tell this by the fact that the product went four years between refreshes.


On the other hand, a couple days ago I dropped in a replacement for my wife’s aging tower PC. It’s an Intel NUC, a brick-with-fan, but unlike the Mac Mini it seems to have been designed from the start around the assumption that its mass storage would be SSD. As such, it achieves what the Mac Mini didn’t quite; it opens a new front in the ephemeralization wars.



Perhaps I shouldn’t say “new”, because Intel has been shipping NUCs for about five years. But I didn’t really understand what Intel was doing until I actually eyeballed one – and discovered that, as well as having a case design that is almost absurdly simple, it’s really pretty. Like, high-end stereo equipment pretty.


The comparison nobody but a geek will notice: my Jetways have something like ten case screws each, and then if you need to change out the SSD you have to deal with six more on a set of detachable rails. The NUC gets away with just four much larger case screws, which double as posts for its rubber feet. Inside, the SSD sits in a fixed drive bay that’s positioned so you never have to move it and doubles as a guide so you literally cannot engage the SATA connector incorrectly or off axis.


What an end-user will notice is the dark-silver anodized finish on the body, the contrast with a top that looks like black glass, and the nice rounding on the corners.


I mean, damn – somebody did a brilliant job of industrial design here, combining easy teardown for servicing with being a pleasure to look at. It’s hard to imagine even Apple, notorious for its attention to the surfaces of tech, doing better.


That makes the comparison to Jetway-class compute bricks rather stark. Despite occasional twitches at a “media center” look, they mostly have DNA from industrial-control machines and all the aesthetic appeal of a mining truck.


If you wonder why I’m focusing so much on appearance, it’s because having spent engineering and manufacturing budget on pretty does not quite square with the NUC’s official positioning. Somebody had to defend that nice finish against the usual headwind of “reduce bill of materials to the bone”.


NUCs ship barebones – processor on board but no DRAM or SSD – because Intel has a bunch of non-compete agreements with the PC manufacturers it sells chips to intended to keep it out of the consumer PC business. The official line is that the NUC is a “development platform” intended to showcase Intel’s newest CPU chips and graphics hardware. A trustworthy source informs me that its other function is to be dogfooded – Intel issues NUCs to its own developers because it wants to avoid the you-don’t-really-know-what’s-in-there effects of ruthless cost-cutting in the PC supply chain.


Neither use case explains why they made it pretty.


Now, I could be overinterpreting this. It could be somebody just slipped the nice details past the beancounters and it doesn’t actually mean anything strategic. But if I’m right…


…here’s what I think Intel is doing. I think it’s positioning itself for the smartphone-as-portable-compute-node scenario I sketched at the beginning of this blog post. Intel’s planners aren’t stupid; they know there’s a low-power revolution underway in which ARM and Atom are likely to shoulder the “classic” Intel/AMD architecture aside. They may not be ready to think about making smartphones themselves yet, but they want something to compete with when compute-node-centered peripheral clusters start to displace tower PCs.


Why am I not talking about laptops as the doom of the PC? Because, as I’ve pointed out repeatedly before, the ergonomics of laptops actually suck pretty badly. The biggest deal is that you can’t put a bigger display on a laptop than a person can comfortably carry; the craptasticity of laptop keyboards is an issue, too. Sure, you can close the lid and plug in better peripherals, but now what you have is a compute node that is stupidly heavy and expensive for its job.


Fundamentally, carrying around a display with you is an unstable hack that made sense in the past when that hardware was rare and expensive, but not when every hotel room has an HDTV and even airline seatbacks are growing displays. OK, maybe a screen that’s tablet or smartphone-sized makes sense to carry as an occasional fallback, but we’re rapidly moving towards a world where a compute node in your pocket and a USB-C cable to local peripherals will usually address both PC and laptop deployment cases better than hardware specialized for either.


I say “usually” because there are special cases at the high end. All the other towers in my house have been replaced by compute bricks, but the Great Beast is still the Great Beast. Tower cases retain advantages when really hotrodding your ride requires component modularity. But that’s a 1% case and going to get rarer.


The NUC gives Intel cred and a bit of early-adopter visibility as consumer-facing compute-node makers. Which is going to be handy when the PC and laptop markets crater. There Intel will be, smiling a disarming smile, with a NUC-descended compute node in its hand, saying “Psst…and it’s pretty, too.”


The first signs of the PC market cratering are already past us. Everybody knows sales volume is down from the peak years of that technology. The common mistake is to think that the laptops that have been eating its lunch so far are a stable endpoint rather than, themselves, a transitional technology.


The truth is that in late 2018 conventional PCs are like Wile E. Coyote running on air. When I buy one I’m mostly getting metal and bulk. The motherboard is physically designed to host a bunch of expansion slots I’ll never use because the things cards used to do have migrated onto the mobo. The actual working parts don’t actually take up any more volume than a compute brick. They used to, before thumb drives turfed out DVDs and spinning rust got tiny under laptop pressure only to be mostly displaced by SSDs. But today? The main constraint on reducing the size of a computer is that you need surface for all the ports you want.


I think it’s likely that the last redoubt of the PC will be gaming rigs. We’re already at a place where for 99% of consumers the only real reason to buy a tower case is to put a really bitchen graphics card in it, the kind that has a model name like MegaDoomDestroyer and more fan capacity than the rest of your computer and possibly your next-door neighbor’s computer put together. Your typical home-office user would be, like my wife, better served by a cute brick.


But the MegaDoomDestroyer will pass, too. The polygon arms-race will top out when our displays exceed the highest resolution and frame rate the human retina can handle. We’re already pushing the first and the second is probably no more than two Moore’s Law doublings away. After that all the NRE will go into lowering footprint; on past form we can expect ephemeralization to do its job pretty quickly.


The laptop collapse is further out – harder to see from here. Probably a topic for a future post.

Published on November 22, 2018 07:05

November 18, 2018

Stop whining and get the job done

I’ve been meaning to do something systematic about losing my overweight for some time. Last Thursday I started the process by seeing an endocrinologist who specializes in weight management.


After some discussion, we developed a treatment plan that surprised me not at all. I’m having my TSH levels checked to see if the hypothyroidism I was diagnosed with about a year ago is undertreated. It is quite possible that increasing my levothyroxin dose will correct my basal metabolic rate to something closer to the burn-food-like-a-plasma-torch level it had when I was younger, and I’ll shed pounds that way.


The other part is going on a low-starch, high protein calorie-reduction diet, aiming for intake of less than 1500 calories a day. Been doing that for nine days now. Have lost, according to my bathroom scale, about ten pounds.


I’d have done this sooner if I’d known it was so easy. And that’s what I’m here to blog about today.



I’ve spent my entire life listening to jokes, folklore, and sob stories about dieting. If you leave out the obvious marketing hype, the message is always the same: it’s difficult, most people can’t stick to it, if you try you’ll be beset by grouchiness and food cravings, and too often it doesn’t actually work no matter how hard you try.


But here I am. Nine days in, nine pounds off (allowing for measurement uncertainty).


Here’s what I’m doing:


* Counting calories. My wife and I have been auditing my regular meals – and the recurring specials like steak night at the Outback – for calorie count. The top-level aim is to keep my daily calorie intake below 1500.


* Reducing my simple-carb intake. Less bread. Less chocolate. (Alas, my custom of a bedtime cup of hot dark-chocolate cocoa is no more – it’s the only major casualty of the new routine.)


* Reducing portion sizes. One less strip of bacon in the morning. Half a pan-fried boneless pork chop last night instead of the whole thing. On biweekly steak night, a half portion of fries, and grilled asparagus rather than a dressed salad.


* Less snacking. When I’m feeling peckish I have usually dealt with it by munching a small handful of mixed nuts or pistachios. I still do that, but – consciously – less often now.


* Staying a little hungry. I don’t let myself eat to repletion any more; instead I go for 80% full, and skip meals if I think I can get through to the next without serious physical discomfort.


* But: when my blood sugar craters, I eat something – as soon as possible, actually, so the reaction will be less likely to draw me into eating calories I don’t need. I’m not dieting to beat myself up; controlling my intake is a means, not an end.


* Planning ahead. Yesterday some senior people at my kung fu school met for brunch at a local place called “Bacon Me Crazy”. I know what I like to eat there – a bacon sandwich on toasted sourdough bread. Yes, a calorie bomb. I compensated by trimming my early breakfast to just two eggs, skipping the usual bacon and toast; problem solved.


What I’m not doing is attempting a big-bang change to my eating routine, because I don’t think that would be sustainable. Endocrinologist-dude would have me off bread and potatoes entirely; I’ve decided to view that as a tactic rather than the strategy and relax about it as long as I get my overall calorie reduction.


Now here is the part that is kind of pissing me off. This is not difficult.


I asked Cathy to give me a heads up if she thought I was getting grouchy or withdrawn due to undereating; she consistently reports that this has not occurred. I don’t have food cravings or constant distraction by thoughts of eating. While I don’t exactly like feeling slightly hungry most of the time, I’ve become used to it; it’s no big deal.


I’m pretty sure I can keep this up long enough to get back into size 42 pants. (Stretch goal is 40.) So, why have I been hearing all my life that dieting is a gauntlet of hell?


Admittedly it helps that I eat a relatively high-protein, low-starch diet by choice even when I’m not dieting, because I like it. Still, I have to ask… What is wrong with you people out there? Have most of you not got the willpower of overcooked spaghetti? Are a majority of you too stupid to do calorie counts and intelligent adaptation?


The only thing my diet is making me grouchy about is other dieters. Stop whining and get the job done!

Published on November 18, 2018 06:58

October 27, 2018

On the Squirrel Hill shooting

To my Jewish friends and followers:


I’m grieving with you today. I know the neighborhood where Tree of Life synagogue sits – it’s a quiet, well-off, slightly Bohemian ‘burb with a lot of techies living in it.


I’m not Jewish myself, but I figured out a long time ago that any society which abuses its Jews – or tolerates abuse of them – is in the process of flushing itself down the crapper. The Jews are almost always the first targets of the enemies of civilization, but never the last.


But I’m not posting to reply only with words.


Any Jew who can get close enough to me in realspace for it to be practical and asks can have from me free instruction in basic self-defense with firearms and anti-active-shooter tactics. May no incident like this ever occur again – but if it does, I would be very proud if one of my students took down the evildoer before it reached bloodbath stage.

Published on October 27, 2018 15:56

October 22, 2018

How to write narrative documentation

The following is a very lightly edited version of email I wrote to my apprentice Ian Bruene after he wrote documentation for his new Kommandant project that was, alas, as awful as I generally expect from programmers. I’m not training Ian for mere coding competence; he’s too talented for that and anyway I have higher standards. This is my way of insisting that he do documentation well – and it was he who suggested it would make a good blog post.




Here’s how to write narrative documentation for a thing like Kommandant.


You want to do this well because people will pre-judge the quality of your software by how clearly you can write about it. They’re right to do so; explaining it clearly is their best warrant that your thinking was not muddy and confused when you wrote it.


In fact, writing good documentation is an excellent way to ensure that you really understand the problem space you’re in, and to throw light into corners of your software where defects might lurk. Do not underestimate the power of this effect! Often enough to matter, it will save you from serious embarrassment.


Doing this right needn’t be difficult. The quality of your documentation is like the quality of your code – less a result of how much effort you put in than it is of having the right mental habits to begin with.


One important mental habit is to not be terrified by the blank sheet of paper. One of my goals in giving you a procedure to follow is so that when you start something like this you can do it on automatic pilot. That will make it easier to deal when you have to write the more difficult, more project-specific parts. Over time you will evolve the procedure to suit your tastes, and that’s fine. It’s meant to be a springboard, not a straitjacket.


First choose a title. Make it simple and direct. Use the name of the product in it. The phrase “how to” may occur. Examples: “Writing Applications with Kommandant”, “Kommandant Client how-to”.


Then write an introductory paragraph that explains why the product exists. A good section title might be “Introduction” or “Motivation”. It is often good for the first (topic) sentence to read something like “Foo is a library for doing Bar”. If the product is derived from or modeled on another piece of software, nod in that direction. If there are examples of similar products your userbase is likely to know about, mention them.


The reader should exit the introduction with a clear sense of how the software will help him and why he cares.


Next, a section on theory of operation. This explains the problem your software solves in more detail, and describes the strategy it uses. Here is where you want to sketch the relationship between the Kommandant core and the class that implements the user’s commands. Don’t get too far into the API weeds here, but *do* establish terminology – like “core”, “helper” and the different categories of user-defined hooks.


Be clear about why things are done as they are. Sometimes this is best done by nodding at alternatives. Like: “We could have required the user to explicitly register command handlers, but by introspecting on the client class we both simplify the problem and eliminate a potential source of defects. Python Cmd set a good example here.”


The user should exit the “Theory of operation” section grasping the concepts he needs to understand a detailed description of the UI/API.


The next two sections should be a detailed API description and a set of simple worked examples with explanation. Which order to do this in depends on your product. For something as simple as Kommandant, examples first probably works better. On the other hand, if the examples are necessarily so complex that they’re hard to read without having seen the API description first, they go second.


(If you find yourself in the latter situation, consider it as a clue that your design may be overcomplicated and need a rethink.)


The reason it’s good to put worked examples first if you can is that the expository arc in a document like this should always try to supply *motivation* at each step before requiring the reader to do more mental work. The examples are motivation for grappling with the API description.


(Notice that my explanation has followed the same kind of expository arc I’m describing…)


Because you’ll have an API reference nearby there is little need to annotate the examples heavily. Focus on details the reader might find surprising or a little tricky, like the difference between a basic-mode application and one that calls readline.


Your API section should walk through each call, ideally in the order they would occur in a typical application. You’re trying to do two things here: fill in details and tell a story about how an application uses your library.


Next section: tricks and traps. Anticipate what might confuse your user about the API and address it directly. A good example in Kommandant is the fact that if you want to customize your prompt, it’s not enough to do it in the PreCmd hook; you have to do it in PreLoop too, or the very first prompt won’t be what you expect. Explain why each gotcha is the way it is.


Next section: Credits and connections. Who contributed to this? Who sponsored it? What was its proof-of-concept application?


Finally, a brief revision history. Tie the document to versions of your software. Summarize changes. A user who is revisiting this document should be able to look here and tell what parts to read for the updates.


Once you have written a document like this, a one-stop shop for what users need to know, it attracts updates from them the way your software attracts patches. *You* will find it easier to keep in sync with your software than if its information were scattered across multiple files.


Now read and think about this document


http://www.catb.org/gpsd/client-howto...


as an example of the form. You don’t need to write anything so long or elaborate, because Kommandant’s problem domain is simpler.


When you get this right, the result will feel a little like art; this is not an accident. I’ve been using words like “story” and “narrative arc” for a reason. By engaging the human brain in the way it wants to be engaged, you lower the friction cost for your users of acquiring the information they need. This is a functional virtue and they will love you for it, giving your software long legs.


You have an advantage here. You have a sharp sense of humor that pervades what you write and say. Let that work for you.


Now go do it right; make me proud.



Some time later Ian popped up on IRC saying


* | ianbruene has enough teeth into HOWTO to start enjoying the process

To which I replied:


esr | ianbruene: *Excellent.* That was part of my evil plan.

Which of course it was. You’ll never write decent documentation if you think of it as a deadly chore. You need to learn how to take pride in your documentation and enjoy the process so your users will, too.

Published on October 22, 2018 11:32

October 8, 2018

Reposurgeon’s Excellent Journey and the Waning of Python

Time to make it public and official. The entire reposurgeon suite (not just repocutter and repomapper, which have already been ported) is changing implementation languages from Python to Go. Reposurgeon itself is about 37% translated, with pretty good unit-test coverage. Three of my collaborators on the project (Daniel Brooks, Eric Sunshine, and Edward Cree) have stepped up to help with code and reviews.


I’m posting about this because the pressures driving this move are by no means unique to the reposurgeon suite. Python, my favorite working language for twenty years, can no longer cut it at the scale I now need to operate – it can’t handle large enough working sets, and it’s crippled in a world of multi-CPU computers. I’m certain I’m not alone in seeing these problems; if I were, Google, which used to invest heavily in Python (they had Guido on staff there for a while) wouldn’t have funded Go.


Some of Python’s issues can be fixed. Some may be unfixable. I love Guido and the gang and I am vastly grateful for all the use and pleasure I have gotten out of Python, but, guys, this is a wake-up call. I don’t think you have a lot of time to get it together before Python gets left behind.


I’ll first describe the specific context of this port, then I’ll delve into the larger issues about Python, how it seems to be falling behind, and what can be done to remedy the situation.



The proximate cause of the move is that reposurgeon hit a performance wall on the GCC Subversion repository. 259K commits, bigger than anything else reposurgeon has seen by almost an order of magnitude; Emacs, the runner-up, was somewhere a bit north of 33K commits when I converted it.


The sheer size of the GCC repository brings the Python reposurgeon implementation to its knees. Test conversions take more than nine hours each, which is insupportable when you’re trying to troubleshoot possible bugs in what reposurgeon is doing with the metadata. I say “possible” because we’re in a zone where defining correct behavior is rather murky; it can be difficult to distinguish the effects of defects in reposurgeon from those of malformations in the metadata, especially around the scar tissue from CVS-to-SVN conversion and near particularly perverse sequences of branch copy operations.


I was seeing OOM crashes, too – on a machine with 64GB of RAM. Alex, I’ll take “How do you know you have a serious memory-pressure problem?” for $400, please. I was able to head these off by not running a browser during my tests, but that still told me the working set is so large that cache misses are a serious performance problem even on a PC design specifically optimized for low memory-access latency.


I had tried everything else. The semi-custom architecture of the Great Beast, designed for this job load, wasn’t enough. Nor were accelerated Python implementations like cython (passable) or pypy (pretty good). Julien Rivaud and I did a rather thorough job, back around 2013, of hunting down and squashing O(n^2) operations; that wasn’t good enough either. Evidence was mounting that Python is just too slow and fat for work on really large datasets made of actual objects.


That “actual objects” qualifier is important because there’s a substantial scientific-Python community working with very large numeric data sets. They can do this because their Python code is mostly a soft layer over C extensions that crunch streams of numbers at machine speed. When, on the other hand, you do reposurgeon-like things (lots of graph theory and text-bashing) you eventually come nose to nose with the fact that every object in Python has a pretty high fixed minimum overhead.


Try running this program:



from __future__ import print_function

import sys
print(sys.version)
d = {
    "int": 0,
    "float": 0.0,
    "dict": dict(),
    "set": set(),
    "tuple": tuple(),
    "list": list(),
    "str": "",
    "unicode": u"",
    "object": object(),
}
for k, v in sorted(d.items()):
    print(k, sys.getsizeof(v))

Here’s what I get when I run it under the latest greatest Python 3 on my system:



3.6.6 (default, Sep 12 2018, 18:26:19)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
dict 240
float 24
int 24
list 64
object 16
set 224
str 49
tuple 48
unicode 49

There’s a price to be paid for all that dynamicity and duck-typing that the scientific-Python people have evaded by burying their hot loops in C extensions, and the 49-byte per-string overhead is just the beginning of it. The object() size in that table is actually misleadingly low; an object instance is a dictionary with its own hash table, not a nice tight C-like struct with fields at fixed offsets. Field lookup costs some serious time.


Those sizes may not look like a big deal, and they aren’t – not in glue scripts. But if you’re instantiating 359K objects containing actual data the overhead starts to pile up fast.
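

For contrast, here is roughly what the same measurement looks like on the Go side (a sketch of my own; exact numbers vary by architecture and compiler version): a struct is a fixed-layout block whose size is known at compile time, and field access is a constant offset rather than a hash-table lookup.

package main

import (
	"fmt"
	"unsafe"
)

// A commit-like record as a plain struct: fields live at fixed offsets,
// there is no per-instance dictionary, and the size is fixed at compile time.
type commit struct {
	mark    int64
	branch  string
	comment string
}

func main() {
	var c commit
	fmt.Println(unsafe.Sizeof(c))        // 40 on a typical 64-bit build
	fmt.Println(unsafe.Sizeof(int64(0))) // 8
	fmt.Println(unsafe.Sizeof(""))       // 16 (pointer + length header)
}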


Alas, I can’t emulate the scientific-Python strategy. If you try to push complex graph-theory computations into C your life will become a defect-riddled hell, for reasons I’ve previously described as greenspunity. This is not something you want to do, ever, in a language without automatic memory management.


Trying to break the GCC conversion problem into manageable smaller pieces won’t work either. This is a suggestion I’m used to hearing from smart people when I explain the problem. To understand why this won’t work, think of a Subversion repository as an annotated graph in which the nodes are (mainly) things like commit representations and the main link type is “is a parent of”. A git repository is a graph like that too, but with different annotations tied to a different model of revisioning.


The job of reposurgeon is to mutate a Subversion-style graph into a git-style graph in a way that preserves parent relationships, node metadata, and some other relations I won’t go into just now. The reason you can’t partition the problem is that the ancestor relationships in these graphs have terrible locality. Revisions can have parents arbitrarily far back in the history, arbitrarily close to the zero point. There aren’t any natural cut points where you can partition the problem. This is why the Great Beast has to deal with huge datasets in memory all at once.
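

By way of illustration, the shape of the problem is roughly the following sketch (these are not reposurgeon’s actual types): parent links can reach arbitrarily far back toward revision zero, so any walk over the graph may touch the whole history.

package main

import "fmt"

// A node in the history DAG (a sketch of the shape of the problem,
// not reposurgeon's real data structures). Parent links can reach
// arbitrarily far back toward revision zero, which is what defeats
// any attempt to cut the graph into independent pieces.
type commitNode struct {
	revision int
	metadata map[string]string
	parents  []*commitNode
}

// countAncestors walks every parent reachable from c.
func countAncestors(c *commitNode, seen map[*commitNode]bool) int {
	n := 0
	for _, p := range c.parents {
		if !seen[p] {
			seen[p] = true
			n += 1 + countAncestors(p, seen)
		}
	}
	return n
}

func main() {
	root := &commitNode{revision: 1}
	mid := &commitNode{revision: 2, parents: []*commitNode{root}}
	tip := &commitNode{revision: 3, parents: []*commitNode{mid, root}} // merge reaching far back
	fmt.Println(countAncestors(tip, map[*commitNode]bool{}))           // 2
}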


My problem points at a larger Python issue: while there probably isn’t much work on large datasets using data structures quite as complex and poorly localized as reposurgeon’s, it’s probably less of an outlier in the direction of high overhead than scientific computation is in the direction of low. Or, to put it in a time-focused way, as data volumes scale up the kinds of headaches we’ll have will probably look more like reposurgeon’s than like a huge matrix-inversion or simulated-annealing problem. Python is poorly equipped to compete at this scale.


That’s a general problem in Python’s future. There are others, which I’ll get to. Before that, I want to note that settling on a new implementation language was not a quick or easy process. After the last siege of serious algorithmic tuning in 2013 I experimented with Common LISP, but that effort ran aground because it was missing enough crucial features to make the gap from Python look impractical to bridge. A few years later I looked even more briefly at OCaml; same problem, actually even worse.


I didn’t make a really serious effort to move sooner than 2018 because, until the GCC repository, I was always able to come up with some new tweak of reposurgeon or the toolchain underneath it that would make it just fast enough to cope with the current problem. But the problems kept getting larger and nastier (I’ve noted the adverse selection problem here). The GCC repo was the breaking point.


While this was going on, pre-GCC, I was also growing somewhat discontented with Python for other reasons. The most notable one at the time was the Python team’s failure to solve the notorious GIL (Global Interpreter Lock) problem. The GIL problem effectively blocks any use of concurrency on programs that aren’t interrupted by I/O waits. What it meant, functionally, was that I couldn’t use multithreading in Python to speed up operations like comment-text searches; those never hit the disk or network. Annoying…here I am with a 16-core hot-rod and reposurgeon can only use one (1) of those processors.
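

For comparison, here is a minimal Go sketch (mine, not the real reposurgeon search code) of how a pure CPU-bound comment-text search fans out across cores with goroutines; this is exactly the kind of workload the GIL serializes.

package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// grepComments scans the comment texts in parallel, one shard per CPU,
// and returns the indices of comments containing the pattern. Each
// goroutine appends only to its own shard, so no locking is needed.
func grepComments(comments []string, pattern string) []int {
	nshards := runtime.NumCPU()
	hits := make([][]int, nshards)
	var wg sync.WaitGroup
	for s := 0; s < nshards; s++ {
		wg.Add(1)
		go func(s int) {
			defer wg.Done()
			for i := s; i < len(comments); i += nshards {
				if strings.Contains(comments[i], pattern) {
					hits[s] = append(hits[s], i)
				}
			}
		}(s)
	}
	wg.Wait()
	var all []int
	for _, h := range hits {
		all = append(all, h...)
	}
	return all
}

func main() {
	comments := []string{"fix typo", "Merge branch 'gcc-8'", "update docs", "gcc: tune inliner"}
	fmt.Println(grepComments(comments, "gcc")) // indices of matches; order depends on shard layout
}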


It turns out the GIL problem isn’t limited to non-I/O-bound workloads like mine, either, and it’s worse than most Python developers know. There’s a rather terrifying talk by David Beazley showing that the GIL introduces a huge amount of contention overhead when you try to thread across multiple processors – so much so that you can actually speed up your multi-threaded programs by disabling all but one of your processors!


This of course isn’t just a reposurgeon problem. Who’s going to deploy Python for anything serious if it means that 15/16ths of your machine becomes nothing more than a space heater? And yet the Python devs have shown no sign of making a commitment to fix this. They seem to put a higher priority on not breaking their C extension API. This…is not a forward-looking choice.


Another issue is the Python 2 to 3 transition. Having done my bit to make it as smooth as possible by co-authoring Practical Python porting for systems programmers with reposurgeon collaborator Peter Donis, I think I have the standing to say that the language transition was fairly badly botched. A major symptom of the botchery is that the Python devs unnecessarily broke syntactic compatibility with 2.x in 3.0 and didn’t restore it until 3.2. That gap should never have opened at all, and the elaborateness of the kluges Peter and I had to develop to write polyglot Python even after 3.2 is an indictment as well.


It is even open to question whether Python 3 is a better language than Python 2. I could certainly point out a significant number of functional improvements, but they are all overshadowed by the – in my opinion – extremely ill-advised decision to turn strings into Unicode code-point sequences rather than byte sequences.


I felt like this was a bad idea when 3.0 shipped; my spider-sense said “wrong, wrong, wrong” at the time. It then caused no end of complications and backward-incompatibilities which Peter Donis and I later had to paper over. But lacking any demonstration of how to do better I didn’t criticize in public.


Now I know what “Do better” looks like. Strings are still bytes. A few well-defined parts of your toolchain construe them as UTF-8 – notably, the compiler and your local equivalent of printf(3). In your programs, you choose whether you want to treat string payloads as uninterpreted bytes (implicitly ASCII in the low half) or as Unicode code points encoded in UTF-8 by using either the “strings” or “unicode” libraries. If you want any other character encoding, you use codecs that run to and from UTF-8.


This is how Go does it. It works, it’s dead simple, it confines encoding dependencies to the narrowest possible bounds – and by doing so it demonstrates that Python 3 code-point sequences were a really, really bad idea.
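

A hedged illustration of what that looks like in Go (the unicode/utf8 package here is Go’s real one; the example itself is mine):

```go
// A Go string is just an immutable byte sequence; nothing imposes an
// interpretation on it until you explicitly ask for one.
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "naïve" // bytes that happen to hold UTF-8

	// Treated as uninterpreted bytes: len counts bytes, not characters.
	fmt.Println(len(s)) // 6, because ï encodes as two bytes

	// Treated as Unicode code points: you decode only when you choose to.
	fmt.Println(utf8.RuneCountInString(s)) // 5

	// Ranging over a string decodes UTF-8 one rune at a time.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}
}
```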


The final entry in our trio of tribulations is the dumpster fire that is Python library paths. This has actually been a continuing problem since GPSD and has bitten NTPSec pretty hard – it’s a running sore on our issue tracker, so bad that we’re seriously considering moving our entire suite of Python client tools to Go just to get shut of it.


The problem is that where on your system you need to put a Python library module so that a Python main program (or other library) can see it and load it varies in only semi-predictable ways. By version, yes, but there’s also an obscure distinction between site-packages, dist-packages, and what for want of any better term I’ll call root-level modules (no subdirectory under the version directory) that different distributions and even different application packages seem to interpret in different and incompatible ways. The root of the problem seems to be that good practice is under-specified by the Python dev team.


This is particular hell on project packagers. You don’t know what version of Python your users will be running, and you don’t know what the contents of their sys.path (the library load-path variable) will be. You can’t know where your install production should put things so the Python pieces of your code will be able to see each other. About all you can do is shotgun multiple copies of your library to different plausible locations and hope one of them intersects with your user’s load path. And I shall draw a kindly veil over the even greater complications if you’re shipping C extension modules…


Paralysis around the GIL, the Python 3 strings botch, the library-path dumpster fire – these are signs of a language that is aging, grubby, and overgrown. It pains me to say this, because I was a happy Python fan and advocate for a long time. But the process of learning Go has shed a harsh light on these deficiencies.


I’ve already noted that Go’s Unicode handling implicitly throws a lot of shade. So does its brute-force practice of building a single self-contained binary from source every time. Library paths? What are those?


But the real reason that reposurgeon is moving to Go – rather than some other language I might reasonably think I could extract high performance from – is not either of these demonstrations. Go did not get this design win by being right about Unicode or build protocols.


Go got this win because (a) comparative benchmarks on non-I/O-limited code predict a speedup of around 40x, which is good enough and competitive with Rust or C++, and (b) the semantic gap between Python and Go seemed surprisingly narrow, making the expected translation time lower than I could reasonably expect from any other language on my radar.


Yes, static typing vs. Python’s dynamic typing seems like it ought to be a big deal. But there are several features that converge these languages enough to almost swamp that difference. One is garbage collection; the second is the presence of maps/dictionaries; and the third is strong similarities in low-level syntax.
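

A hedged illustration of that convergence: the Go below is close to a line-for-line rendering of the obvious Python, which is why mechanical assistance is even thinkable.

```go
// Illustrative only. The comments show the Python this Go fragment
// mirrors: a map for a dict, a slice for a list, for-range for a loop.
package main

import "fmt"

func main() {
	attributions := map[string]string{}   // attributions = {}
	commits := []string{"r1", "r2", "r3"} // commits = ["r1", "r2", "r3"]

	for _, c := range commits { // for c in commits:
		attributions[c] = "esr" //     attributions[c] = "esr"
	}

	if who, ok := attributions["r2"]; ok { // if "r2" in attributions:
		fmt.Println(who) //     print(attributions["r2"])
	}
}
```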


In fact, the similarities are so strong that I was able to write a mechanical Python-to-Go translator’s assistant – pytogo – that produces what its second user described as a “good first draft” of a Go translation. I described this work in more detail in Rule-swarm attacks can outdo deep reasoning.


I wrote pytogo at roughly the 22% mark of the translation (just short of 4800 lines out of 14000) and am now up to 37% out of 16000. The length of the Go plus commented-out untranslated Python has been creeping up because Go is less dense – all those explicit close brackets add up. I am now reasonably confident of success, though there is lots of translation left to do and one remaining serious technical challenge that I may discuss in a future post.


For now, though, I want to return to the question of what Python can do to right its ship. For this project the Python devs have certainly lost me; I can’t afford to wait on them getting their act together before finishing the GCC conversion. The question is what they can do to stanch more defections to Go, a particular threat because the translation gap is so narrow.


Python is never going to beat Go on performance. The fumbling of the 2/3 transition is water over the dam at this point, and I don’t think it’s realistically possible to reverse the Python 3 strings mistake.


But that GIL problem? That’s got to get solved. Soon. In a world where a single-core machine is a vanishing oddity outside of low-power firmware deployments, the GIL is a millstone around Python’s neck. Otherwise I fear the Python language will slide into shabby-genteel retirement the way Perl has, largely relegated to its original role of writing smallish glue scripts.


Smothering that dumpster fire would be a good thing, too. A tighter, more normative specification about library paths and which things go where might do a lot.


Of course there’s also a positioning issue. Having lost the performance-chasers to Go, Python needs to decide what constituency it wants to serve and can hold onto. That problem I can’t solve; all I can do is point out which technical problems are both seriously embarrassing and fixable. That’s what I’ve tried to do.


As I said at the beginning of this rant, I don’t think there’s a big window of time in which to act, either. I judge the Python devs do not have a year left to do something convincing about the GIL before Go completely eats their lunch, and I’m not sure they have even six months. They’d best get cracking.

Published on October 08, 2018 13:38

October 2, 2018

Rule-swarm attacks can outdo deep reasoning

It is not news to readers of this blog that I like to find common tactics and traps in programming that don’t have names and name them. I don’t only do this because it’s fun. When you have named a thing you give your brain permission to reason about it as a conceptual unit. Bad jargon obfuscates, map hiding territory; good jargon reveals, aiding reflection on and improvement of your practice.


In my last post I coined “shtoopid problem”. It went viral; every programmer has hit this, and it’s useful to have the term because you can attach to it recognition rules and tactics for escaping such traps. (And not only in programming; consider kafkatrapping).


Today’s invention is the term “rule-swarm attack”. It’s derived from the military term “swarm attack” and opposed to “deep reasoning”, “structural analysis” and “generative rules”. I’ll explain it and provide some case studies.



A rule-swarm attack is what you can sometimes do when you have some messy data-reduction or data-translation problem and deep reasoning can’t be applied effectively – either you don’t have a theory or the theory is too expensive to apply in the place you are working. So instead you look for patterns – cliches – in the data and apply a whole bunch of little, individually stupid rules that transform it towards what you want. You win when the result is observably good enough.


It’s curious that this strategy never had a general name before, because it’s actually pretty common. Peephole optimizers in compilers. Statistical language translation as Google does it. In AI these are called “production systems” – they’re widely used for tasks like automated medical diagnoses. It’s not principle-based; rule-swarms know nothing about meaning in any introspective sense – they’re just collections of if-this-then-do-thats applied recursively until you have reached a state where no rules can fire.
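

The whole strategy fits in remarkably few lines. Here is a minimal sketch (toy rules, not any real tool’s): a list of dumb textual rewrite rules applied over and over until none of them fires.

```go
// Minimal rule-swarm skeleton: run every rule against the text and
// repeat until a full pass changes nothing (a fixed point).
package main

import (
	"fmt"
	"regexp"
)

type rule struct {
	pattern *regexp.Regexp
	replace string
}

var rules = []rule{
	{regexp.MustCompile(`\bcolour\b`), "color"}, // a deliberately dumb rule
	{regexp.MustCompile(`  +`), " "},            // squash runs of spaces
}

func swarm(text string) string {
	for {
		changed := false
		for _, r := range rules {
			if next := r.pattern.ReplaceAllString(text, r.replace); next != text {
				text, changed = next, true
			}
		}
		if !changed {
			return text // no rule can fire; we're done
		}
	}
}

func main() {
	fmt.Println(swarm("colour  coded   colour")) // "color coded color"
}
```

No rule in that list knows anything about what the text means; the intelligence, such as it is, lives entirely in the accumulation of rules.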


Yes, we are in the territory of Searle’s “Chinese Room” thought experiment. I’m just going to nod in the direction of the philosophical issues, because a dive into the meaning of meaning isn’t what I want to do in this post. Today I’m here to give practical engineering advice.


I’ve written before about my program doclifter, which lifts groff/troff markup to structural XML using a rule-swarm strategy. The thing to notice is that doclifter has to work that way because its inputs are only weakly structured tag soup. Deep-reasoning approaches cough and die on datasets like this; they can’t deal with the irregularity gracefully enough to cope.


This is fundamentally the same reason natural-language translation by statistical coupling of text or speech utterances has beaten the living crap out of approaches that try to extract some kind of Chomskian deep structure from language A and then render it in language B as though you’re a compiler or transpiler back end. Natural language, it turns out, just doesn’t work that way – an insight which, alas, hasn’t yet exploded as many heads among theoretical linguists as it should have.


But: rule swarms can be a useful strategy even when your data is in some sense perfectly regular and well-formed. Transpiling computer languages is a good example. They’re not messy in the way natural languages are. The conventional, “principled” way to transpile is to analyze the code in the source language into an AST (abstract syntax tree) then generate code in the target language from the AST.


This is elegant in theory, and if it works at all you probably get a formally perfect look-ma-no-handwork translation. But in practice the deep-reasoning strategy has two serious problems:


1. Preserving comments is hard. Most code-walkers (the source-code analyzers at the front end of transpilers) don’t even try. This isn’t because the writers are lazy; good rules for where to attach comments to the right node in an AST are remarkably hard to formulate. Even when you pull that off, knowing where to interpolate them in the target-language output is much more difficult. You’d need some detailed theory of how each segment of the source AST corresponds to some unique segment of the target AST. That’s really difficult when the language grammars are more than trivially different.


2. Readable output is hard, too. There is a C-to-Go source transpiler out there which shall remain nameless because, although it looks like it may do an excellent job in all other respects, it produces generated Go that is utterly horrible. Beyond obfuscated; unreadable, unmaintainable….


…and thus, unacceptable. No responsible infrastructure maintainer would or should tolerate this sort of thing. But it is, alas, a pretty common problem with the output of transpilers. Losing comments is not really acceptable either; often, especially in older code, they are repositories of hard-won knowledge that future maintainers will need. What, then, is one to do?


Language translation by rule-swarm attack, using purely textual transformations on the source file, can be an answer. It’s easy to preserve comments and program structure if you can pull this off at all. It only works if (a) the syntactic and semantic distance between source and target languages is relatively small, (b) you’re willing to hand-fix the places where textual rewriting rules can’t do a good enough job, and (c) you don’t care that theorists will scoff “that kludge can’t possibly work” at you.
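

To make (a) and (b) concrete, here are a couple of deliberately naive rules in the Python-to-Go direction. These are hedged examples of the general idea, not pytogo’s actual rule set, and a human still has to clean up everything they miss:

```go
// Toy textual rules for a Python-to-Go first draft. Because they work
// on raw source text, comments and layout pass through untouched.
package main

import (
	"fmt"
	"regexp"
)

var (
	// "def name(args):"  ->  "func name(args) {"
	defRule = regexp.MustCompile(`(?m)^def\s+(\w+)\((.*)\):\s*$`)
	// "elif cond:"       ->  "} else if cond {"
	elifRule = regexp.MustCompile(`(?m)^(\s*)elif\s+(.*):\s*$`)
)

func roughTranslate(src string) string {
	src = defRule.ReplaceAllString(src, "func $1($2) {")
	src = elifRule.ReplaceAllString(src, "$1} else if $2 {")
	return src
}

func main() {
	fmt.Print(roughTranslate("def greet(name):\n    print(name)\n"))
}
```

Neither rule is anywhere near correct in general (Python’s indentation-based block structure alone defeats them), which is exactly why this is a translator’s assistant and not a translator.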


Then again, sometimes there’s no competition for a rule-swarm attack to beat, because everybody thinks principled AST-based translation would be too hard, rule-swarming would be too flaky, and nobody (except, er, me) actually tries either thing.


Case in point: Back in 2006 I wrote ctopy, a crude C-to-Python translator, because it seemed possible and nobody was offering me a better tool. It doesn’t parse C, it just bashes C code with regular-expression transformations until it gets to something that, as Douglas Adams might have put it, is “almost, but not completely unlike” idiomatic Python. As far as I’m aware there still isn’t a tool that does more complete translation. And while ctopy is in no way perfect, it is a good deal better than nothing.


Swarm-attack translators like doclifter and ctopy are best viewed as translator’s assistants; their role is to automate away the parts computers do well but humans do poorly, so humans can concentrate on the parts they do well and computers do poorly.


In Automatons, judgment amplifiers, and DSLs I made the case (about reposurgeon) that designing tools for judgment-amplifier workflow is sometimes a much better choice than trying for fully automatic conversion tools. Too often, when going for full automation, we sacrifice output quality. Like transpiling horrifyingly unreadable Go from clean C. Or getting crappy translations of repositories when reposurgeon assisting a human could have made good ones.


So, two days ago I shipped version 1.0 of pytogo, a crude Python-to-Go translator’s assistant. I wrote it because the NTPsec project has a plan to move its Python userspace tools to Go in order to get shut of some nasty issues in Python deployment (those of you who have been there will know what I mean when I say “library-directory hell”).


pytogo works way better than ctopy did; the semantic gap between Python and Go is much narrower than the gap between C and Python, because GC and maps as a first-class data type make that much difference. I’m field-qualifying it by using it to translate reposurgeon to Go, and there is already a report on the go-nuts list of someone other than me using it successfully.


You can read about the transformations it does here. More will be added in the future – in fact I notice that list is already slightly out of date and will fix it.


Besides preserving comments and structure, the big advantage of the rule-swarm approach pytogo uses is that you don’t have to have a global vision going in. You can discover targets of opportunity as you go. The corresponding disadvantage is that your discovery process can easily turn into a Zeno tarpit, spending ever-increasing effort on ever-decreasing returns.


Of course rule-swarm attacks can also fail by insufficiency. You might find out that the deep-structure fans are right about a particular problem domain, that rule swarms are bound to be either ineffective or excessively prone to unconstrainable false positives. It’s hard to predict this in advance; about all you can do is exploit the low cost of starting a rule-swarm experiment, notice when it’s failing, and stop.


Oh, and you will find that a lot of your effort goes into avoiding false matches. Having a regression-test suite, and running it frequently so you get fast notification when an ambitious new rule craps on your carpet, is really important. Start building it from day one, because you will come to regret it if you don’t.
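

In Go that suite can be as humble as a table-driven test. A sketch, assuming a roughTranslate-style rule engine like the one sketched earlier (the names are hypothetical):

```go
// Sketch of a regression test for textual rewrite rules. Each time a
// new rule lands, these cases catch it if it starts mangling input it
// should have left alone.
package main

import "testing"

func TestRoughTranslate(t *testing.T) {
	cases := []struct{ in, want string }{
		{"def f(x):", "func f(x) {"},
		// A false-match trap: "def" inside a comment must survive untouched.
		{"# def means define", "# def means define"},
	}
	for _, c := range cases {
		if got := roughTranslate(c.in); got != c.want {
			t.Errorf("roughTranslate(%q) = %q, want %q", c.in, got, c.want)
		}
	}
}
```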


And now Eric reaches for the sweeping theoretical summation….maybe? I think the real lesson here is about methodological prejudices. Western culture has a tendency one might call “a-priorism” that goes clear back to the ancient Greeks, an over-fondness for theory-generated methods as opposed to just plunging your head and hands into the data and seeing where that takes you. It’s a telling point that the phrase “reasoning from first principles” has a moralistic ring to it, vaguely tying starting from fixed premises to personal honor and integrity.


Because of this, the most difficult thing about rule-swarm attacks may be allowing oneself to notice that they’re effective. Natural-language translation was stunted by this for decades. Let’s learn how not to repeat that mistake, eh?

Published on October 02, 2018 06:15
