Rule-swarm attacks can outdo deep reasoning
It’s not news to readers of this blog that I like to find common tactics and traps in programming that don’t have names, and name them. I don’t do this only because it’s fun. When you have named a thing you give your brain permission to reason about it as a conceptual unit. Bad jargon obfuscates, map hiding territory; good jargon reveals, aiding reflection on and improvement of your practice.
In my last post I coined “shtoopid problem”. It went viral; every programmer has hit this, and it’s useful to have the term because you can attach to it recognition rules and tactics for escaping such traps. (And not only in programming; consider kafkatrapping).
Today’s invention is the term “rule-swarm attack”. It’s derived from the military term “swarm attack” and opposed to “deep reasoning”, “structural analysis” and “generative rules”. I’ll explain it and provide some case studies.
A rule-swarm attack is what you can sometimes do when you have some messy data-reduction or data-translation problem and deep reasoning can’t be applied effectively – either you don’t have a theory or the theory is too expensive to apply in the place you are working. So instead you look for patterns – cliches – in the data and apply a whole bunch of little, individually stupid rules that transform it towards what you want. You win when the result is observably good enough.
It’s curious that this strategy never had a general name before, because it’s actually pretty common. Peephole optimizers in compilers. Statistical language translation as Google does it. In AI it’s called “production systems” – they’re widely used for tasks like automated medical diagnoses. It’s not principle-based – rule-swarms know nothing about meaning in any introspective sense, they’re just collections of if-this-then-do-thats applied recursively until you have reached a state where no rules can fire.
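To make the shape of the thing concrete, here’s a minimal sketch of such an engine in Python. The rules themselves are invented for illustration; the point is the control structure, which just keeps sweeping the rule list over the data until a whole pass fires nothing:

    import re

    # Each rule is a (pattern, replacement) pair -- individually stupid,
    # collectively useful.  These particular rules are made-up examples.
    RULES = [
        (re.compile(r"\bteh\b"), "the"),          # fix a common typo
        (re.compile(r"  +"), " "),                # collapse runs of spaces
        (re.compile(r" ([,.;])"), r"\1"),         # no space before punctuation
    ]

    def swarm(text):
        """Apply every rule repeatedly until no rule can fire."""
        while True:
            changed = False
            for pattern, replacement in RULES:
                text, count = pattern.subn(replacement, text)
                changed = changed or count > 0
            if not changed:
                return text

    print(swarm("teh   quick brown fox , teh lazy dog ."))
    # -> "the quick brown fox, the lazy dog."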
Yes, we are in the territory of Searle’s “Chinese Room” thought experiment. I’m just going to nod in the direction of the philosophical issues, because a dive into the meaning of meaning isn’t what I want to do in this post. Today I’m here to give practical engineering advice.
I’ve written before about my program doclifter, which lifts groff/troff markup to structural XML using a rule-swarm strategy. The thing to notice is that doclifter has to work that way because its inputs are only weakly structured tag soup. Deep-reasoning approaches cough and die on datasets like this; they can’t deal with that much irregularity gracefully.
This is fundamentally the same reason natural-language translation by statistical coupling of text or speech utterances has beaten the living crap out of approaches that try to extract some kind of Chomskian deep structure from language A and then render it in language B as though you’re a compiler or transpiler back end. Natural language, it turns out, just doesn’t work that way – an insight which, alas, hasn’t yet exploded as many heads among theoretical linguists as it should have.
But: rule swarms can be a useful strategy even when your data is in some sense perfectly regular and well-formed. Transpiling computer languages is a good example. They’re not messy in the way natural languages are. The conventional, “principled” way to transpile is to analyze the code in the source language into an AST (abstract syntax tree) then generate code in the target language from the AST.
This is elegant in theory, and if it works at all you probably get a formally perfect look-ma-no-handwork translation. But in practice the deep-reasoning strategy has two serious problems:
1. Preserving comments is hard. Most code-walkers (the source-code analyzers at the front end of transpilers) don’t even try. This isn’t because their writers are lazy; good rules for where to attach comments to the right node in an AST are remarkably hard to formulate. Even when you pull that off, knowing where to interpolate them in the target-language output is harder still: you’d need a detailed theory of how each segment of the source AST corresponds to some unique segment of the target AST, which is really difficult when the language grammars are more than trivially different. (There’s a short demonstration of the problem just after this list.)
2. There is a C-to-Go source transpiler out there which shall remain nameless because, although it looks like it may do an excellent job in all other respects, it produces generated Go that is utterly horrible. Beyond obfuscated; unreadable, unmaintainable….
…and thus, unacceptable. No responsible infrastructure maintainer would or should tolerate this sort of thing. But it is, alas, a pretty common problem with the output of transpilers. Losing comments is not really acceptable either; often, especially in older code, they are repositories of hard-won knowledge that future maintainers will need. What, then, is one to do?
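Point 1 above is easy to demonstrate with Python’s own toolchain (ast.unparse needs Python 3.9 or later): the parser discards comments before any back end ever gets a look at them.

    import ast

    source = "# Hard-won knowledge lives here\nx = 1  # ...and here\n"

    tree = ast.parse(source)   # comments are already gone at this point
    print(ast.unparse(tree))   # prints just: x = 1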
Language translation by rule-swarm attack, using purely textual transformations on the source file, can be an answer. It’s easy to preserve comments and program structure if you can pull this off at all. It only works if (a) the syntactic and semantic distance between source and target languages is relatively small, (b) you’re willing to hand-fix the places where textual rewriting rules can’t do a good enough job, and (c) you don’t care that theorists will scoff “that kludge can’t possibly work” at you.
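To show what that looks like in practice, here is a hedged sketch of a few purely textual Python-to-Go rules in the spirit of pytogo. These are hypothetical rules I made up for illustration, not pytogo’s actual rule set:

    import re
    import sys

    # Hypothetical line-by-line rewrite rules, for illustration only.
    RULES = [
        (re.compile(r"^(\s*)def (\w+)\((.*)\):"), r"\1func \2(\3) {"),
        (re.compile(r"^(\s*)if (.*):"), r"\1if \2 {"),
        (re.compile(r"^(\s*)elif (.*):"), r"\1} else if \2 {"),
        (re.compile(r"^(\s*)else:"), r"\1} else {"),
        (re.compile(r"\bTrue\b"), "true"),
        (re.compile(r"\bFalse\b"), "false"),
        (re.compile(r"\bNone\b"), "nil"),
        (re.compile(r"#"), "//"),   # naive: would also hit '#' in strings
    ]

    def translate(line):
        for pattern, replacement in RULES:
            line = pattern.sub(replacement, line)
        return line

    for line in sys.stdin:
        sys.stdout.write(translate(line))

The output is nowhere near compilable Go – closing braces, types, and error handling are still on the human – but comments and layout come through untouched, which is exactly the property the AST route loses.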
Then again, sometimes there’s no competition for a rule-swarm attack to beat, because everybody thinks principled AST-based translation would be too hard, rule-swarming would be too flaky, and nobody (except, er, me) actually tries either thing.
Case in point: Back in 2006 I wrote ctopy, a crude C-to-Python translator, because it seemed possible and nobody was offering me a better tool. It doesn’t parse C, it just bashes C code with regular-expression transformations until it gets to something that, as Douglas Adams might have put it, is “almost, but not quite, entirely unlike” idiomatic Python. As far as I’m aware there still isn’t a tool that does more complete translation. And while ctopy is in no way perfect, it is a good deal better than nothing.
Swarm-attack translators like doclifter and ctopy are best viewed as translator’s assistants; their role is to automate away the parts computers do well but humans do poorly, so humans can concentrate on the parts they do well and computers do poorly.
In Automatons, judgment amplifiers, and DSLs I made the case (about reposurgeon) that designing tools for a judgment-amplifier workflow is sometimes a much better choice than trying for fully automatic conversion tools. Too often, when going for full automation, we sacrifice output quality. Like transpiling horrifyingly unreadable Go from clean C. Or getting crappy translations of repositories when reposurgeon assisting a human could have made good ones.
So, two days ago I shipped version 1.0 of pytogo, a crude Python-to-Go translator’s assistant. I wrote it because the NTPsec project has a plan to move its Python userspace tools to Go in order to get shut of some nasty issues in Python deployment (those of you who have been there will know what I mean when I say “library-directory hell”).
pytogo works way better than ctopy did; the semantic gap between Python and Go is much narrower than the gap between C and Python, because GC and maps as a first-class data type make that much difference. I’m field-qualifying it by using it to translate reposurgeon to Go, and there is already a report on the go-nuts list of someone other than me using it successfully.
You can read about the transformations it does here. More will be added in the future – in fact I notice that list is already slightly out of date, and will fix it.
Besides preserving comments and structure, the big advantage of the rule-swarm approach pytogo uses is that you don’t have to have a global vision going in. You can discover targets of opportunity as you go. The corresponding disadvantage is that your discovery process can easily turn into a Zeno tarpit, spending ever-increasing effort on ever-decreasing returns.
Of course rule-swarm attacks can also fail by insufficiency. You might find out that the deep-structure fans are right about a particular problem domain, that rule swarms are bound to be either ineffective or excessively prone to unconstrainable false positives. It’s hard to predict this in advance; about all you can do is exploit the low cost of starting a rule-swarm experiment, notice when it’s failing, and stop.
Oh, and you will find that a lot of your effort goes into avoiding false matches. Having a regression-test suite, and running it frequently so you get fast notification when an ambitious new rule craps on your carpet, is really important. Start building it from day one, because you will come to regret it if you don’t.
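A minimal harness of that shape, again sketched in Python with hypothetical file names and layout, can be as simple as golden input/output pairs you re-run after adding every rule:

    import glob

    def check(translate):
        """Compare translate() output against golden .out files."""
        failures = 0
        for inpath in glob.glob("tests/*.in"):    # hypothetical layout
            expected_path = inpath[:-3] + ".out"
            with open(inpath) as f:
                got = translate(f.read())
            with open(expected_path) as f:
                expected = f.read()
            if got != expected:
                failures += 1
                print("FAIL:", inpath)
        return failures

    # Exit nonzero on any regression, so this can gate a commit hook;
    # pass in whatever function maps source text to translated text:
    # raise SystemExit(check(my_translate_function))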
And now Eric reaches for the sweeping theoretical summation… maybe? I think the real lesson here is about methodological prejudices. Western culture has a tendency one might call “a-priorism” that goes clear back to the ancient Greeks, an over-fondness for theory-generated methods as opposed to just plunging your head and hands into the data and seeing where that takes you. It’s a telling point that the phrase “reasoning from first principles” has a moralistic ring to it, vaguely tying starting from fixed premises to personal honor and integrity.
Because of this, the most difficult thing about rule-swarm attacks may be allowing oneself to notice that they’re effective. Natural-language translation was stunted by this for decades. Let’s learn how not to repeat that mistake, eh?