Automatons, judgment amplifiers, and DSLs

Do we make too many of our software tools automatons when they should be judgment amplifiers? And why don’t we write more DSLs?

Back in the Renaissance there was a literary tradition of explaining natural philosophy via conversations among imaginary characters. I’m going to revive that this evening because I had an IRC conversation this afternoon, about the design insights behind reposurgeon, that pretty much begs to be presented this way.

The person of “Simplicio” was Galileo’s invention in his Dialogue Concerning the Two Chief World Systems. Here he represents four different people, but almost everything he says is something one of them in fact said or very plausibly might have. I’ve cleaned it up, edited, and amplified only a little.

For those of you coming in late, reposurgeon is a tool I wrote for editing version-control histories. It has many applications, including highest-quality repository conversions. Simplicio needed to excise some security-sensitive credentials from a DHS code repository – not just from the tip version but from the entire history. Reposurgeon is pretty much the only practical way to do this.

So, without further ado…

SIMPLICIO pounce-hugs ESR.

SIMPLICIO: I have to run back to work, but I just wanted to say…reposurgeon is FREAKING AWESOME.

ESR: I take it you figured out how to do the necessary.

SIMPLICIO: Yep. How did you *imagine* that?

ESR: I love designing DSLs (domain-specific languages). What you are seeing as “awesome” is the result of proper attention to keeping the language primitives in the DSL mutually orthogonal, so they can have fruitful combinations I did not anticipate. This is a design style that is difficult to do well, but when you pull it off the payoff is huge.

SIMPLICIO nods.

ESR: You might be entertained to know what the model for reposurgeon’s DSL was. Brace for it: … ed(1).

SIMPLICIO: LOL

ESR: I’m not joking.

ESR: Think about how reposurgeon selections interact with the command verbs. Pick a collection of records, do something to it – possibly with auxiliary following arguments.

SIMPLICIO: I know you’re not joking, it’s still amusing. Heh. The original Patriarchs of Unix were truly worthy of their mantles.

ESR: There were two big insights behind the design of reposurgeon:

(1) Attempts to fully automate repo-conversion tools are doomed by ontological mismatches between VCSes. Bridging those requires case-by-case human judgment; therefore, the best tool should seek to amplify human judgment rather than futilely attempting to remove it from the process.

(2) The structure implied by a deserialized git-fast-import stream resembles a line sequence in an editor just enough that the orthogonal ed model of “apply a command verb to a selection” is applicable.

SIMPLICIO: Everything old is new again.

ESR: Everything since those insights has been in some sense mere details. In particular, mating premise (2) to the properties of the Python cmd.Cmd library class implies quite a lot of the reposurgeon implementation.
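As a rough illustration of how premise (2) and Python's cmd.Cmd class fit together, here is a minimal sketch of an ed-style interpreter in which each line is an optional selection followed by a command verb. This is not reposurgeon's code: the class name RecordShell, the selection syntax, and the verbs list, delete, and upcase are all invented for the example. The only library behavior relied on is that cmd.Cmd dispatches a line whose first word is foo to a method named do_foo, and that precmd() gets a chance to rewrite the line first.

```python
import cmd
import re

# Hypothetical selection syntax: "2", "1..3", or comma-separated mixes like "1,3..4".
SELECTION = re.compile(r"\s*(\d+(?:\.\.\d+)?(?:,\d+(?:\.\.\d+)?)*)\s*(.*)")


class RecordShell(cmd.Cmd):
    """Toy ed-style interpreter: '<selection> <verb> [args]'."""

    prompt = "toy> "

    def __init__(self, records):
        super().__init__()
        self.records = list(records)
        self.selection = []

    def precmd(self, line):
        # Peel a leading selection off the line; cmd.Cmd then sees only the verb.
        match = SELECTION.match(line)
        if match:
            self.selection = self._expand(match.group(1))
            return match.group(2)
        # Simplification for this sketch: no selection means "all records".
        self.selection = list(range(len(self.records)))
        return line

    def _expand(self, spec):
        # Turn 1-based "lo..hi" ranges into 0-based indices, dropping out-of-range ones.
        indices = []
        for part in spec.split(","):
            lo, _, hi = part.partition("..")
            indices.extend(range(int(lo) - 1, int(hi or lo)))
        return [i for i in indices if 0 <= i < len(self.records)]

    def emptyline(self):
        # Do nothing on an empty line (the default would repeat the last command).
        pass

    def do_list(self, arg):
        """Print the selected records with their 1-based numbers."""
        for i in self.selection:
            print(i + 1, self.records[i])

    def do_delete(self, arg):
        """Remove the selected records."""
        doomed = set(self.selection)
        self.records = [r for i, r in enumerate(self.records) if i not in doomed]

    def do_upcase(self, arg):
        """Uppercase the selected records."""
        for i in self.selection:
            self.records[i] = self.records[i].upper()
```

Calling RecordShell(["alpha", "beta"]).cmdloop() would give an interactive prompt. The design point is that no verb knows how its selection was written, so a new verb or a new selection syntax can be added without touching the other.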

But premise (1) suggests a larger question: where else are we making the same mistake? Are there other domains where we should be trying to write judgment amplifiers rather than automatons?

If I ever again write a DSL as effective as reposurgeon it will be because I found a specific answer to that question. I would love to do this again and again and again.

SIMPLICIO: Blog post, or a new chapter for The Art of Unix Programming?

ESR: Hm. Blog post for sure. Not sure premise (2) is Unix-specific enough to deserve a chapter in TAoUP.

SIMPLICIO: I was thinking of “judgment amplifiers vs. automatons”. That demands to be a chapter title. :-)

ESR: It’s a good design question to notice. Whether it’s a Unix design question is another matter.

SIMPLICIO: Seriously, I don’t understand how to know when a DSL is necessary.

ESR: I’m pretty sure “when is it necessary” is the wrong way to frame the question. “When is it possible” would be a better one.

SIMPLICIO: That may be my problem then.

ESR: If you can figure out a proper set of orthogonal primitives to build it around, a DSL is always better than a more rigid design. At worst, it becomes one of the soft layers in an alternating hard and soft stack.

If I have a DSL, I can front-end it as a GUI or some other kind of more rigid interface. But the reverse is not true; if you don’t design in DSL-like flexibility to begin with, it’s almost impossible to retrofit.

SIMPLICIO: That does make sense. In the past I’ve compared DSLs to more general-purpose programming languages and mainly seen their limitations. Now…I’m intrigued.

ESR: A good example is basically any modern E-CAD package. Look past the GUI and you’re going to find some kind of DSL for hardware descriptions underneath. Going directly from the GUI’s data representation to silicon would be doomed, but the soft layer in the middle gives it a way to chunk the design process that captures domain primitives like logic gates, vias, or entire functional blocks.

SIMPLICIO: Oh. Oh! I bet you’re going to bring up SQL next.

ESR: I certainly could. Mathematica, there’s another one.

Yet another example is Emacs. You have to sort of forget that Lisp is theoretically general-purpose for a moment; if you do you’ll see the same hard-over-soft pattern, DSL underpinning something that doesn’t look like one.

This is an extremely powerful way to design. You’d see it more often, but there’s no tradition of teaching the practice, so programmers have to re-invent it nearly from scratch almost every time they do it.

SIMPLICIO: If you know of any good teaching materials, I’d be very grateful. If not, I’ll go googling at some point.

ESR: I wish there had been teaching materials when I was a noob – I had to spend a quarter-century learning how to do it right. Sadly, there still aren’t; all we have is a handful of people who autodidacticated themselves. RMS. Stephen Wolfram. Me. A few others.

SIMPLICIO: There are a bunch of things I wish there were teaching materials for. I’ve noticed that if they’re engineering-useful but not interesting to academics they tend not to get written.

ESR: The really hard part is carving the domain operations into orthogonal primitives. Then you sort of clothe them in semi-generic DSL machinery.

So, listen to this carefully: the reason git fast-import streams were essential to the design of reposurgeon is that they concretized the problem. They reduced the abstract question “what is an orthogonal set of primitive operations on repository histories” to a more concrete one: “What is an orthogonal set of primitives for editing the attributed graph implied by a fast-import stream?”

The first question, the abstract one, is fundamentally difficult because it’s ill-defined. The second question, the concrete one, has an answer that may be somewhat complex but is well-defined and not fundamentally difficult. You can get it by staring at diagrams of nodes and links, and thinking up every possible way to screw with them.

SIMPLICIO nods.

ESR: But for a great many interesting cases, any answer to the second question implies an answer to the first. You write your attributed-graph editor and you have a repository editor.
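To make "attributed-graph editor" a little more tangible, here is a small sketch of two primitives over a heavily simplified commit graph. Everything in it is an assumption made for illustration: the Commit record, its fields, and the functions expunge and squash_empty are invented here and say nothing about how reposurgeon represents or edits histories. It also assumes the list is ordered so that parents come before children, as they would in a stream.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    mark: str                                     # identifier, e.g. ":1"
    comment: str
    parents: list = field(default_factory=list)   # marks of parent commits
    files: dict = field(default_factory=dict)     # path -> content set by this commit


def expunge(history, path):
    """Primitive 1: drop every modification of `path` from every commit."""
    for commit in history:
        commit.files.pop(path, None)


def squash_empty(history):
    """Primitive 2: remove commits that touch no files, relinking children.

    Assumes parents appear before children in `history`.
    """
    replacement = {}   # mark of a removed commit -> marks of its surviving ancestors
    kept = []
    for commit in history:
        new_parents = []
        for parent in commit.parents:
            for survivor in replacement.get(parent, [parent]):
                if survivor not in new_parents:
                    new_parents.append(survivor)
        commit.parents = new_parents
        if commit.files:
            kept.append(commit)
        else:
            replacement[commit.mark] = new_parents
    return kept
```

Neither function knows about the other, yet running expunge on a credentials file and then squash_empty gives the "remove it from the entire history" operation from the introduction. That kind of unplanned combination is what orthogonal primitives are meant to buy.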

SIMPLICIO: That makes sense :)

ESR: Export/import to actual repositories is still an issue, of course, but it’s one you can keep well isolated from the rest of the design.

SIMPLICIO: Is there some way to generalize reposurgeon’s design pattern? I think I get it now, but I don’t see how you map it to other application domains.

ESR: A first thing to notice is how the agenda of amplifying human judgment rather than fully automating fits with writing a DSL. You’re not really writing a judgment amplifier if the tool is incapable of doing things the designer didn’t anticipate. You need the flexibility, the ability to generate and collect options.

A second thing is that you can get a hell of a jump on grasping the problem domain well enough to write a DSL over it if there is some kind of declarative markup that captures all of its entities. Then there’s a mapping – mathematicians would call it a functor – between operations on the markup and operations on the problem domain.

So I’d say a good first question to ask is: is there a declarative markup – an analogue of git fast-import streams – that captures everything in the problem domain I want to hack? And if not, can I invent one?

The process of knowledge capture that needs to happen for such a markup to exist is exactly the one that will tell you, or at least imply, what the primitives for your DSL are.
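For readers who have not looked at a fast-import stream, the following sketch shows roughly what "deserializing the markup into records" can mean. It handles only a small subset of the format (blob and commit records, mark lines, counted "data <n>" payloads, and M/D file operations), and the function name read_records and the dict layout are made up for this example. The delimited "data <<EOF" form and the other record types are deliberately left out.

```python
import io


def read_records(stream):
    """Split a simplified fast-import byte stream into a list of dict records."""
    records = []
    current = None
    while True:
        line = stream.readline()
        if not line:
            break                                  # end of stream
        line = line.rstrip(b"\n")
        if not line:
            continue                               # blank separator line
        word, _, rest = line.partition(b" ")
        if word in (b"blob", b"commit"):
            # Start a new record; `rest` is the ref name for commits, empty for blobs.
            current = {"type": word.decode(), "ref": rest.decode(),
                       "headers": [], "fileops": []}
            records.append(current)
        elif current is None:
            continue                               # ignore anything before the first record
        elif word == b"mark":
            current["mark"] = rest.decode()
        elif word == b"data":
            # Counted payload: read exactly that many bytes (assumes UTF-8 text).
            current["data"] = stream.read(int(rest.decode())).decode()
        elif word in (b"M", b"D"):
            current["fileops"].append(line.decode())
        else:
            current["headers"].append(line.decode())   # author, committer, from, ...
    return records


sample = (
    b"blob\nmark :1\ndata 6\nhello\n\n"
    b"commit refs/heads/master\nmark :2\n"
    b"committer A U Thor <author@example.com> 1455723540 +0000\n"
    b"data 8\ninitial\n"
    b"M 100644 :1 README\n\n"
)
records = read_records(io.BytesIO(sample))
```

Once the stream is a list of records like this, the entities the markup names (blobs, commits, marks, file operations) are the natural candidates for what selections pick out and what verbs act on.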

Published on February 17, 2016 15:39

Eric S. Raymond's Blog
