C, Python, Go, and the Generalized Greenspun Law

In recent discussion on this blog of the GCC repository transition and reposurgeon, I observed: “If I’d been restricted to C, forget it – reposurgeon wouldn’t have happened at all.”


I should be more specific about this, since I think the underlying problem is general to a great deal more than the implementation of reposurgeon. It ties back to a lot of recent discussion here of C, Python, Go, and the transition to a post-C world that I think I see happening in systems programming.


(This post perhaps best viewed as a continuation of my three-part series: The long goodbye to C, The big break in computer languages, and Language engineering for great justice.)



I shall start by urging that you must take me seriously when I speak of C’s limitations. I’ve been programming in C for 35 years. Some of my oldest C code is still in wide production use. Speaking from that experience, I say there are some things only a damn fool tries to do in C, or in any other language without automatic memory management (AMM, for the rest of this article).


This is another angle on Greenspun’s Law: “Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.” Anyone who’s been in the trenches long enough gets that Greenspun’s real point is not about C or Fortran or Common Lisp. His maxim could be generalized in a Henry-Spencer-does-Santayana style as this:


“At any sufficient scale, those who do not have automatic memory management in their language are condemned to reinvent it, poorly.”


In other words, there’s a complexity threshold above which lack of AMM becomes intolerable. Lack of it either makes expressive programming in your application domain impossible or sends your defect rate skyrocketing, or both. Usually both.


When you hit that point in a language like C (or C++), your way out is usually to write an ad-hoc layer or a bunch of semi-disconnected little facilities that implement parts of an AMM layer, poorly. Hello, Greenspun’s Law!
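

To make that concrete, here is a minimal sketch – not from the original post, and with hypothetical names like obj_retain and obj_release – of the sort of hand-rolled reference-counting facility such an ad-hoc layer usually starts as:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A hand-rolled object header: every managed allocation carries a
     * reference count and a destructor pointer.  This is a tiny, partial
     * reimplementation of what an AMM runtime does for you. */
    typedef struct obj {
        int refcount;
        void (*dtor)(struct obj *);
    } obj_t;

    static void *obj_alloc(size_t size, void (*dtor)(obj_t *)) {
        obj_t *o = calloc(1, size);
        if (o == NULL)
            abort();
        o->refcount = 1;
        o->dtor = dtor;
        return o;
    }

    static void obj_retain(obj_t *o) {
        o->refcount++;
    }

    static void obj_release(obj_t *o) {
        if (o != NULL && --o->refcount == 0) {
            if (o->dtor != NULL)
                o->dtor(o);
            free(o);
        }
    }

    /* An application type built on the facility. */
    typedef struct node {
        obj_t header;   /* must be first, so a node can be handed around as an obj */
        char name[32];
    } node_t;

    int main(void) {
        node_t *n = obj_alloc(sizeof(node_t), NULL);
        strcpy(n->name, "shared node");

        obj_retain(&n->header);     /* a second owner appears...      */
        obj_release(&n->header);    /* ...and later drops its claim   */
        printf("%s still alive, refcount=%d\n", n->name, n->header.refcount);
        obj_release(&n->header);    /* last owner frees it            */
        return 0;
    }

Every call site now has to get the retain/release pairing right by hand, forever, and reference counts can’t collect cycles at all.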


It’s not particularly the line count of your source code driving this, but rather the complexity of the data structures it uses internally; I’ll call this its “greenspunity”. Large programs that process data in simple, linear, straight-through ways may evade needing an ad-hoc AMM layer. Smaller ones with gnarlier data management (higher greenspunity) won’t. Anything that has to do – for example – graph theory is doomed to need one (why, hello there, reposurgeon!).
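

Here is a small invented example (not reposurgeon code) of why graph-shaped data forces the issue. Once a vertex can be reached from more than one place, plain malloc/free ownership has no local answer, and you end up reaching for something like the reference-counting facility sketched above:

    #include <stdio.h>
    #include <stdlib.h>

    /* A toy graph vertex; all names here are hypothetical. */
    struct vertex {
        int id;
        struct vertex *out[2];  /* up to two outgoing edges */
        int nout;
    };

    static struct vertex *mkvertex(int id) {
        struct vertex *v = calloc(1, sizeof(*v));
        if (v == NULL)
            abort();
        v->id = id;
        return v;
    }

    /* The "obvious" teardown.  On any graph where a vertex is reachable
     * along two paths, this frees that vertex twice; on a cyclic graph it
     * never terminates.  Plain malloc/free ownership has no local answer. */
    void free_subgraph(struct vertex *v) {
        for (int i = 0; i < v->nout; i++)
            free_subgraph(v->out[i]);
        free(v);
    }

    int main(void) {
        /* Diamond: a -> b -> d and a -> c -> d, so d has two "owners". */
        struct vertex *d = mkvertex(3), *c = mkvertex(2), *b = mkvertex(1);
        struct vertex *a = mkvertex(0);
        b->out[0] = d; b->nout = 1;
        c->out[0] = d; c->nout = 1;
        a->out[0] = b; a->out[1] = c; a->nout = 2;
        printf("root is vertex %d\n", a->id);

        /* free_subgraph(a) would free d twice.  The only safe teardown is
         * to free every vertex exactly once from some master list of
         * allocations - and keeping that list is already the seed of an
         * ad-hoc AMM layer (reference counts, or a mark-and-sweep pass). */
        free(a); free(b); free(c); free(d);
        return 0;
    }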


There’s a trap waiting here. As the greenspunity rises, you are likely to find that more and more of your effort and defect chasing is related to the AMM layer, and proportionally less goes to the application logic. Redoubling your effort, you increasingly miss your aim.


Even when you’re merely at the edge of this trap, your defect rates will be dominated by issues like double-free errors and malloc leaks. This is commonly the case in C/C++ programs of even low greenspunity.
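

By way of illustration, here are the two classic patterns in miniature (the names are invented for this example): an error path that leaks, and aliased ownership that sets up a double free.

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical record type, invented for this example. */
    struct record {
        char *payload;
    };

    /* Leak: the error path returns without releasing what was allocated. */
    struct record *parse(const char *line) {
        struct record *rec = malloc(sizeof(*rec));
        if (rec == NULL)
            return NULL;
        rec->payload = malloc(strlen(line) + 1);
        if (rec->payload == NULL) {
            free(rec);
            return NULL;
        }
        strcpy(rec->payload, line);
        if (strchr(line, '=') == NULL)
            return NULL;        /* BUG: rec and rec->payload leak here */
        return rec;
    }

    void destroy(struct record *rec) {
        if (rec != NULL) {
            free(rec->payload);
            free(rec);
        }
    }

    int main(void) {
        struct record *rec = parse("key=value");
        struct record *alias = rec; /* a second code path keeps its own copy */

        destroy(rec);
        /* destroy(alias);  <- BUG if uncommented: double free, because both
         * "owners" believe the record is theirs to release. */
        (void)alias;
        return 0;
    }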


Sometimes you really have no alternative but to be stuck with an ad-hoc AMM layer. Usually you get pinned to this situation because AMM imposes latency costs you can’t afford. The major case of this is operating-system kernels. I could say a lot more about the costs and contortions this forces you to assume, and perhaps I will in a future post, but it’s out of scope for this one.


On the other hand, reposurgeon is representative of a very large class of “systems” programs that don’t have these tight latency constraints. Before I get back to the implications of not being latency constrained, one last thing – the most important thing – about escalating AMM-layer complexity.


At high enough levels of greenspunity, the effort required to build and maintain your ad-hoc AMM layer becomes a black hole. You can’t actually make any progress on the application domain at all – when you try it’s like being nibbled to death by ducks.


Now consider this prospectively, from the point of view of someone like me who has an architect’s skills. A lot of that skill is being pretty good at visualizing the data flows and structures – and thus estimating the greenspunity – implied by a problem domain. Before you’ve written any code, that is.


If you see the world that way, possible projects will be divided into “Yes, can be done in a language without AMM.” versus “Nope. Nope. Nope. Not a damn fool, it’s a black hole, ain’t nohow going there without AMM.”


This is why I said that if I were restricted to C, reposurgeon would never have happened at all. I wasn’t being hyperbolic – that evaluation comes from a cool and exact sense of how far reposurgeon’s problem domain floats above the greenspunity level where an ad-hoc AMM layer becomes a black hole. I shudder just thinking about it.


Of course, where that black-hole level of ad-hoc AMM complexity is varies by programmer. But, though software is sometimes written by people who are exceptionally good at managing that kind of hair, it then generally has to be maintained by people who are less so…


The really smart people in my audience have already figured out that this is why Ken Thompson – co-creator of Unix, designer of C’s ancestor B, and one of Go’s designers – put AMM in Go, in spite of the latency issues.


Ken understands something large and simple. Software expands, not just in line count but in greenspunity, to meet hardware capacity and user demand. In languages like C and C++ we are approaching a point of singularity at which typical – not just worst-case – greenspunity is so high that the ad-hoc AMM layer becomes a black hole, or at best a trap nigh-indistinguishable from one.


Thus, Go. It didn’t have to be Go; I’m not actually being a partisan for that language here. It could have been (say) OCaml, or any of half a dozen other languages I can think of. The point is that the combination of AMM with compiled-code speed is ceasing to be a luxury option; increasingly it will be baseline for getting most kinds of systems work done at all.


Sociologically, this implies an interesting split. Historically the boundary between systems work under hard latency constraints and systems work without them has been blurry and permeable. People on both sides of it coded in C and skillsets were similar. People like me who mostly do out-of-kernel systems work but have code in several different kernels were, if not common, at least not odd outliers.


Increasingly, I think, this will cease being true. Out-of-kernel work will move to Go, or languages in its class. C – or non-AMM languages intended as C successors, like Rust – will keep kernels and real-time firmware, at least for the foreseeable future. Skillsets will diverge.


It’ll be a more fragmented systems-programming world. Oh well; one does what one must, and the tide of rising software complexity is not about to be turned.

Published on December 18, 2017 14:19