Missing documentation and the reproduction problem

I recently took some criticism over the fact that reposurgeon has no documentation that is an easy introduction for beginners.


After contemplating the undeniable truth of this criticism for a while, I realized that I might have something useful to say about the process and problems of documentation in general – something I didn’t already bring out in How to write narrative documentation. If you haven’t read that yet, doing so before you read the rest of this mini-essay would be a good idea.


“Why doesn’t reposurgeon have easy introductory documentation” would normally have a simple answer: because the author, like all too many programmers, hates writing documentation, has never gotten very good at it, and will evade frantically when under pressure to try. But in my case none of that description is even slightly true. Like Donald Knuth, I consider writing good documentation an integral and enjoyable part of the art of software engineering. If you don’t learn to do it well you are short-changing not just your users but yourself.


So, with all that said, “Why doesn’t reposurgeon have easy introductory documentation” actually becomes a much more interesting question. I knew there was some good reason I’d never tried to write any, but until I read Elijah Newren’s critique I never bothered to analyze the reason. He incidentally said something very useful by mentioning gdb (the GNU symbolic debugger), and that started me thinking, and now I think I understand something general.



If you go looking for gdb intro documentation, you’ll find it’s also pretty terrible. Examples of a few basic commands are all it offers; you never get an entire worked example of using gdb to identify and fix a failure point. And why is this?


The gdb maintainers probably aren’t very self-aware about this, but I think at bottom it’s because the attempt would be futile. Yes, you could include a session capture of someone diagnosing and debugging a simple problem with gdb, but the reader couldn’t reliably reproduce it. How would you, the user, go about generating a binary on which replicating the same commands produced the same results?


For an example at the opposite extreme, consider the documentation for an image editor such as GIMP. It can have excellent documentation precisely because including worked examples that the reader can easily understand and reproduce is almost trivial to arrange.


What’s my implicit premise here? This: High-quality introductory software documentation depends on worked examples that are understandable and reproducible. If your software’s problem domain features serious technical barriers to mounting and stuffing a gallery of reproducible examples, you have a problem that even great willingness and excellent writing skills can’t fix.


Of course my punchline is that reposurgeon has this problem, and arguably a worse case of it than gdb does. How would you make a worked example of a repository conversion that is both nontrivial and reproducible? What would that even look like?


In the gdb documentation, you could in theory write a buggy variant of “Hello, World!” with a crash due to null pointer dereference and walk the reader through locating it with gdb. It would be a ritual gesture in the right direction, but essentially useless because the example is too trivial. It would read as a pointless tease.


Similarly, the reposurgeon documentation could include a worked conversion example on a tiny synthetic repository and be no better off than before. In both problem domains reproducibility implies triviality!


Having identified the deep problem, I’d love to be able to say something revelatory and upbeat about how to solve it.


The obvious inversion would be something like this: to improve the quality of your introductory documentation, design your software so that user reproduction of instructive examples is as easy as its problem domain allows.


I understand that this only pushes back the boundaries of the problem. It doesn’t tell you what to do when you’re in a problem domain as intrinsically hostile to reproduction of examples as gdb and reposurgeon are.


Unfortunately, at this point I am out of answers. Perhaps the regulars on my blog will come up with some interesting angle.

Published on January 26, 2020 17:57
Eric S. Raymond's Blog