Eric S. Raymond's Blog, page 21
December 2, 2016
Some of my blogging has moved
I’ve been pretty quiet lately, other than short posts on G+, because I’ve been grinding hard on NTPsec. We’re coming up on a 1.0 release and, although things are going very well technically, it’s been a shit-ton of work.
One consequence is the NTPsec Project Blog. My first major post there expands on some of the things I’ve written here about stripping crap out of the NTP codebase.
Expect future posts on spinoff tools, the NTPsec test farm, and the prospects for moving NTPsec out of C, probably about one a week. I have a couple of these in draft already.
September 26, 2016
Twenty years after
I just shipped what was probably the silliest and most pointless software release of my career. But hey, it’s the reference implementation of a language and I’m funny that way.
Because I write compilers for fun, I have a standing offer out to reimplement any weird old language for which I am sent a sufficiently detailed softcopy spec. (I had to specify softcopy because scanning and typo-correcting hardcopy is too much work.)
In the quarter-century this offer has been active, I have (re)implemented at least the following: INTERCAL, Michigan Algorithmic Decoder, a pair of obscure 1960s teaching languages called CORC and CUPL, and an obscure computer-aided-instruction language called Pilot.
Pilot…that one was special. Not in a good way, alas. I don’t know where I bumped into a friend of the language’s implementor, but it was in 1991 when he had just succeeded in getting IEEE to issue a standard for it – IEEE Std 1154-1991. He gave me a copy of the standard.
I should have been clued in by the fact that he also gave me an errata sheet not much shorter than the standard. But the full horror did not come home to me until I sat down and had a good look at both documents – and, friends, PILOT’s design was exceeded in awfulness only by the sloppiness and vagueness of its standard. Even after the corrections.
But I had promised to do a reference implementation, and I did. Delivered it to the inventor’s friend. He couldn’t get it to work – some problem with the version of YACC he was using, if I recall correctly. It wasn’t something I could fix remotely, and I left it to him to figure out, being pretty disgusted with the project. I don’t know if he ever did.
I did fix a couple of minor bugs in my masters; I even shipped occasional releases until late 1996. Then…I let the code molder in a corner for twenty years.
But these things have a way of coming back on you. I got a set of fixes recently from one Frank J. Lhota, forward-porting it to use modern Bison and Flex versions. Dear sweet fornicating Goddess, that meant I’d have to…issue another release. Because it’s bad form to let fix patches drop on the floor pour discourager les autres.
So here it is. It does have one point of mild interest; the implementation is both an interpreter and a compiler (it’s a floor wax! It’s a dessert topping!) for the language – that is, it can either interpret the parsed syntax tree or generate and compile corresponding C code.
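If the floor-wax/dessert-topping idea sounds odd, here is a toy sketch of the dual-mode trick – my illustration, not the actual pilot sources: one parsed tree node type, one function that evaluates it directly, and one that emits equivalent C text.

#include <stdio.h>

/* Toy illustration only -- not the pilot implementation. */
struct node {
    enum { N_NUM, N_ADD } op;
    int value;                   /* used when op == N_NUM */
    struct node *left, *right;   /* used when op == N_ADD */
};

/* Interpreter path: walk the tree and compute the result. */
static int eval(const struct node *n)
{
    return (n->op == N_NUM) ? n->value : eval(n->left) + eval(n->right);
}

/* Compiler path: walk the same tree and emit equivalent C text. */
static void emit(const struct node *n, FILE *out)
{
    if (n->op == N_NUM) {
        fprintf(out, "%d", n->value);
    } else {
        fputc('(', out);
        emit(n->left, out);
        fputs(" + ", out);
        emit(n->right, out);
        fputc(')', out);
    }
}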
I devoutly hope I never again implement a language design as botched as Pilot. INTERCAL was supposed to be a joke…
September 24, 2016
Dilemmatizing the NRA
So, the Washington Post publishes yet another bullshit article on gun policy.
In this one, the NRA is charged with racism because it doesn’t leap to defend the right of black men to bear arms without incurring a lethal level of police suspicion.
In a previous blog post, I considered some relevant numbers. At 12% of the population, blacks commit 50% of violent index crimes. If you restrict to males in the age range that concentrates criminal behavior, the numbers work out to a black suspect being a more likely homicidal threat to cops and public safety by at least 26:1.
I haven’t worked out how the conditional probabilities crunch out if you have the prior that your suspect is armed, but it probably makes that 26:1 ratio worse rather than better.
Police who react to a random black male behaving suspiciously, who might be in the critical age range, as though he is a near-imminent lethal threat are being rational, not racist. They’re doing what crime statistics and street-level experience train them to do, and they’re right to do it. This was true even before the post-Ferguson wave of deliberate assassinations of police by blacks.
The NRA would, I’m sure, love to defend the RKBA of a black man who isn’t a thug or gangbanger. So would I. The trouble is that when you’re considering police stops nice cases like that are damned thin on the ground.
Seriously, the victims in these stop-and-shoot cases pretty much always turn out to have a history of violent behavior and rap sheets as long as your arm. Often (as in the recent Terence Crutcher case) there is PCP or some other dissociative anaesthetic in their system or in arm’s reach.
It’s hardly any wonder the NRA doesn’t want to spend reputational capital defending the RKBA of these mooks. I wouldn’t either in their shoes; this is not racism, it’s a very rational reluctance to get one’s cause entangled with the scum of the earth.
I cannot help but think that articles like the Post’s are intended to put the NRA on the horns of a dilemma: remain silent in these cases and be falsely accused of racism, or speak up and hear “Aha! So all that other posturing was bogus, you think it’s just fine for hardened criminals to have guns!”
Sigh…
September 18, 2016
Thinking like a master programmer, redux
Yes, there was a bug in my vint64 encapsulation commit. I will neither confirm nor deny any conjecture that I left it in there deliberately to see who would be sharp enough to spot it. I will however note that it is a perfect tutorial example for how you should spot bugs, and why revisions with a simple and provable relationship to their ancestors are best.
The following call in libntp/ntp_calendar.c is incorrect:
setvint64u(res, vint64s(res)-0x80000000);
Now consider the line this replaced:
res.Q_s -= 0x80000000;
And notice what that expands to, semantically, in C:
res.Q_s = res.Q_s - 0x80000000;
Spotted it yet?
My encapsulation patch is extremely regular in form. One of my blog commenters (the only one to spot the bug, so far) pointed out correctly that an ideal transformation of this kind looks like it was done using a text editor search and replace feature – and, in fact, I did most of it with regexp-replace commands in Emacs.
It’s good when your patches are this regular, because it means that you can spot bugs by looking for irregularities – places where a local change breaks a rule followed in the rest of the patch. Importantly, this way to spot defects works even when you don’t fully understand the code.
This is a major reason the code state after every change should have a single provable relationship to its antecedent – because if it has more than one change in it, telltale irregularities will be harder to see.
OK, here is the corrected form of the call:
setvint64u(res, vint64u(res)-0x80000000);
The one-character difference is that the correct inner call is to vint64u(), not vint64s(). You should have been able to spot this in one of a couple of ways.
One is by noticing that the original expression was doing unsigned arithmetic, so what is that call to get a signed value doing in there?
The even simpler way to spot the irregularity is to have noticed that in the rest of the diff there are no other calls like
setvint64X(res, vint64Y(res) … );
in which X and Y are unequal. There is a purely textual symmetry in the patch that this one statement breaks. Because the author was being careful about simplicity and provable relationships, that in itself should be enough to focus a reviewer’s suspicions even if the reviewer doesn’t know (or has forgotten) the meaning of the s and u suffixes.
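For readers who haven’t looked at the vint64 code, here is one plausible shape for those accessors – a sketch, not the actual NTPsec macros. Q_s is the unsigned 64-bit member seen in the original line above; the signed sibling q_s is my assumption.

#include <stdint.h>

/* Sketch only -- not the real NTPsec definitions. The point is that the
 * s/u suffix selects which view of the same 64 bits you operate on. */
typedef union {
    uint64_t Q_s;   /* unsigned 64-bit view (as in res.Q_s above)    */
    int64_t  q_s;   /* signed 64-bit view (member name assumed here) */
} vint64;

#define vint64u(n)        ((n).Q_s)                  /* read unsigned  */
#define vint64s(n)        ((n).q_s)                  /* read signed    */
#define setvint64u(n, v)  ((n).Q_s = (uint64_t)(v))  /* write unsigned */
#define setvint64s(n, v)  ((n).q_s = (int64_t)(v))   /* write signed   */

With definitions along those lines, the buggy call does its subtraction on a signed view of bits the original code treated as unsigned – exactly the mismatch the suffix irregularity is pointing at.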
I’m writing this as though the author and reviewer are different people, but these techniques for bug spotting – and, more importantly, these techniques for writing patches so bugs are easy to spot – apply even when you are your own reviewer and you are looking at a diff mere moments after you changed the code.
You get fast at coding, and you get good at doing it with a low defect rate, by developing the habits of mind that make self-checking like this easy. The faster you can self-check, the faster you can write while holding expected defect rates constant. The better you can self-check, the lower you can push your defect rate.
“To do large code changes correctly, factor them into a series of smaller steps such that each revision has a well-defined and provable relationship to the last” is good advice exactly because the “well-defined and provable” relationship creates regularities – invariants – that make buggy changes relatively easy to spot before you commit them.
I often go entire months per project without committing a bug to the repository. There have been good stretches on NTPsec in which my error rate was down around one introduced bug per quarter while I was coding at apparently breakneck speed. This is how I do that.
Having good tests – and the habit of adding a unit or regression test on every new feature or bug – helps a lot with that, of course. But prior to testing is good habits of mind. The combination of good habits of mind with good testing is not just additively effective, it’s multiplicatively so.
September 17, 2016
Thinking like a master programmer
To do large code changes correctly, factor them into a series of smaller steps such that each revision has a well-defined and provable relationship to the last.
(This is the closest I’ve ever come to a 1-sentence answer to the question “How the fsck do you manage to code with such ridiculously high speed and low defect frequency?” I was asked this yet again recently, and trying to translate the general principle into actionable advice has been on my mind. I have two particular NTPsec contributors in mind…)
So here’s a case study, and maybe your chance to catch me in a mistake.
NTP needs a 64-bit scalar type for calendar calculations; what it actually wants is 32 bits of seconds since a far-past epoch and 32 bits of fractional-second precision, which you can think of as a counter for units of 2^-32 seconds. (The details are a little messier than this, but never mind that for now.)
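If you want that format concretely, here is an illustrative sketch – mine, not NTP code – of the representation and a lossy conversion to floating-point seconds.

#include <stdint.h>

/* Illustration only: 32 bits of whole seconds since the epoch plus a
 * 32-bit binary fraction, i.e. the fraction counts units of 2^-32 s. */
struct ts32_32 {
    uint32_t seconds;    /* whole seconds since the epoch */
    uint32_t fraction;   /* units of 1/2^32 of a second   */
};

/* Lossy conversion to a double, just to show how the fields combine. */
static double ts_to_seconds(struct ts32_32 t)
{
    return (double)t.seconds + (double)t.fraction / 4294967296.0; /* 2^32 */
}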
Consequently, one of the archaisms in the NTP code is an internal type called vint64. It dates from the era of 32-bit machines (roughly 1977 to 2008). In those days you couldn’t assume your C compiler had int64_t or uint64_t (64-bit integer and unsigned-integer types). Even after the 64-bit hardware transition, it was some years before you could safely assume that compilers for the remaining 32-bit machines (like today’s Raspberry Pis) would support int64_t/uint64_t.
Thus, a vint64 is an NTP structure wrapping 2 32-bit integers. It comes with a bunch of small functions that do 64-bit scalar arithmetic using it. Also, sadly, there was a lot of code using it that didn’t go through the functional interface, instead exposing the guts of the vint64 structure in unclean ways.
This is, for several reasons, an obvious cleanup target. Today in 2016 we can assume that all compilers of interest to us have 64-bit scalars. In fact the NTP code itself has long assumed this, though the assumption is so well-hidden in the ISC library off to the side that many people who have worked in the main codebase probably do not know it’s there.
If all the vint64s in NTP became typedefs to a scalar 64-bit type, we could use native machine operations in most cases and replace a lot of function calls and ugly exposed guts with C’s arithmetic operators. The result would be more readable, less bulky, and more efficient. In this case we’d only pare away about 300LOC, but relentless pursuit of such small improvements adds up to large ones.
The stupid way to do it would have been to try to go from vint64 to int64_t/uint64_t in one fell swoop. NSF and LF didn’t engage me to be that stupid.
Quoting myself: “A series of smaller steps such that each revision has a well-defined and provable relationship to the last.”
Generally, in cases like this, the thing to do is separate changing the interface from changing the implementation. So:
1. First, encapsulate vint64 into an abstract data type (ADT) with an entirely functional interface – un-expose the guts.
2. Then, change the implementation (struct to scalar), changing the ADT methods without disturbing any of the outside calls to them – if you have to do the latter, you failed step 1 and have to clean up your abstract data type.
3. Finally, hand-expand the function calls to native C scalar operations. Now you no longer have an ADT, but that’s OK; it was scaffolding. You knew you were going to discard it.
The goal is that at each step it should be possible, and relatively easy, to eyeball-check that the transformation you did is correct. Helps a lot to have unit tests for the code you’re modifying – then, one of your checks is that the unit tests don’t go sproing at any step. If you don’t have unit tests, write them. They’ll save your fallible ass. The better your unit tests are, the more time and pain you’ll save yourself in the long run.
OK, so here’s your chance to catch me in a mistake.
https://gitlab.com/NTPsec/ntpsec/comm...
That is the diff where I pull all the vint64 guts exposure into an ADT (done with macro calls, not true functions, but that’s a C implementation detail).
Can you find an error in this diff? If you decide not, how did it convince you? What properties of the diff are important?
(Don’t pass over that last question lightly. It’s central.)
If you’re feeling brave, try step 2. Start with ‘typedef uint64_t vint64;’, replacing the structure definition, and rewrite the ten macros near the beginning of the diff. (Hint: you’ll need two sets of them.)
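For concreteness, here is my guess at the shape step 2 might take – a sketch, not the actual NTPsec patch, reading the hint as meaning you need both unsigned and signed views of the scalar.

#include <stdint.h>

/* Step-2 sketch only: vint64 becomes a plain scalar and the accessor
 * macros are re-expressed over it, so no caller has to change. */
typedef uint64_t vint64;

#define vint64u(n)         (n)                          /* unsigned view */
#define vint64s(n)         ((int64_t)(n))               /* signed view   */
#define setvint64u(n, v)   ((n) = (uint64_t)(v))
#define setvint64s(n, v)   ((n) = (uint64_t)(int64_t)(v))

/* 32-bit half views, if the old struct exposed its high and low words. */
#define vint64hiu(n)       ((uint32_t)((n) >> 32))
#define vint64lou(n)       ((uint32_t)((n) & 0xffffffffU))

Once every call site goes through macros like these, step 3 is just hand-expanding them into ordinary C arithmetic.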
Word to the newbies: this is how it’s done. Train your brain so that you analyze programming this way – mostly smooth sequences of refactoring steps with only occasional crisis points where you add a feature or change an assumption.
When you can do this at a microlevel, with code, you are inhabiting the mindset of a master programmer. When you can do it with larger design elements – data structures and entire code subsystems – you are getting inside system-architect skills.
September 13, 2016
Trials of the Beast
This last week has not been kind to the Great Beast of Malvern. Serenity is restored now, but there was drama and (at the last) some rather explosive humor.
For some time the Beast had been having occasional random flakeouts apparently related to the graphics card. My monitors would go black – machine still running but no video. Some consultation with my Beastly brains trust (Wendell Wilson, Phil Salkie, and John D. Bell) turned up a suitable replacement, a Radeon R360 R7 that was interesting because it can drive three displays (I presently drive two and aim to upgrade).
Last Friday I tried to upgrade to the new card. To say it went badly would be to wallow in understatement. While I was first physically plugging it in, I lost one of the big knurled screws that the Beast’s case uses for securing both cards and case, down behind the power supply. Couldn’t get it to come out of there.
Then I realized that the card needed a PCI-Express power tap and oh shit the card vendor hadn’t provided one.
Much frantic running around to local computer stores ensued, because I did not yet know that Wendell had thoughtfully tucked several spares of the right kind of cable behind the disk drive bays when he built the Beast. Which turns out to matter because though the PCI-E end is standardized, the power supply end is not and they have vendor-idiosyncratic plugs.
Eventually I gave up and tried to put the old card back in. And that’s when the real fun began. I broke the retaining toggle on the graphics card’s slot while trying to haggle the new card out. When I tried to boot the machine with the old card plugged back in, my external UPS squealed – and then nothing. No on-board lights, no post beep, no sign of life at all. I knew what that meant; evidently either the internal PSU or the mobo was roached.
Exhausted, pissed off, and flagellating myself for my apparent utter incompetence, I went to bed. Next morning I called Phil and said “I have a hardware emergency.” Phil, bless him, was over here within a couple of hours with a toolkit and a spare Corsair PSU.
I explained the whole wretched sequence of events including the lost case screw and the snapped retaining clip and the external UPS squealing and the machine going dead, and Phil said “First thing we’ll do is get that case screw out of there.” He then picked up the Beast’s case and began shaking it around at various angles.
And because Phil’s hardware-fu is vastly greater than mine, we heard rattling and saw a screw drop into view in short order. But it was the wrong screw! Not the big knurled job I’d dropped earlier but a smaller one.
“Aha!” says Phil. “That’s a board-mount screw.” Sure enough we quickly found that the southeast corner of the mobo had a bare hole where its securing screw ought to be. I figured out what must have happened almost as soon as Phil did; we gazed at each other with a wild surmise. That screw had either worked itself loose or already been loose due to an assembly error, and it had fallen down behind the motherboard.
Where it had bothered nobody, until sometime during my attempt to change cards I inadvertently jostled it into a new position and that little piece of conductive metal shorted out my fscking motherboard. The big knurled screw (which he shook out a few seconds later) couldn’t have done it – that thing was too large to fit where it could do damage.
Phil being Phil, he had my NZXT PSU out of the case and apart almost as fast as he could mutter “I’m going to void your warranty.” Sure enough, the fuse was blown.
This was good on one level, because it meant the mobo probably wasn’t. And indeed when we dropped in Phil’s Corsair the Beast (and the new card, and its monitors!) powered up just fine. And that was a relief, because the ASUS X99 motherboard is hellaciously more expensive than the PSU.
Almost as much of a relief was the realization that I hadn’t been irredeemably cackhanded and fatally damaged the Beast through sheer fucking incompetence. Hadn’t been for that little screw, all might have gone smoothly.
I also got an informative lecture on why the innards of the PSU looked kinda dodgy. Hotglue in excessive use, components really crowded in, flywiring (that’s when you have wires unsupported in space except by their ends, looping or arcing around components, rather than being properly snugged down).
But Phil was also puzzled. “Why didn’t this thing crowbar?” he wondered. The fuse, you see, is a second-order backup. When a short draws too much power, the PSU is supposed to shut itself down before there’s been time for the fuse to blow.
Phil took it home to replace the fuse with a circuit breaker, leaving the Corsair in the Beast. Which is functioning normally and allowing me to write this blog post, none the worse for wear except for the broken retaining clip.
He texted me this morning. Here’s how it went, effectively verbatim. I feel this truly deserves to be preserved for posterity:
Phil: So, I did a very nice repair job, installing the circuit breaker in the power supply. Then, as I was about to connect it up to my machine, I thought “you know, it really _should_ have crowbarred – not blown a fuse.” So, I powered it up sitting on the floor instead, saying to Ariel “either it’ll work, or it’ll catch fire.”
Phil: So, you may as well hang on to the Corsair power supply – I’ve already picked up a replacement to go in my machine.
Me: Are you telling me it caught fire?
Phil: No, no, no, no, no! Well, a little bit, yes. Just some sparks, and a small amount of smoke – no actual _flames_ as such, not on the outside of the box, at least. Hardly a moment’s trouble, really, once it cooled down enough to toss it in the bin…
Me: The magic smoke got out.
Phil: You might well say that, yes. In fact, all the smoke got out – the magic, the mystic, the mundane, all gone up in, well, smoke, as it were.
Me: “No actual _flames_ as such, not on the outside” Cathy was highly amused. I suspect she was visualizing you behind a blast shield, a la Jamie Hyneman.
Phil: I was trying more for Monty Python’s dead parrot sketch. In retrospect, a blast shield may have been warranted, had I only thought to use one…
Me: What about all those extra cables in the Beast? Are they obsolete now?
Phil: Yes, yes they are. At some convenient point I’ll tidy up all the cabling and make sure you have proper spares for the next time we dare disturb the Beast…
No, I did not make any of this up. Yes, Phil is really like that. And thus endeth the tale of the Trials of the Beast.
August 17, 2016
Some aphorisms on software development methodology
The net benefit of having anything that can be called a software development methodology is in inverse proportion to the quality of your developers – and possibly inverse-squared.
Any software-development methodology works well on sufficiently small projects, but all scale very badly to large ones. The good ones scale slightly less badly.
One thing all methodologies tagged “agile” have in common is that they push developers away from the median. Competent developers work better; mediocre developers don’t change; incompetent ones get worse.
Software metrics have their uses. Unfortunately, their principal use is to create the illusion that you know what’s going on.
Structured development methodologies have their uses, too. Unfortunately, their principal use is to create an illusion of control.
Trust simple, crude metrics over complex ones because the simple ones are less brittle. KLOC is best, though a poor best.
Agile development is efficient only in environments where the cost of flag days is low. Otherwise, slow down, and take time to think and write about your architecture.
Good programmers are difficult to control; great ones are nearly impossible to control. Different methodologies require different kinds and degrees of control; match them to your developers wisely.
Process is not a good substitute for judgment; when you have to use it as one, your project is probably too large. Sometimes this is unavoidable, but don’t fool yourself about what that will cost you.
The difference between O(n**2) and O(n log n) really matters. It’s why lots of small teams working on coordinated small projects work better than one big team building a monolith.
A million dollars is roughly a 55-gallon oil drum full of five-dollar bills. Large-scale software development is such a difficult and lossy process that it’s like setting fire to several of these oil drums and hoping a usable product flutters out of the smoke. If this drives you to despair, find a different line of work.
July 22, 2016
Big fish, small pond?
As of tonight, I have a new challenge in my life.
Sifu Dale announced tonight that he plans to send a team to next year’s Quoshu, a national-level martial-arts competition held every year in northern Maryland in late summer.
I told Sifu I want to go and compete in weapons, at least. He was like, “Well, of course,” as though he’d been expecting it. Which is maybe a little surprising and flattering considering I’ll be pushing 60 then.
It’ll mean serious training for the next year, and maybe a pretty humiliating experience if it turns out I’m too old and slow. But I want to try, because it’s national-level competition against fighters from dozens of different styles, and…frankly, I’m tired of not having any clear idea how good I am. Winning would be nice, but what I really want is to measure myself against a bigger talent pool.
The thing is, on the limited evidence I have, the possibilities range from “Eric is a clumsy goof who powers through weak opposition just by being a little stronger and more aggressive” to “Eric is a genuinely powerful and clever fighter who even national-level competitors had better take seriously.” It’s really hard for me to tell.
I’ve tended to look pretty good at schools where the style matched my physical capabilities. I was a duffer at aikido and undistinguished at MMA, but put me in a style that’s about hitting things or swinging weapons and I shine. You really, really don’t want to be in the way when I strike at full power; I never do it against my training partners because I don’t want to break them. On one occasion at my MMA school when I was practicing short punches against a padded structural beam I vibrated the building. Not kidding!
I also take hits very well when they happen. My sifu often tells new students “Hit him as hard as you can. You can’t hurt him,” which claim is funny because it’s largely true. By the time they can put enough mv**2 on target to make me flinch they’re well beyond being newbies. Generally if he doesn’t say this the student has trained in another striking style before.
On the other hand, I’m only tested against a relatively small population, and it’s not clear that my upper-body strength is the kind of advantage against genuinely skilled opponents that it is when you’re, say, trying to vibrate a building. I’m slow on my feet and my balance is iffy because cerebral palsy. And there are lots of people who can do technique better than me.
If I’m really good, then it’s because (a) I’m strong and tough, (b) I’m aggressive, and (c) I have a kind of low cunning about fighting and do things my opponents don’t expect and aren’t prepared for. I know where my tiger is. An awful lot of people who are better martial technicians than me don’t, and that is a fact.
But I don’t know what percentile this puts me in if you could match me against a hundred people who have also been training for years and are among the best at their schools. In a year and change maybe I will. It’s worth the effort to find out.
July 3, 2016
Count the SKUs
The Washington Post is running a story alleging that surveys show gun ownership in the U.S. is at a 40-year low. I won’t link to it.
This is at the same time gun sales are at record highs.
The WaPo’s explanation is, basically, that all these guns are being bought by the same fourteen survivalists in Idaho.
Mine is that the number of gun owners with a justified fear that “surveys” are a data-gathering tool for confiscations is also at a record high, and therefore that the number lying to nosy strangers about having no guns is at a record high.
I think there’s a way to discriminate between these cases on the evidence.
It’s not NICS records, because those get destroyed after a timeout. Thankfully…
In any consumer market, a reliable way to tell if it’s broadening or narrowing is whether manufacturers’ and retailers’ product ranges are expanding or contracting. SKUs are expensive; having more complicates everybody’s supply chains and planning and accounting.
In a broadening market, the variety of consumer preferences is increasing. It makes sense to chase them with product variations. In a narrowing one the opposite is true, and you shed SKUs that no longer pay for the overhead of their differentiation.
In early-stage technologies this effect can be masked by the normal culling of product types that happens as a technology stabilizes. There was much more variety in personal computers in 1980 than there is now! But firearms are not like this; they’re a mature technology.
So a productive question to ask is this: is the huge upswing in gun sales being accompanied by a broadening of product ranges? Google-fu did not provide a definite answer, but I can think of several indicators.
A big one is the explosion in sales of aftermarket parts for AR-15 customization. If that’s a sign of a contracting market, I’ll eat the grips on my Kimber. Another is the way new product classes keep coming out and being difficult to buy until gunmakers tool up to meet demand. The most recent case of this I know of was subcompact (3.5″-barrel) .45ACPs.
Open question for my blog regulars: can we find good public measures for SKU diversity in this space?
June 25, 2016
More scenes from the life of a system architect
Haven’t been blogging for a while because I’ve been deep in coding and HOWTO-writing. Follows the (slightly edited) text of an email I wrote to the NTPsec devel list that I think might be of interest to a lot of my audience.
One of the questions I get a lot is: How do you do it? And what is “it”, anyway? The question seems like an inquiry into the mental stance that a systems architect has to have to do his job.
So, um, this is it. If you read carefully, I think you’ll learn a fair bit even if you haven’t a clue about NTP itself.
Today, after a false start yesterday and a correction, I completed a patch sequence that makes a significant structural change to NTP that isn’t just removing cruft.
This is kind of a first. Yes, I’ve made some pretty dramatic changes to the code over the last year, but other than the not-yet-successful TESTFRAME scaffolding they were almost all bug fixes, refactorings, or creative removals. The one exception, JSON reporting from ntpdig, was rather trivial.
[What I didn’t say to the list, because they already know it, is that the code was such a rubble pile that it actually took that year to clean up to the point where a change like this was reasonable to attempt.]
What I’ve succeeded in doing is almost completely removing from the code the assumption that refclock addresses necessarily have the special form 127.127.t.u. The only code that still believes this is in the ntp.conf configuration parser, and the only reason *it* still believes this is in order not to break the existing syntax of refclock declarations.
(In fact, clock addresses do still have this form internally, but that is only to avoid surprising older ntpq instances; nothing in the NTPsec code now requires it.)
I’ve also made substantial progress towards eliminating driver-type magic numbers from the code. The table that used to indirect from driver-type numbers to driver-type shortnames is gone; instead, the driver shortname string is what it should be – an element of the driver method table – and there is only one type-number-to-driver indirection, a table in refclock_conf.c.
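For readers who haven’t seen the code, the shape being described is roughly this – a sketch, not the actual contents of refclock_conf.c, with member names invented for illustration.

#include <stdbool.h>

struct peer;   /* opaque here; stands in for ntpd's peer structure */

/* Sketch only: the real method table has more entries and different
 * names. The point is that the short name lives in the table itself. */
struct refclock_ops {
    const char *shortname;                       /* e.g. "shm", "nmea" */
    bool (*start)(int unit, struct peer *);      /* bring the clock up */
    void (*shutdown)(int unit, struct peer *);   /* tear it down       */
    void (*poll)(int unit, struct peer *);       /* take a sample      */
};

/* The one remaining number-to-driver indirection: index by the legacy
 * driver type number to reach the method table that carries the name. */
extern const struct refclock_ops *const refclock_conf[];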
This is all clearing the decks for a big user-visible change. I’m going to fix the frighteningly awful refclock declaration syntax. Consider this example:
# Uses the shared-memory driver, accepting fixes from a running gpsd
# instance watching one PPS-capable GPS. Accepts in-band GPS time (not
# very good, likely to have jitter in the 100s of milliseconds) on one
# unit, and PPS time (almost certainly good to 1 ms or less) on
# another. Prefers the latter.
# GPS Serial data reference (NTP0)
server 127.127.28.0
fudge 127.127.28.0 refid GPS
# GPS PPS reference (NTP1)
server 127.127.28.1 prefer
fudge 127.127.28.1 refid PPS
The misleading “server” keyword for what is actually a reference clock. The magic 127.127.t.u address, which is the only way you *know* it’s a reference clock. Some attributes of the clock being specified in a mystery ‘fudge’ command only tied in by the magic server address. The magic driver type number 28. The fail is strong here. The only excuse for this garbage (and it’s not much of one – Mills was smart enough to know better) is that it was designed decades ago in a more primitive time.
Here’s how I think it should look:
refclock shm unit 0 refid GPS
refclock shm unit 1 prefer refid PPS
No magic IPv4 address, no split syntax, no driver type number (it’s been replaced by the driver shortname “shm”). It should be less work to get the rest of the way to this (while still supporting the old syntax for backward compatibility) than I’ve done already – I’ve already written the grammar, only the glue code still needs doing.
An unobvious benefit of this change is that the driver reference pages are going to become a lot less mystifying. I can still remember how and why my head hurt on first reading them. Removing the magic addresses and mystery numbers will help a lot.
Along the way I learned a lot about how ntpq and mode 6 responses work. (Like NTP in general, it’s an odd combination of elegant structural ideas with an astonishing accumulation of cruft on top.) In order to remove the magic-address assumptions from ntpq I had to add another variable, “displayname”, to the set you get back when you request information about a peer. In effect, ntpd gets to say “*this* is how you should label this peer”, and ntpq uses that to decorate the clock entries in its -p output.
This has the minor downside that new ntpqs will display 127.127.28.0 (rather than “SHM(0)”) when querying Classic ntpd, which doesn’t ship that variable. Oh well…almost everyone disables remote querying anyway. It was the right thing to do; ntpq has no business knowing about driver type numbers.
(Grrrrr…Actually, *nobody* has any business knowing about driver type numbers. Things that have names should be referred to by name. Making humans maintain a level of indirection from names to numbers is perverse, that’s the kind of detail we have computers to track. Or, to put it slightly differently, “1977 called – it wants its ugly kluge back.”)
It’s easy for codebases this size to wind up as huge balls of mud. There are several nearly equivalent ways to describe my job as a systems architect; one of them centers on enforcing proper separation of concerns so collapse-to-mudball is prevented. The changes I’ve just described are a significant step in the good direction.
