Eric S. Raymond's Blog, page 33

October 31, 2014

Cognitive disinhibition: not the whole story of genius

Here’s an interesting article with a stupid and misleading title on the role of what the author calls “cognitive disinhibition” – a fancy term for “allowing oneself to notice what others miss” – in enabling creative genius.


While in many ways I could be a poster child for Simonton’s thesis (and I’ll get to those) I also think there are some important things missing from his discussion, which is why I’m blogging about it. The most crucial problem is that his category of “madness” is not sharp enough. I know how to fine it down in a way that I think sheds considerable light on what he is trying to analyze.



First let’s get through the easy stuff. Go read the article. It’s short.


I was especially struck, in a positive way, by Simonton’s discussion of childhood factors that promote the ability to disinhibit cognition. He mentions “bilingualism” – ding ding ding, been there and done that and I have always thought it helped free me from over-dependence on fixed linguistic categories. It’s easier to bear in mind that the map is not the territory when you have two different maps.


Also, under “various forms of developmental adversity, such as parental loss, economic hardship, and minority status”, yeah, I think congenital cerebral palsy qualifies not just as developmental adversity but as the right kind. It’s not news to anyone who has studied or dealt with CP kids that they are disproportionately gifted and bright. Rage against the limitations of the body can lead to a sharper mind.


More things he gets right: I think it is correct that geniuses are distinguished from madmen in part by higher general intelligence. The brightest people I have known are hyper-sane. I can be precise about that: elsewhere I have defined sanity as the process by which you continually adjust your beliefs so they are predictively sound. Extremely intelligent people tend to be extremely good at this; it’s the half-bright and merely gifted who are more likely, in my experience, to be unsane or insane (that is, poor at maintaining predictive beliefs).


Now we get to what he’s missed. First, Simonton writes as though he believes that only cognitive disinhibition can produce genius-level conceptual breakthrough. I think this is mistaken; there’s an alternate path through plain hard work, climbing the mountain foot by foot rather than teleporting to the peak in a flash of lightning Zen insight.


For an instructive study in the contrast I like to cite two instances from the history of chemistry: Kekulé’s discovery of the benzene ring versus the elucidation of the double-helix structure of DNA. Kekulé’s breakthrough was a sudden insight, an eruption from his unconscious. By contrast, there does not seem to have been any large aha moment in the discovery of the structure of DNA. That was done by painstaking collection of data, meticulous analysis, and the gradual elimination of competing possibilities.


There’s a kind of romanticism in many people that wants to see genius only in the sudden lightning flashes. I think I know better; I have a lot more lightning Zen insights than most people but even for me they’re comparatively rare. Most of the time, when I pull off something that looks like creative genius it’s because I’ve worked very hard at getting up the mountain.


Matters are confused by the fact that the kind of immersive effort that gets you “genius” by hard work is also an important enabler, a setup condition, for the lightning Zen insight. But having experienced both I believe they are different processes. Possibly this is related to the distinction between “System 1” and “System 2” thinking that’s fashionable lately.


Most importantly, I think Simonton’s category of “madness” is de-focused in a way that harms his thesis. The most important truth about human psychology that I have learned in many years is that psychosis (which is what Simonton is identifying as madness) is a very specific thing, not merely “cognitive disinhibition” but a loss of the ability to maintain an integrated sense of the self. As I have put it elsewhere, the delusional psychotic frantically spins his theory-building wheels because he cannot identify the fragments of his dissociated mentation as “self” and must therefore attribute them to external agency.


Armed with that insight, I think we can improve on Simonton’s thesis in a major way. I propose that cognitive disinhibition is not a primary feature of madness but a secondary effect of the dissociation of the self, the society of mind cracking up into Babel. Conversely, the key trait that distinguishes functional geniuses (especially the cohort in the hard sciences that Simonton notes are unusually sane) is the combination of cognitive disinhibition with an exceptionally well-developed ability to distinguish self from other in perception – anti-insanity, as it were.


Again, this is partly informed by my own experience. I have always had a very firm grasp on who I am. It wasn’t until well into adulthood that I realized that “identity crisis” isn’t just a literary conceit or self-indulgent silliness. People really get those! I sort of understand that now, intellectually, but the possibility of having an “identity crisis” myself…no. It would take drugs or brain damage to do that to me.


I think there are hints about the neurology involved from the study of how the brain acts during meditation. It’s been found (I can’t track down a source, alas) that some kinds of meditation temporarily shut down portions of the right parietal lobe that are responsible for maintaining our representation of the physical self-other distinction. The meditator feels “one with everything” for exactly as long as the wiring that tells him he isn’t remains switched off. This isn’t quite like madness – the meditator’s sense of self is unitary and extended rather than fragmented – but I think it is instructively similar.


All this has some functional implications. It tells us we needn’t fear cognitive disinhibition in itself – it’s not causative of madness, it’s a near-accidental side effect of same. What it means for how we cultivate more genius is less clear. Probably the best strategy would be a combination of intelligence-enhancing nootropics with training to enhance the self-other distinction, if we had any real clue how to do the latter.

Published on October 31, 2014 05:45

October 29, 2014

When hackers grow old

Lately I’ve been wrestling with various members of an ancient and venerable open-source development group which I am not going to name, though people who regularly follow my adventures will probably guess which one it is by the time I’m done venting.


Why is it so freaking hard to drag some people into the 21st century? Sigh…


I’m almost 56, an age at which a lot of younger people expect me to issue semi-regular salvos of get-off-my-lawn ranting at them. But no – I find that, especially in technical contexts, I am far more likely to become impatient with my age peers.


A lot of them really have become grouchy, hidebound old farts. And, alas, it not infrequently falls to me to be the person who barges in and points out that practices well-adapted for 1995 (or, in the particular case I’m thinking of, 1985) are … not good things to hold on to decades later.


Why me? Because the kids have little or no cred with a lot of my age peers. If anyone’s going to get them to change, it has to be someone who is their peer in their own perception. Even so, I spend a lot more time than seems just or right fighting inertia.


Young people can be forgiven for lacking a clue. They’re young. Young means little experience, which often leads to unsound judgment. It’s more difficult for me to forgive people who have been around the track often enough that they should have a clue, but are so attached to The Way It’s Always Been Done that they can’t see what is in front of their freaking noses.



(News flash: I really don’t have a conservative temperament. I find it wryly amusing how often both conservatives and non-conservatives who argue politics with me fail to notice this.)


OK, now let’s talk about GNU ChangeLog files. They were a fine idea, a necessary one even, in 1985. The idea was to use a single ChangeLog entry to document a group of related changes to multiple files. This was a reasonable adaptation to absent or extremely primitive version control. I know this because I was there.


Even in 1995, or as late as the early 2000s, many version control systems didn’t have changesets. That is, there was no or only weak support for grouping multiple file modifications into a single retrievable object with a comment attached to the object rather than to individual file modifications. CVS, the system in widest use then, only faked changesets – and did it so badly that many people felt they couldn’t rely on that feature. ChangeLog files still made some functional sense.


But then Subversion – with real changesets – achieved wide acceptance through its beta releases around 2003 and its 1.0 in 2004. It should have been obvious then, even before the new wave of DVCSes that began a year later, that there was a culture clash a comin’. Because if your project both has a DVCS and uses the ChangeLog convention, they’re fighting for control of the same metadata.


There are different ways you can adapt. One is to continue to treat the ChangeLogs as the authoritative record of the evolution of the code. In that case, you tend to get stubby or pro-forma commit comments.


Another is to treat the commit comment log as authoritative. If you do that, you soon begin to wonder why you’re still writing ChangeLog entries at all. The commit metadata has better coherence with the code changes, after all – that’s what it’s designed for.
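
For what it’s worth, when the commit log is authoritative, a ChangeLog-style report can be generated mechanically whenever anyone wants one. Something along these lines at release time (the exact output format is a matter of taste) makes the hand-maintained file redundant:

    git log --date=short --pretty=format:'%ad  %an  <%ae>%n%n%x09* %s%n' > ChangeLog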


(Now imagine a project in which, with the best of intentions, different people are making opposite choices out of these two. Now you have to read both the ChangeLogs and the commit logs to know what’s going on. Friction costs are rising…)


A third is to try to have it both ways – duplicating commit comment data in a slightly different format in a ChangeLog entry that’s part of the commit. This has all the problems you’d expect with a representation in which there is no single point of truth; one copy gets garbled, or the ChangeLog entry gets modified so that it’s no longer in sync with the allegedly matching commit data, and life gets very confusing for anyone who comes along later and tries to figure out what people were thinking.


Or you can do what a senior dev on a Certain Project I Won’t Name just did in email: declare that commits can include multiple ChangeLog entries and that the commit metadata is irrelevant to the ChangeLogs. Which we still have to write.


My eyes crossed and my gorge rose when I read that. What kind of fool fails to realize that this is begging for trouble – that, actually, the whole edifice of custom around ChangeLog files is just dead weight and friction drag in a DVCS world with good browsing tools for reliable commit logs?


Alas, it’s a very particular kind of fool: a hacker who has grown old and rigid. All the rationalizations he will ever utter fail to hide this. He’s attached to tactics that made sense a decade ago but have become counterproductive ceremonies now. If you tried to explain not just about git summary lines but that the correct adaptation for current toolsets is to scrap ChangeLogs entirely … well, that would be insupportable, inconceivable, and just crazy talk.


Functionally this infuriates me. It is substantially harder to work on that project because of this and related nonsense. And, as badly as it happens to need young developers, that’s a real problem. It has a G+ community well into 4 digits, they’re mostly kids, and they’re not stepping up. Evidently the message has been received on the outside: the devs on this project are ancient mossbacks with inexplicable tribal fixations, best admired from a good long distance.


What gives this extra emotional edge for me is that whenever I have to butt heads with a mossback, I keep wondering: will I be like this someday? Worse, am I looking in a mirror, already rigidified and not knowing it? I mean, I get the impression from his web presence that this particular specimen is younger than me. By a good fifteen years.


I feel mentally agile. I don’t get frustrated by people moving faster than I can handle, I get frustrated by people who can’t keep up with me, who can’t see the obvious. But this self-belief could be just a bad case of the Dunning-Kruger effect biting me where I least understand it. Very few things terrify me; this possibility is high on the short list.


A separately disconcerting thing is that as I get older this sort of collision is happening more often rather than less. Somehow I expected my hacker peers to age more gracefully, to retain their neotenous flexibility even if they were physically aging. Some do indeed seem to be going that way; too many, alas, are not. It is a sadness.


I’m not sure I have a good finish for this. If I’ve escaped mentally rigidifying (and that’s an if) I think I know at least in part why, but I’m very unsure whether it can be generally replicated – you might need to have a wired-in brain chemistry that matches the strategy. Nevertheless, for whatever it’s worth, here is my advice to young hackers and indeed the young of all kinds.


You – yes, even you – cannot count on retaining your mental flexibility into middle and old age unless you work at it. You have to practice busting out of comfortable mental grooves and regularly checking your assumptions when you’re young, and you have to develop a habit of it that sustains into old age.


It’s said that the best time for a middle-aged person to start (physically) exercising is thirty years ago. I think the same goes for the habits that might (might!) keep you mentally agile at 56, or 65. Push your envelope. Develop the regular practice of challenging yourself and exiting your comfort zone now so you’ll have it established when you really need it.


You have to be realistic about this; there’s an optimal-challenge level where you choose an attainable goal and work mentally hard for it. This month I’m going to learn go. Not the game, I already play that (though not very well); the programming language. Not because I really need to for a specific project, but because it’s time to stretch myself.


Develop that habit. And never let it go.

Published on October 29, 2014 10:14

October 24, 2014

Moving the NetBSD repository

Some people on the NetBSD tech-repository list have wondered why I’ve been working on a full NetBSD repository conversion without a formal request from NetBSD’s maintainers that I do so.


It’s a fair question. An answer to it involves both historical contingency and some general issues about moving and mirroring large repositories. Because of the accident that a lot of people have recently dropped money on me in part to support an attack on this problem, I’m going to explain both in public.



First, the historically contingent part:


1. Alan Barrett tried to run a full conversion of NetBSD using cvs-fast-export last December and failed (OOM). He then engaged me and we spent significant effort trying to reduce the program’s working set, but could not prevent OOM on either of the machines we were using. Because Alan was willing to work on this at some length, I formed the idea that there was real demand for a full NetBSD conversion.


2. The NetBSD repo is large and old. I wanted a worst-possible-case (or near worst-possible-case) to test the correctness of the tool on. I knew there might be larger repositories out there (and now it appears that Gentoo’s is one such) but for obvious historical reasons I thought NetBSD would be an exemplary near-worst case. Thus, it would be a worthy test even if the politics to get the result deployed didn’t pan out.


I have since been told that NetBSD actually has a git mirror of its CVS repository produced with a two-step conversion: CVS -> Fossil -> git.


This makes me nervous about the quality of the result. Repo conversions produce artifacts due to ontological mismatches between the source and target systems; a two-stage process will compound the problems. Which in turn gives rise to exactly the kinds of landmines one least wants – not obvious on first inspection but chronically friction-causing down the road.


I’m not speaking theoretically about this; I’m currently dealing with a major case of landmine-itis in the Emacs repository, which has (coincidentally) just been scheduled for a full switch to git on Nov 11. I’ve been working on that conversion for most of a year.


For a really high-quality conversion even a clean single-stage move needs human attention and polishing. This is why reposurgeon is designed to amplify the judgment of a human operator rather than attempt to fully mechanize the conversion.


I understand there is internal controversy within NetBSD over a full switch to git. I don’t really want to get entangled in the political part of the discussion. However, as a technical expert on repository conversions and their problems, I urge the NetBSD team to move the base repository to something with real changesets as soon as possible.


It doesn’t have to be git. Mercurial would do; even Subversion would do, though I don’t recommend it. I’m not grinding an axe for git here, I’m telling you that the most serious, crazy-making traps for the unwary lie in the move from a version-control system without full coherent changesets to a VCS with one. Once you have that conversion done and clean, moving the repository content to any other such system is relatively easy.


(Again, I’m not speaking theoretically – reposurgeon is the exact tool you want for such cross-conversions.)


This is my offer: I have the tools and the experience to get you to the changeset-oriented VCS of your choice. I can do a really good job, better than you’ll ever get from mechanical mirroring or a batch converter, because I know all about common conversion artifacts and how to do things like lifting old version references and ignore-pattern files.


It looks like my tools are git-oriented because they rely on git fast-import streams as an interchange format, but I’m not advocating git per se – I’m urging you to move somewhere with changesets. It’s a messy job and it wants an expert like me on it, but it only has to be done once. Afterwards, the quality of your developer experience and your future technical options with regard to what VCS you actually want to use will both greatly improve.


Related technical point: the architectural insight behind my tools is that the git folks created something more generally useful than they understood when they defined import streams. Having an editable transfer format that can be used to move content and metadata relatively seamlessly between VCSes is as important in the long term as the invention of the DVCS – possibly more so.
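
To make that concrete, here is a minimal made-up fast-import stream – one blob and one commit referencing it – of the kind cvs-fast-export emits and reposurgeon edits (names and content are placeholders):

    blob
    mark :1
    data 11
    hello world

    commit refs/heads/master
    mark :2
    committer J. Random Hacker <jrh@example.com> 1413500000 +0000
    data 16
    initial revision
    M 100644 :1 hello.txt

Everything – content, attribution, comment text, branch structure – travels in one editable text format, which is exactly what makes cross-VCS surgery tractable.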


cvs-fast-export emits a fast-import stream not because I’m a git partisan (I actually rather wish hg had won the mindshare war) but because that’s how you get to a sufficiently expressive interchange format.


I’ll mail this to tech-repository once I can find out how to sign up.

Published on October 24, 2014 03:18

October 21, 2014

Proving the Great Beast concept

Wendell Wilson over at TekSyndicate had a good idea – run the NetBSD repo conversion on a machine roughly comparable to the Great Beast design. The objective was (a) to find out if it freakin’ worked, and (b) to get a handle on expected conversion time and maximum working set for a really large conversion.


The news is pretty happy on all fronts.



The test was run on a machine with dual 10-core Xeon 2660v3 processors at a 3.0GHz clock and 256GB of memory, with conventional HDs. This is less unlike the Great Beast design than it might appear for benchmarking purposes, because the core algorithms in cvs-fast-export can’t use all those extra cores very effectively.


The conversion took 6 hours and 18 minutes of wall time. The maximum working set was about 19.2GB. To put this in perspective, the repo is about 11GB large with upwards of 286,000 CVS commits. The resulting git import stream dump is 37GB long.


The longest single phase was the branch merge, which took about 4.4 hours. I’ve never seen computation dominate I/O time (just shy of two hours) before – and don’t think I will again except on exceptionally huge repositories.


A major reason I needed to know this is to get a handle on whether I should push micro-optimizations any further. According to profiling the best gains I can get that way are about 1.5%. Which would have been about 5 minutes.


This means software speed tuning is done, unless I find an algorithmic improvement with much higher percentage gains.


There’s also no longer much point in trying to reduce maximum working set. Knowing conversions this size will fit comfortably into 32GB of RAM is good enough!


We have just learned two important things. One: for repos this size, a machine in the Great Beast class is sufficient. For various reasons, including the difficulty of finding processors with dramatically better single-thread performance than a 3GHz 2660v3, these numbers tell us that five hours is about as fast as we can expect a conversion this size to get. The I/O time could be cut to nearly nothing by doing all the work on an SSD, but that 4.4 hours will be much more difficult to reduce.


The other thing is that a machine in the Great Beast class is necessary. The 19.2GB maximum working set is large enough to leave me in no doubt that memory access time dominated computation time – which means yes, you actually do want a design seriously optimized for RAM speed and cache width.


This seems to prove the Great Beast’s design concept about as thoroughly as one could ask for. Which means (a) all that money people dropped on me is going to get used in a way well matched to the problem, neither underinvesting nor overinvesting, and (b) I just won big at whole-systems engineering. This makes me a happy Eric.

Published on October 21, 2014 21:47

October 20, 2014

Building the perfect beast

I’ve attempted to summarize the discussion of build options for the repository-surgery machine. You should see a link at the top of the page: if not, it’s here


I invite all the commenters who have shown an interest to critique these build proposals. Naturally, I’d like to make sure we have a solid parts list with no spec conflicts before we start spending money and time to build this thing.



As the Help Stamp Out CVS In Your Lifetime fund has received $965 and I said I’d match that, even the Xeon proposal is within reach. Though I don’t mind admitting that I wasn’t expecting to have to match quite so much generosity and the thought of spending $900 on the machine makes me swallow a bit hard. If the gentleman who instigated the Xeon proposal is still willing to toss a couple of bitcoins at it, I won’t be too proud to accept.


Plans are not yet final, but John Bell (who started this party with his $100 “Get a real computer, kid!” donation) says he’s eager to do the build at his place in Toledo. Then he’ll haul it out here and we’ll do final installation and system qualification, probably sometime in mid-November.


I’ve already had one nomination for the next CVS mammoth to get speared: Gentoo. I’ve sent an offer but seen no response yet. NetBSD is definitely on my list. I’ll cheerfully accept suggestions of other deserving targets.


Not to forget that I do Subversion repositories too. I’ve actually converted more of those than CVS ones, I think – Battle For Wesnoth, Hercules, Roundup, and Network Utility Tools are the ones that leap to mind.

Published on October 20, 2014 08:55

October 18, 2014

Black magic and the Great Beast

Something of significance to the design discussion for the Great Beast occurred today.


I have finally – finally! – achieved significant insight into the core merge code, the “black magic” section of cvs-fast-export. If you look in merge.c in the repo head version you’ll see a bunch of detailed comments that weren’t there before. I feel rather as Speke and Burton must have when after weeks of hacking their way through the torrid jungles of darkest Africa they finally glimpsed the source of the Nile…



(And yes, that code has moved. It used to be in the revlist.c file, but that now contains revision-list utility code used by both stages 1 and 2. The black magic has moved to merge.c and is now somewhat better isolated from the rest of the code.)


I don’t grok all of it yet – there’s some pretty hairy and frightening stuff happening around branch joins, and my comprehension of edge cases is incomplete. But I have figured out enough of it to have a much better feel than I did even a few days ago for how it scales up.


In particular I’m now pretty sure that the NetBSD attempt did not fail due to an O(n**2)/O(n**3) blowup in time or space. I think it was just what it looked like, straight-up memory exhaustion because the generated gitspace commits wouldn’t fit in 4GB. Overall scaling for the computational part (as opposed to I/O) looks to me like it’s roughly:


* O(m**2) in time, with m more related to maximum revisions per CVS master and number of branches than total repo or metadata volume.


* O(n) in space, where n is total metadata volume. The thing is, n is much larger than m!


This has implications for the design of the Great Beast. To match the implied job load, yes, serial computation speed is important, but the power to rapidly modify data structures of more than 4GB extent even more so. I think this supports the camp that’s been arguing hard for prioritizing RAM and cache performance over clock speed. (I was leaning that way anyway.)


My estimate of O(n) spatial scaling also makes me relatively optimistic about the utility of throwing a metric buttload of RAM at the problem. I think one of the next things I’m going to do is write an option that returns stats on memory usage after stages 1 and 2, run it on several repos, and see if I can curve-fit a formula that predicts the stage 2 figure given Stage 1 usage.
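
A minimal sketch of that kind of instrumentation (illustrative only, not the actual cvs-fast-export code – on Linux, getrusage() reports peak resident set size in kilobytes):

    #include <stdio.h>
    #include <sys/resource.h>

    /* Print the peak resident set size so far, labeled by conversion stage.
     * On Linux, ru_maxrss is in kilobytes.  This is a measurement hook only;
     * the real option would presumably accumulate and report these figures. */
    static void report_maxrss(const char *stage)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) == 0)
            fprintf(stderr, "%s: peak RSS %ld KB\n", stage, ru.ru_maxrss);
    }

    /* e.g. report_maxrss("stage 1"); ... report_maxrss("stage 2"); */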


Even without that, I think we can be pretty confident that the NetBSD conversion won’t break 32GB; the entire repo content is 11GB, so the metadata has to be significantly smaller than that. If I understand the algorithms correctly (and I think I do, now, to the required degree) we basically have to be able to hold the equivalent of two copies of the metadata in memory.


(In case it’s not obvious, I’m using NetBSD as a torture test because I believe it represents a near worst case in complexity.)


I’m also going to continue working on shrinking the memory footprint. I’ve implemented a kind of slab allocation for the three most numerous object classes, cutting malloc overhead. More may be possible in that direction.
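
For anyone unfamiliar with the technique, here is a minimal sketch of slab allocation (illustrative, not the actual cvs-fast-export code): you pay malloc overhead once per large block instead of once per small object, and you never free objects individually.

    #include <stdlib.h>

    #define SLAB_NODES 4096                 /* objects per slab; size is arbitrary here */

    typedef struct node node_t;             /* stand-in for one of the numerous object classes */
    struct node { node_t *parent; int payload; };

    static node_t *current_slab;
    static size_t  slab_used = SLAB_NODES;  /* force allocation on first call */

    /* Hand out the next object from the current slab, grabbing a fresh slab
     * only when the old one is exhausted.  Slabs are never returned; the
     * whole arena lives until the program exits. */
    static node_t *node_alloc(void)
    {
        if (slab_used == SLAB_NODES) {
            current_slab = malloc(SLAB_NODES * sizeof(node_t));
            if (current_slab == NULL)
                abort();
            slab_used = 0;
        }
        return &current_slab[slab_used++];
    }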


So, where this comes out is I’m now favoring a design sketch around 1.35V ECC RAM and whichever of the Xeons has the best expected RAM cache performance, even if that means sacrificing some clock speed.

Published on October 18, 2014 16:11

October 17, 2014

Spending the “Help Stamp Out CVS In Your Lifetime” fund

I just shipped cvs-fast-export 1.21, much improved and immensely faster than it was two weeks ago. Thus ends one of the most intense sieges of down-and-dirty frenzied hacking that I’ve enjoyed in years.


Now it comes time to think about what to do with the Help Stamp Out CVS In Your Lifetime fund, which started with John D. Bell snarking epically about my (admittedly) rather antiquated desktop machine and mushroomed into an unexpected pile of donations.


I said I intend to use this machine to wander around the net hunting CVS repositories to extinction, and I meant it. If not for the demands of the large data sets this involves (like the 11 gigabytes of NetBSD CVS I just rsynced) I could have poked along with my existing machine for a good while longer.


For several reasons, including wanting those who generously donated to be in on the fun, I’m now going to open a discussion on how to best spend that money. A&D regular Susan Sons (aka HedgeMage) built herself a super-powerful machine this last February, and I think her hardware configuration is sound in essentials, so that build (“Tyro”) will be a starting point. But that was eight months ago – it might be some of the choices could be improved now, and if so I trust the regulars here will have clues to that.


I’ll start by talking about design goals and budget.



First I’ll point at some of my priorities:


* Serious crunching power for surgery on large repositories. The full Emacs conversion runs I’ve been doing take eight hours – goal #1 is to reduce that kind of friction.


* High reliability for a long time. I’d rather have stable than showy.


* Minimized noise and vibration.


Now some anti-priorities: Not interested in overclocking, not interested in fancy gamer cases with superfluous LEDs and Lambo vents, fuck all that noise. I’m not even particularly interested in 3D graphics. Don’t need to buy a keyboard or mouse or speakers and I have a dual-port graphics card I intend to keep using.


Budget: There’s $710 in the Help Stamp Out CVS In Your Lifetime fund. I’m willing to match that, so the ceiling is $1420. The objective here isn’t really economy, it’s power and buying parts that will last a long time. It’d be nice to go four or five years without another upgrade.


OK, with those points clear, let’s look at some hardware.


First, this case from NZXT. How do I love thee? Let me count the ways: 200mm low-velocity case fans for minimal noise, toolless assembly/disassembly, no sharp edges on the insides (oh boy do my too-frequently-skinned knuckles like that idea). USB and speaker ports mounted near the top right corner so they’ll be convenient to reach when it sits on the floor on the left side of my desk. Removable cleanable filters in the air vents.


To anyone who’s ever tinkered with PCs and cursed the thoughtless, ugly design of most cases, the interior images of this thing are sheer porn. Over on G+ someone pointed me at a boutique case design from Sweden called a Define R4 that moves in the same direction, but this goes further. And I want those 200mm fans badly – larger diameter means they can move enough air with a lower turning rate, which means less noise generated at the rotor tips.


Doubtless some of you are going to want to talk up Antec and Lian Li cases. Not without reason; I’ve built systems into Antecs and know Lian Li by reputation. But the NZXT (and the Define R4) go to a level of thoughtfulness in design that I’ve never seen before. (In truth, the way they’re marketed suggests that this is what happens when people who design gamer cases grow up and get serious.) Suggest alternatives if you like, but be aware that I will almost certainly consider not being able to mount those 200mm fans a dealbreaker.


Processor: AMD FX-8350 8-Core 4.0GHz. The main goal here is raw serial-processing power. Repository surgery generally doesn’t parallelize well; it turned out that multithreading wasn’t a significant win for cvs-fast-export (though the code changes I made to support it turned out to be a very good thing).


So high clock speed is a big deal, but I want stable performance and reliability. That means I’d much rather pay extra for a higher rated speed on a chip with a locked clock than go anywhere near the overclocking thing. I would consider an Intel chip of similar or greater rated clock speed, like one of the new Haswells. Of course that would require a change in motherboard.


Speaking of motherboards: Tyro uses an MSI 990FXA-GD80. Susan says this is actually a gamer board but (a) that’s OK, the superfluous blinkenlights are hidden by the case walls, and (b) having it designed for overclocking is good because it means the power management and performance at its rated speed are rock solid. OK, so maybe market pressure from the gamers isn’t so bad in this instance.


RAM: DDR3 2133. 2133 is high speed even today; I think the job load I’m going to put on this thing, which involves massive data shuffling, well justifies a premium buy here.


Susan recommends the Seagate SV35 as a main (spinning) drive – 3TB, 8.5msec seek time. It’s an interesting call, selected for high long-term reliability rather than bleeding-edge speed on the assumption that an SSD will be handling the fast traffic. I approve of that choice of priorities but wonder if going for something in the Constellation line might be a way to push them further.


Susan recommends an Intel 530 120GB SSD, commenting “only buy Intel SSDs, they don’t suck”. I’m thinking its 480GB big brother might be a better choice.


Susan says “Cheap, reliable optical drive”; these days they’re all pretty good.


The PSU Tyro used has been discontinued; open to suggestions on that one.


Here’s how it prices out as described: NZXT = $191.97, mobo $169.09, CPU $179.99, 32GB RAM = 2x $169.99, SSD = $79.99, HDD = $130.00. Total system cost $1092.02 without PSU. Well under my ceiling, so there’s room for an upgrade of the SSD or more RAM.


Let the optimization begin…


UPDATE: The SeaSonic SS-750KM3 is looking good as a PSU candidate – I’m told it doesn’t even turn on its fan at under 30% load. At $139.99 that brings the bill to $1232.01.

Published on October 17, 2014 09:24

October 16, 2014

A low-performance mystery: Sometimes you gotta simplify

This series of posts is increasingly misnamed, as there is not much mystery left about cvs-fast-export’s performance issues and it is now blazingly, screamingly, bat-out-of-hell fast. As in, both the threaded and unthreaded versions convert the entire history of groff (15593 CVS deltas in 1549 files) in 13 seconds flat. That would be about 10K CVS commits per minute, sustained; in practice the throughput will probably fall off a bit on very large repositories.


I achieved the latest doubling in speed by not succumbing to the temptation to overengineer – a trap that lies in wait for all clever hackers. Case study follows.



To review, for each master there’s a generation loop that runs to produce all its revision snapshots. Because CVS stores deltas in reverse (that is, the tip node contains the entire most recent revision, with the deltas composing backward to an empty file) the snapshots are emitted in reverse order.


These snapshots are then stashed in a temp directory to be picked up and copied out in the correct (canonical git-fast-export) order – forward, and just in time for the commits that reference them.


The reason for this generate-then-copy sequence (which doubles the program’s I/O traffic) was originally twofold. First, I wanted the output streams to look as much as possible like what git-fast-export would ship from an equivalent git repository. Second, if you’re going to make incremental dumping work (give me a stream of all commits since time T) you must use this canonical order. For some people this is a must-have feature.


Then, when I added multithreading, the temp files achieved a different utility. They were a place for worker threads to drop snapshots without stepping on each other.


When I last posted, I was preparing to attempt a rather complicated change in the code. To get rid of the temp files, but preserve canonical ordering, my plan was to pick apart that generation loop and write a function that would just give me snapshot N, where N is the number of revisions from base. I could then use this to generate blobs on demand, without I/O traffic.


In preparation for this change, I separated snapshot generation from master analysis and moved it into stage 3, just before export-stream generation. When I did this, and profiled, I noticed a couple of things.


(1) The analysis phase became blisteringly fast. To the point where it could parse and analyze 15,000 CVS masters in less than a second. The output traffic to write the snapshots had been completely dominating not just the analysis computation but the input cost to read the masters as well.


(2) Snapshot writes were no longer threaded – and that didn’t make a damn bit of difference to the throughput. Snapshot generation – the most compute-intensive part of the program – was also completely dominated by I/O time. So the utility of the temp files began to look at best questionable.


(3) Threading stopped making any noticeable difference in throughput, either positive or negative.


Reality was trying to tell me something. The something was this: forget being clever about threading and incremental blob generation in core. It’s too complicated. All you need to do is cut the snapshot I/O traffic. Ditch the canonical dump order and ship the snapshots as they’re made – never do a copy.


Keep it simple, stupid!


That is what I did. I couldn’t give up on canonical order entirely; it’s still needed for incremental dumping, and it’s handy for testing. But the tool now has the following behavior:


* Below a certain size threshold (byte volume of master files) it defaults to dumping in canonical order, with temp file copies.


* Above that size, it dumps in fast order (all blobs first), no copying.


* There are -C and -F command-line options to force the dump style.


The threshold size is set so that canonical order is only defaulted to when the resulting dump will take almost no time even with the copies.
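
In outline, the selection logic is nothing more than this (hypothetical names and threshold value, not the actual option-parsing code):

    #include <sys/types.h>

    /* Canonical order: blobs interleaved just-in-time, needs temp-file copies,
     * required for incremental dumping.  Fast order: all blobs first, no copies. */
    typedef enum { ORDER_CANONICAL, ORDER_FAST } dump_order;

    #define SMALL_REPO_BYTES (100L * 1024 * 1024)   /* made-up threshold on master-file volume */

    static dump_order choose_order(off_t master_bytes, int force_canonical, int force_fast)
    {
        if (force_canonical)                 /* -C on the command line */
            return ORDER_CANONICAL;
        if (force_fast)                      /* -F on the command line */
            return ORDER_FAST;
        return master_bytes < SMALL_REPO_BYTES ? ORDER_CANONICAL : ORDER_FAST;
    }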


The groff repo is above the threshold size. The robotfindskitten repo is below it. So are my regression-test repos. Yes, I did add a regression test to check that canonical-order and fast-order conversions are equivalent!


And I think that brings this saga nearly to a close. There is one more optimization I might try – the bounded-queue-with-writer-thread thing some of my regulars suggested. I frankly doubt it’ll make a lot of difference; I’ll probably implement it, profile to show that, and then remove it to keep the code simple.


This does not, however, mean that the bucks people threw at the Help Stamp Out CVS In Your Lifetime fund were misdirected. I’m going to take the combination of cvs-fast-export and a fast engine to run it on in hand, and use it. I shall wander the net, hunting down and killing – er, converting – the last CVS repositories. Unlike hunting mammoths to extinction, this will actually be a good thing.

Published on October 16, 2014 08:53

October 14, 2014

A low-performance mystery: the adventure continues

The mystery I described two posts back has actually been mostly solved (I think) but I’m having a great deal of fun trying to make cvs-fast-export run even faster, and my regulars are not only kibitzing with glee but have even thrown money at me so I can upgrade my PC and run tests on a machine that doesn’t resemble (as one of them put it) a slack-jawed yokel at a hot-dog-eating contest.


Hey, a 2.66GHz Intel Core 2 Duo with 4GB was hot shit when I bought it, and because I avoid bloatware (my window manager is i3) it has been sufficient unto my needs up to now. I’m a cheap bastard when it comes to hardware; I tend to hold onto it until I actually need to upgrade. This is me loftily ignoring the snarking from the peanut gallery, except from the people who actually donated money to the Help Stamp Out CVS In Your Lifetime hardware fund.


(For the rest of you, the PayPal and Gratipay buttons should be clearly visible to your immediate right. Just sayin’…)


Ahem. Where was I? Yes. The major mystery – the unexplained slowdown in stage 3 of the threaded version – appears to have been solved. It appears this was due to a glibc feature, which is that if you link with threads support it tries to detect use of threads and use thread locks in stdio to make it safe. Which slows it down.



A workaround exists and has been applied. With that in place the threaded performance numbers are now roughly comparable to the unthreaded ones – a bit slower but nothing that isn’t readily explicable by normal cache- and disk-contention issues on hardware that is admittedly weak for this job. Sometime soon I’ll upgrade to some beastly hexacore monster with lots of RAM by today’s standards and then we’ll see what we’ll see.
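
For the curious, the general shape of such a workaround (a sketch of the technique, not necessarily the exact change that went in) is to take each stream’s lock once per batch of output and use the _unlocked stdio calls inside it:

    #include <stdio.h>

    /* Emit a buffer while holding the stream lock once, instead of letting
     * glibc acquire and release it for every character.  flockfile(),
     * funlockfile() and putc_unlocked() are all POSIX. */
    static void write_batch(FILE *out, const char *buf, size_t len)
    {
        size_t i;

        flockfile(out);
        for (i = 0; i < len; i++)
            putc_unlocked(buf[i], out);
        funlockfile(out);
    }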


But the quest to make cvs-fast-export faster, ever faster, continues. Besides the practical utility, I’m quite enjoying doing down-and-dirty systemsy stuff on something that isn’t GPSD. And, my regulars are having so much fun looking over my shoulder and offering suggestions that I’d kind of hate to end the party.


Here’s where things are, currently:


1. On the input side, I appear to have found a bug – or at least some behavior severely enough misdocumented that it’s tantamount to a bug – in the way flex-generated scanners handle EOF. The flex maintainer has acknowledged that something odd is going on and discussion has begun on flex-help.


A fix for this problem – which is presently limiting cvs-fast-export to character-by-character input when parsing master files, and preventing use of some flex speedups for non-interactive scanners – is probably the best near-term bet for significant performance gains.


If you feel like working on it, grab a copy of the code, delete the custom YY_INPUT macro from lex.l, rebuild, and watch what happens. Here’s a more detailed description of the problem.


Diagnosing this more precisely would actually be a good project for somebody who wants to help out but is wary of getting involved with the black magic in the CVS-analysis stuff. Whatever is wrong here is well separated from that.


2. I have given up for the moment on trying to eliminate the shared counter for making blob IDs. It turns out that the assumption those are small sequential numbers is important to the stage 3 logic – it’s used as an array index into a map of external marks.


I may revisit this after I’ve collected the optimizations with a higher expected payoff, like the flex fix and tuning.
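
For illustration, the constraint looks roughly like this (simplified and hypothetical, not the real code): the counter has to keep yielding small, dense, sequential integers because stage 3 uses them directly as array indices.

    #include <stdatomic.h>

    static atomic_int next_blob_id;     /* shared by all worker threads */
    static long *external_mark;         /* stage 3: indexed by blob ID, sized to the blob count */

    /* Each worker takes the next sequential ID.  Replacing this shared counter
     * with, say, per-thread ID ranges or hashes would break the dense array
     * indexing that stage 3 relies on. */
    static int new_blob_id(void)
    {
        return atomic_fetch_add(&next_blob_id, 1);
    }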


3. A couple of my commenters are advocating a refinement of the current design that delegates writing the revision snapshots in stage 1 to a separate thread accessed by a bounded queue. This is clever, but it doesn’t attack the root of the problem, which is that currently the snapshots have to be written to disk, persist, and then be copied to standard output as blobs when the stream is being generated.


If I can refactor the main loop in generate() into a setup()/make-next-snapshot()/wrapup() sequence, something much more radical is possible. It might be that snapshot generation could be deferred until after the big black-magic branch merge in stage 2, which only needs each CVS revision’s metadata rather than its contents. Then make-next-snapshot() could be called iteratively on each master to generate snapshots just in time to be shipped on stdout as blobs.


This would be huge, possibly cutting runtime by 40%. There would be significant complications, though. A major one is that a naive implementation would have a huge working set, containing the contents of the most recently generated revision of every master file. There would have to be some sort of LRU scheme to hold that size down.
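
Sketched as an interface (function names purely hypothetical), the refactor would turn the one-shot generation loop into an iterator that the output stage pulls from on demand:

    #include <stddef.h>

    /* Hypothetical just-in-time snapshot interface.  Instead of writing every
     * revision of every master to disk in stage 1, the export stage would pull
     * snapshots one at a time, with an LRU cache bounding how many materialized
     * revisions are held in memory at once. */
    typedef struct snapshot_iter snapshot_iter;

    snapshot_iter *snapshot_setup(const char *master_path);          /* parse deltas, no content yet */
    int snapshot_next(snapshot_iter *it, char **text, size_t *len);  /* materialize the next revision */
    void snapshot_wrapup(snapshot_iter *it);                         /* release per-master state */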


4. I have a to-do item to run more serious profiling with cachegrind and iostat, but I’ve been putting that off because (a) there are obvious wins to be had (like anything that reduces I/O traffic), (b) those numbers will be more interesting on the new monster machine I’ll be shopping for shortly.

Published on October 14, 2014 18:38

October 12, 2014

A low-performance mystery, part deux

Well, the good news is, I get to feel wizardly this morning. Following sensible advice from a couple of my regulars, I rebuilt my dispatcher to use threads allocated at start time and looping until the list of masters is exhausted.


78 LOC. Fewer mutexes. And it worked correctly the first time I ran it. W00t – looks like I’ve got the hang – make that the non-hang – of this threads thing.


The bad news is, threaded performance is still atrocious in exactly the same way. Looks like thread-spawn overhead wasn’t a significant contributor.


In truth, I was expecting this result. I think my regulars were right to attribute this problem to cache- and locality-busting on every level from processor L1 down to the disks. I believe I’m starting to get a feel for this problem from watching the performance variations over many runs.


I’ll profile, but I’m sure I’m going to see cache misses go way up in the threaded version, and if I can find a way to meter the degree of disk thrashing I won’t be even a bit surprised to see that either.


The bottom line here seems to be that if I want better threaded performance out of this puppy I’m going to have to at least reduce its working set a lot. Trouble is, I’m highly doubtful – given what it has to do during delta assembly – that this is actually possible. The CVS snapshots and deltas it has to snarf into memory to do the job are intrinsically both large and of unpredictably variable size.


Maybe I’ll have an inspiration, but… Keith Packard, who originally wrote that code, is a damn fine systems hacker who is very aware of performance issues; if he couldn’t write it with a low footprint in the first place, I don’t judge that my odds of second-guessing him successfully are very good.


Ah well. It’s been a learning experience. At least now I can say of multi-threaded application designs “Run! Flee! Save yourselves!” from a position of having demonstrated a bit of wizardry at them myself.

Published on October 12, 2014 04:22
