Eric S. Raymond's Blog, page 32
November 30, 2014
How To Learn Hacking: Version 1.2
How To Learn Hacking: Version 1.2, with a new section on being original. Incorporates more feedback from here and G+.
For those of you who wondered why this didn’t just become a major section in How To Become A Hacker, it’s because I think it might become long enough to make that document too bulky to read at one sitting.
November 29, 2014
Why labor unions have lost their moxie
A lot of U.S. economic policy is distorted by the belief that manufacturing jobs are a magic bullet against declining incomes. Manufacturing’s false promise of a decent payday punctures that illusion.
One of the dumb, predictable responses to articles like this is “We need a stronger union movement”. Sorry, but no. Declining manufacturing wages aren’t an effect of the weakening of unions and can’t be reversed by strengthening them. Explanation follows.
Unions only help if the underlying economic situation is that the employer is able to charge a great deal more for the amount of product generated per worker-hour than the worker is getting – there is headroom for the worker’s wage to expand into while the manufacturer still makes a net profit. (If the manufacturer doesn’t make a net profit the business collapses and nobody gets paid.)
During the age that manufacturing nostalgists remember, this was true. For most of that period (roughly 1870-1970), the capital goods required to manufacture in a way price-competitive with the U.S. were so expensive that almost nobody outside the U.S. could afford them, and the few places that could were mainly preoccupied with supplying their domestic markets rather than the U.S. World War II prolonged this period by hammering those “few places” rather badly.
In that environment, U.S. firms could profit-take hugely, benefited by being scarce suppliers not just to the U.S. but (later on) to the whole world. And unions could pry loose enough of that margin to make manufacturing jobs comfortably middle-class.
All that ended in the early 1970s. A good marker for the change is the ability of the Japanese to make cheap cars for export and sell them in the U.S.
In the new world, the profit margins on manufactured goods narrowed dramatically. The manufacturing firms could no longer effectively ignore overseas competition in the U.S. domestic market. U.S. consumers no longer had to pay the large price premiums required to sustain domestic manufacturing wages at pre-1970 levels, and they jumped right on that option.
In this environment, unions don’t help because they have almost no negotiating room. If they bid up workers’ wages, the jobs will evaporate or move overseas – not because corporations are being “greedy” but because they can no longer charge the prices that would allow such high wages to be sustained. Too much foreign labor and capital is ready to pounce on the first hint of price-taking.
The only place unions still have anything like that kind of headroom is in service industries where the jobs can’t easily be moved. This is why service-employee unions are the new powerhouses of the labor movement.
November 25, 2014
Like a football player, head down
OK, this is interesting: From some tabloid, we have the following quote:
The unidentified witness wrote that the 18-year-old Brown “has his arms out with attitude,” while “The cop just stood there.” The witness added, “Dang if that kid didn’t start running right at the cop like a football player. Head down.”
This is exactly how I reconstructed the event in This picture tells a shooting story. I said: the reason I’m sure Brown was moving is the extreme torso angle suggested by the lack of exit wounds on the back. A human trying to do that standing still would overbalance and fall, which is why I think he was running or lunging when he took the bullets.
The witness said “arms out with attitude”. I said “with his right arm stretched forward [...] probably while Brown was grabbing for Wilson or the pistol with his right hand.”
So much for “Hands up – don’t shoot.” It’s as I thought: Brown autodarwinated, bull-rushing an armed policeman he had already injured once.
UPDATE: I failed to make clear before that this account was part of the evidence dump from the grand jury proceedings, not just some random the tabloid turned up.
November 22, 2014
How To Learn Hacking: Version 1.1
I had meant to blog-announce version 1.0 of How To Learn Hacking but got distracted at a crucial moment. So it went out on G+, where I got some useful feedback.
This version is revised and very slightly expanded. Enjoy and critique.
November 16, 2014
SRC 0.9: Ready for the less adventurous now
I just shipped SRC 0.9, and you no longer need to be adventurous to try it. It has a regression-test suite and real users.
Remarkably, SRC has had real users since 0.3, two days after it was born. Even more remarkably, the count of crash reports and botched operations from those users is zero. Zero. This is what you can gain from keeping code simple – I have had a couple of bug reports but they were both about filename quoting in the fast-export code, which is not a central feature.
Next, I’ll make a couple of what I think are important points about writing for zero defects. Then I’ll talk about a subtle issue or two in the design, and our one known behavioral glitch.
If you read the source code for SRC, you are rather likely to think “Well, no wonder it has not had any reported defects. This is trivial! There’s nothing here!” If that is your reaction, you are skirting a large truth. The precisely important thing about SRC is that it adds a lot of value to RCS at the same time that it is so simple that you can easily see all the way down to the bottom of the code. Achieving that took harder work and deeper thought than would have been required for a program that was fussier, more complex – and more bug-prone.
Too often programmers succumb to the temptation to be clever rather than simple. Edsger Dijkstra used to bang on about this, and he was right. In SRC I strove for simplicity, not just in implementation (which is why I re-used RCS rather than writing my own storage manager) but in design as well, all the way out to the level of the user interface.
The result is, alas, not perfect. There is at least one wart, and arguably a couple more, in the UI.
Early on I made the basic decision that the commands would have the general form “src verb [modifiers] [range] [filename...]”; this, of course, is a deliberate imitation of CVS/SVN/hg/git designed to make the UI feel instantly comfortable to their users. Unfortunately, this clashed with two other premises: range specifications being optional and named tags being available in range specifications. Consider this:
tag foo bar baz
OK, so we can tell by its position that ‘foo’ is a tag name. But is ‘bar’ a tag naming a revision (the command reads “create a tag named ‘foo’ at the location named by ‘bar’ in file ‘baz’”) or is it a filename (“create a tag named ‘foo’ at the default branch tips in files ‘bar’ and ‘baz’”)?
Syntactically there’s no way to tell, so I had to introduce a special prefix ‘@’ meaning “this is a revision specifier even though it looks like a filename”. A wart.
Suppose you have, for some reason, a file named “23”. To say “this is a filename even though it looks like it’s a revision specifier” you could write this:
tag foo -- 23 baz
This seems less warty, if only because there’s precedent in Git and elsewhere for using “--” to mean “options end here, normal operands start afterwards”.
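The disambiguation rules described above can be sketched in a few lines of Python. This is not SRC's actual parser: the function name `classify_operands` and the exact pattern for "looks like a revision" are assumptions made for illustration, and the sketch only covers the operands that follow the tag name.

```python
import re

# Assumption: a token "looks like a revision" if it is an integer
# or a pair of integers joined by "-" or "..".
REVISION_LIKE = re.compile(r"\d+(?:(?:-|\.\.)\d+)?")

def classify_operands(tokens):
    """Split the tokens after the tag name into (revision_specs, filenames)."""
    revspecs, filenames = [], []
    only_files = False
    for tok in tokens:
        if only_files:
            # Everything after "--" is a filename, whatever it looks like.
            filenames.append(tok)
        elif tok == "--":
            only_files = True
        elif tok.startswith("@"):
            # "@" forces a revision reading even for filename-like tokens.
            revspecs.append(tok[1:])
        elif REVISION_LIKE.fullmatch(tok):
            revspecs.append(tok)
        else:
            filenames.append(tok)
    return revspecs, filenames
```

With these rules, `["bar", "baz"]` is read as two filenames, `["@bar", "baz"]` as a tag plus one filename, and `["--", "23", "baz"]` as two filenames even though "23" looks like a revision number.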
Most of the serious ambiguities were in the ‘tag’ and ‘branch’ commands. Reluctantly, I ended up requiring a qualifier on these commands – in 0.9 you have to say “tag create”, “tag delete” and “tag list”. Otherwise the rules for when a token would be interpreted as a qualifier, when as a named tag, and when as filename just got too easy to screw up. The alternative would have been to drop back to a much more verbose RCS-like UI with switches, switches everywhere.
I still have one serious implementation issue. SRC uses utime(2) to touch both a workfile and then its master on checkout, so we can test for modified status just by comparing modification dates afterwards.
The problem is that utime(2) seems to be prone to flaky failures. Mike Swanson and I have been poking at this and we can’t figure out where the problem is. Python? The kernel? My on-line research has turned up a lot of bug reports about utime(2) that seem vaguely relevant; they hint that the problem might even be in an (unknown) glibc bug that’s tripping up Python.
To be further investigated. I’m seriously thinking about porting the code to another language (Go is the leading candidate – better fit to SRC’s abstractions than Rust) to find out if the bug replicates.
November 12, 2014
Emacs git conversion is done
Finally. After ten months of work, it’s done. Emacs is fully converted to git. You can clone from git://git.sv.gnu.org/emacs.git and if you have commit rights you can push to it and the changes will stick. The bzr repo is still up but only as an archive.
Technically, this was reposurgeon’s finest hour. I’ve never done a conversion this big and messy before, as I noted in Ugliest…repository…conversion…ever. I had to write major new features to handle the job. I guess the most obvious of these is the macro facility.
I don’t expect to have to do one this difficult again. I fervently hope not to have to do one this difficult again.
As I wrote in Dragging Emacs Forward, my hope is this will let some light and fresh air into Emacs development. New talent, new ideas, revitalizing energy.
Happy hacking, everyone!
November 11, 2014
SRC FAQ
A&D regular Mike Swanson did such a nice job on this that I want you all to see it.
SRC FAQ
version 1.0
Why SRC instead of $VCS?
Most version control systems today are multi-user, multi-file, and multi-fork oriented. These are all good features and properties to have, but they neglect the need to maintain simple single-file documents, such as HOWTOs and FAQs, much like the very file you are reading now. There is even a good use-case for small programs and scripts. Do you presently keep your ~/bin contents under version control? If not, consider using SRC for them.
XYZ already does single-file version control, why another one?
It is true, other VCSes already fulfill this simple criterion, SCCS and RCS being some of the earliest examples dating back to the 1970s and 1980s. While SCCS died off due to its proprietary nature, RCS has kept a niche for itself precisely for single-file projects. In fact, SRC is built on top of RCS, rather than reimplementing all of the gritty master file details.
The idea that spawned the development of SRC was that it would have five properties characterizing it:
1. Only deals with single files. Use for HOWTOs, memoranda, etc.
2. Allows multiple histories to live in the same directory without entanglement.
3. Has a human-readable master-file representation – no binary blobs.
4. Modern CLI user-interface. Commands familiar to Subversion, Hg, Git users.
5. Integer sequential revision numbers a la Subversion.
Notably, RCS itself fails on the latter two criteria. Designed both as an early attempt at VCS and for multi-user environments, the commands are awkward to deal with and it even requires complicated processes of locking and unlocking files in order to edit and commit them. None of this is appropriate anymore. Modern DVCSes with a non-locking model have proven more effective for multi-user projects and wide coordination.
Other projects to mold Mercurial and Git for a single-file purpose at the very least will fail criteria #3 and #5, and often #4 as well.
Does SRC mean that $DVCS is obsolete?
Absolutely not! SRC and DVCSes serve entirely opposite needs. SRC’s strength is precisely when there is no need nor desire for collaboration or publishing features, when there is only a single file and a single author for a file. In fact, if your script grows into a full project in its own right, SRC has a src fast-export command that can be used to jump-start a DVCS repository with the entire existing history intact.
SRC might make certain uses of DVCS obsolete, such as keeping individual documents tucked away in their own directories so that the DVCS can operate (these usually have a special repository directory named like .hg or .git). Scripts to impose a single-file concept on top of these systems do not go far enough with respect to the reasons SRC exists.
Is SRC a good system to learn about version control?
YES! SRC is explicitly designed to have the bare-bones features and commands of a modern version control system. Keep in mind that SRC’s strength is single-file documents and projects. If you have loose scripts and documents not presently under any version control, SRC is a good candidate for playing around with them.
If instead you have a large multi-file project, ease yourself into using a DVCS with simple commands, possibly even using SRC’s command set as a guideline for which ones to learn first. You will appreciate having actual changesets that span multiple files in this use case. Mercurial and Git are the most common, which means they are also easy to find help for.
Does SRC have keyword expansion?
No. When SRC commits a file with RCS on the backend, it uses -kb which explicitly disables all kinds of expansion, and also allows arbitrary binary files to be stored. Keyword expansion has, in general, not been well-accepted in the VCS world and most modern VCSes do not support it at all. Do not even suggest this feature; it will not be implemented.
Does SRC have $FEATURE?
If you don’t see it in the “src help” listing, probably not. You are certainly free to suggest features, but SRC is developed with extreme conservatism as to what features to implement or not. Remember, single-file, single-user, private VCS.
Before requesting a feature, ask yourself whether it makes SRC more complicated, whether it really helps a single author or developer, and whether it really makes sense to deploy SRC for your use-case instead of a DVCS. These can all be hard questions, and if you are in doubt, you may go forth with your request; others may share their own opinions.
SRC shines in its simplicity. Extra features are not against this, but too many can easily creep over into “too complicated” territory.
How well does SRC handle files over the network?
The answer is either “completely fine” or “not at all”, depending on what is being asked. :-)
SRC makes no special provisions; it operates in the current working directory whether that is local storage, NFS, CIFS, sshfs, or any other kind of network file system. As long as the directory tree is mounted on your system, SRC should be able to handle it.
Why doesn’t src status display show the same letters as $VCS?
Consistency with other version control systems is an important way to reduce any kind of surprises while using SRC. Unfortunately, the single-letter codes used for statuses are not identical between VCSes and often conflict with each other over specific meanings. For example, D means deleted in Subversion and Git, but Mercurial uses R for that same meaning. Git uses R to mean renamed, while Subversion uses it to mean replaced.
It is an unfortunate state of affairs. The development philosophy behind SRC is to keep it as un-innovative and unsurprising as possible, but since multiple VCSes in widespread use have not converged on the same meanings for single-letter status codes, SRC needs to settle on its own definitions that may differ from what you are used to.
Um. This is a joke, right?
No, though the author admits he did laugh a lot while roughing out the original design. Resurrect RCS? Wrap it in a decent UI? Really?
There’s a significant amount of ha-ha-only-serious here. Laugh, but treat SRC as a lesson in several things. Unix minimalism. The virtue of re-use, even of technology as old as RCS. The value of a carefully designed UI. The value of a conservative design with no surprises and no undue cleverness.
What will be funny is when we implement the back end to talk to SCCS. It won’t be difficult, that whole end of the interface is encapsulated in a class…
Acknowledgments
Nearly all of this FAQ, except “This is a joke, right?”, was written by Mike Swanson, aka chungy.
November 9, 2014
SRC 0.3 – ready for the adventurous
My low-power, low-overhead version control system, SRC, is no longer just a stake in the ground. It is still a determinedly file-oriented wrapper around RCS (and will stay that way) but every major feature except branching is implemented and it has probably crossed the border into being useful for production.
The adventurous can and should try it. You’re safe if it blows up because the histories are plain RCS files. But, as previously noted, it’s RCS behind an interface that’s actually pleasant to use. (You Emacs VC-mode users pipe down; I’m going to explain why you care in a bit.)
The main developments today include a fairly complete regression-test suite (already paying large dividends in speeding up progress) and a “src status” command that will look very familiar to Subversion/git/hg users. There’s a hack behind that status command I’m rather proud of; I’ll talk about that, too.
Presented for your perusal, some command synopses. In all of the following, a ‘revision’ is a 1-origin integer, or a tag name designating an integer revision. A revision range is a single revision or a pair of revisions separated by “-” or “..”. Unless otherwise noted under individual commands, the default revision is the tip revision on the current branch and the default range is all revisions on the current branch.
The token “--” tells the command-line interpreter that revision-specs and subcommands are done – everything after it is a filename, even if it looks like a subcommand or revision number.
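As a rough illustration of the revision-range syntax just described, here is a minimal Python sketch. It is an assumption-laden reading of the prose, not code from SRC: `parse_range` and its `tip` parameter are hypothetical names, tag names are not resolved, and an empty spec is taken to mean the default range of all revisions.

```python
import re

# One revision, or two revisions joined by "-" or "..".
RANGE = re.compile(r"(\d+)(?:(?:-|\.\.)(\d+))?")

def parse_range(spec, tip):
    """Return (first, last) for a spec such as '3', '2-5' or '2..5'.

    `tip` is the newest revision number on the current branch.
    """
    if not spec:
        # Default range: every revision on the branch.
        return (1, tip)
    match = RANGE.fullmatch(spec)
    if match is None:
        raise ValueError("not a revision range: %r" % spec)
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    # Revisions are 1-origin integers and cannot exceed the tip.
    if min(first, last) < 1 or max(first, last) > tip:
        raise ValueError("revision out of bounds: %r" % spec)
    return (first, last)
```

A single revision is returned as a range whose two ends are equal, so callers can treat every case uniformly.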
src help [command]
Displays help for commands.
src add ['file'...]
Initialize new project histories for specified files. Creates
the repository directory if required.
src commit [- | -m 'commentstring' | -f 'commentfile'] ['file'...]
Enters a commit for specified files. Separately to each one.
With '-', take comment text from stdin; with '-m' use the
following string as the comment; with '-f' take from a file.
ci is a synonym for commit.
src checkout ['revision'] ['file'...]
Refresh the working copy of the file(s) from their history files.
co is a synonym for checkout.
src status ['file'...]
A = added, U = unmodified, M = modified, ! = missing, ? = not tracked,
I = ignored.
src cat [revision-range] ['file'...]
Send the specified revisions of the files to standard output.
src tag [list|-l|delete|del|-d|rename|-r] ['name'] ['revision'] ['file'...]
Create, rename, or delete a tag. With no or only file arguments, list tags.
src branch [list|-l|delete|del|-d|rename|-r] ['name'] ['revision'] ['file'...]
Create, rename, switch to, or delete a branch. With no arguments,
list branches; the active branch is first in the list. The default
branch is 'trunk'.
src list ['revision-range'] ['file'...]
Sends summary information about the specified commits to standard output.
In each file listing, the summary line tagged with '*' is the
state that checkout would return to.
src log ['revision-range'] ['file'...]
Sends log information about the specified commits to standard output.
src diff ['revision-range'] ['file'...]
Sends a diff listing to standard output. With no revision spec, diffs
the working copy against the last version checked in. With one revno,
diffs the working copy against that stored revision; with a range,
diff between the beginning and end of the range.
src ls
List all registered files.
src move 'old' 'new'
Rename a file and its master. Refuses to step on existing files or masters.
'mv' and 'rename' are synonyms.
src copy 'old' 'new'
Copy a file and its master. Refuses to step on existing files or masters.
'cp' is a synonym.
src fast-export ['revision-range'] ['file'...]
Export one or more projects to standard output as a git fast-import stream.
The committer identification is copied from your Git configuration.
src fast-import [-p]
Parse a git-fast-import stream from standard input. The modifications for
each individual file become a SRC history. Mark, committer and
author data, and mark cross-references to parent commits, are preserved
in RFC-822-style headers on log comments unless the -p (plain) option
is given, in which case this metadata is discarded.
The omission of ‘src remove’ is a deliberate speed bump.
The thing is, this is it. You now know everything there is to know about SRC except some implementation details. It is intentionally an exercise in simplicity and least surprise – if anything about the above struck you as surprising or novel it was probably a design error on my part.
Yes, it really is still RCS underneath. See what can be done with a bit of care and attention to UI design? Er, not to mention a shameless willingness to crib from good examples. UI design should be egoless; if you succumb to the temptation to show off, you’re probably doing it wrong.
This is all implemented and regression-tested, except for “src branch” which does all the right parsing and sanity checks but doesn’t have back-end methods yet. I also wouldn’t lean on src fast-import too heavily, as the external tool it calls, rcs-fast-import, hasn’t been tested a lot since I wrote it.
Otherwise it’s good to go. Now I’ll explain the most subtle change in the interface from RCS and why it means VC-mode users should care. In a word: locklessness.
RCS was designed for an environment of multi-user contention. Working copies of files are read-only until they’re explicitly checked out (locked, in RCS-speak) by a user. When locked, the workfile becomes writable (confusing, I know). When your changes are checked back in, the lock is released and the workfile goes read-only again.
This is completely inappropriate in today’s era of single-user computers. The fact that RCS workfiles are normally locked is a continuing source of friction – you go to edit, get a failure message, remember you have to do an explicit checkout, and *boom* you just lost whatever train of thought you were on.
Emacs VC mode didn’t fix this – though it did reduce the checkout friction to one key combination – because at the time I wrote it (1992, I think) locking VCSes had not given way to merging ones. The most important thing SRC does to RCS is do away with that locking. This means that even through its VC mode (not yet written, on my list) SRC will be more pleasant to use than RCS.
And let’s not forget the nice Subversion-style plain-integer revision numbers, either. RCS revision IDs are ugly, cluttery things that ought to be hidden in any decent interface.
Another feature that will improve user experience greatly is the “src status” command. The VC mode for RCS is a complicated mess in large part because RCS has nothing like it natively, so Emacs has to simulate it by directly parsing master files. And there are two cute tricks I invented for SRC that VC mode doesn’t know.
First trick: suppose you’re trying to tell the difference between U (unmodified) and M (modified) status. I am actually no longer sure what VC does for this – lots of people have hacked on (and overcomplicated) that code since I first wrote it – but looking at the thicket of Lisp I can see it’s a kluge involving a lot of parsing of master files. That sort of thing is error-prone. I was much younger when I wrote it, and perhaps lacking in wisdom.
Here’s the simple way. When you check out the file (which SRC does immediately after each checkin so as to run lockless) you then call utimes(2) on the master to update its modtime. Now you can tell M from U just by checking to see if the workfile was modified after the master. This is really fast because you never have to look at the file contents, just the inode.
Another cute trick: the fast way to tell A (just added) status from U, which VC doesn’t use, is to look at the size. An empty master stays under a threshold size that any master with actual commits in it can’t get below. Again, this means we get away with just looking at the inode, not the actual file content.
In fact, I was able to write the entirety of “src status” so it never opens the workfile or master. This means good performance and responsiveness even on slow network file systems. In fact, a status check should be faster under SRC than under plain RCS!
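The two tricks can be sketched together in Python using nothing but `os.stat` and `os.utime`. This is a minimal sketch under stated assumptions, not SRC's code: the threshold `EMPTY_MASTER_MAX` is a made-up number, the master is placed next to the workfile with an RCS-style `,v` suffix purely for the demo, and the I (ignored) status is left out.

```python
import os
import tempfile
import time

# Hypothetical threshold: no master holding real commits is this small.
EMPTY_MASTER_MAX = 200

def stat_or_none(path):
    """Return the stat result for path, or None if it does not exist."""
    try:
        return os.stat(path)
    except FileNotFoundError:
        return None

def touch_after_checkout(workfile, master):
    """Stamp the workfile and then the master with the same time."""
    now = time.time()
    os.utime(workfile, (now, now))
    os.utime(master, (now, now))

def status_letter(workfile, master):
    """Classify a file from inode data only; no file is ever opened."""
    m = stat_or_none(master)
    w = stat_or_none(workfile)
    if m is None:
        return "?"                      # not tracked
    if w is None:
        return "!"                      # workfile missing
    if m.st_size <= EMPTY_MASTER_MAX:
        return "A"                      # added, nothing committed yet
    return "M" if w.st_mtime > m.st_mtime else "U"

def demo():
    """Walk one file through the statuses in a scratch directory."""
    seen = []
    with tempfile.TemporaryDirectory() as d:
        work = os.path.join(d, "notes.txt")
        master = os.path.join(d, "notes.txt,v")
        open(work, "w").close()
        seen.append(status_letter(work, master))   # no master yet
        open(master, "w").close()
        seen.append(status_letter(work, master))   # empty master
        with open(master, "w") as f:               # fake "real commits"
            f.write("x" * (EMPTY_MASTER_MAX + 1))
        touch_after_checkout(work, master)
        seen.append(status_letter(work, master))   # just checked out
        later = time.time() + 10
        os.utime(work, (later, later))             # simulate an edit
        seen.append(status_letter(work, master))
        os.remove(work)
        seen.append(status_letter(work, master))   # workfile deleted
    return seen
```

Because `status_letter` only calls `os.stat`, its cost is one metadata lookup per file, which is the property that keeps a status check cheap on a slow network file system.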
Now I need to get branching to work. And write that VC back end. Naturally, direct support in reposurgeon is already up.
November 6, 2014
I wrote a version-control system today
I wrote a version-control system today. Yes, an entire VCS. Took me 14 hours.
Yeah, you’re looking at me like I’m crazy. “Why,” you ask, quite reasonably, “would you want to do a thing like that? We’re not short of powerful VCSes these days.”
That is true. But I got to thinking, early this morning, about the fact that I haven’t been able to settle on just one VCS. I use git for most things, but there’s a use case git doesn’t cover. I have some document directories in which I have piles of things like HOWTOs which have separate histories from each other. Changes in them are not correlated, and I want to be able to move them around because I sometimes do that to reorganize them.
What have I been using for this? Why, RCS. The ancient Revision Control System, second oldest VCS in existence and clinging tenaciously to this particular niche. It does single-file change histories pretty well, but its UI is horrible. Worse than git’s, which is a pretty damning comparison.
Then I got to thinking. If I were going to design a VCS to do this particular single-file, single-user job, what would it look like? Hm. Sequential integer revision numbers, like Subversion and Mercurial used locally. Lockless operation. Modern CLI design. Built-in command help. Interchange with other VCSes via git import streams. This sounds like it could be nice…
Then, the idea that made it inevitable. “I bet,” I thought, “I could write this thing as a Python wrapper around RCS tools. Use them for delta storage but hide all the ugly parts.”
Thus, SRC. Simple Revision Control, v0.1.
This first version is a very rough cut. It does all the basic VCS things – commits, checkouts, diff listings, tags – but the implementation is fragile. The first other person to look at it has reported that it inexplicably fails when you set EDITOR=vi. (UPDATE: This is already fixed.)
Still…read the manual page to see where it’s going (I wrote the manual page before the code). Most of the UI is shamelessly swiped from Subversion – I simplified where it made sense.
Yes, I will implement branching and import/export. There will be Emacs VC support, too. The overall emphasis will be on keeping it simple and light, a handy small tool for the jobs where a real VCS would be overkill. And if it goes sproing – hey, the masters are RCS files, you have an easy recovery path.
SRC – RCS as if user interface mattered. SRC – maybe Rome wasn’t built in a day, but this tool was. SRC – when you care enough to use the very least. Thank you, I’ll be here all week.
And if you’re thinking “Hey, that’s cheating! You didn’t really write a VCS, you let RCS do the hard parts!”, why, doing that is downright traditional. CVS was implemented – badly – the same way. But we’ve learned a lot in the quarter-century since, and know what mistakes not to repeat.
November 5, 2014
Chipping away at CVS
I’ve just shipped a new version of cvs-fast-export, 1.26. It speeds the tool up more, more, more – cranking through 25 years and 113300 commits of Emacs CVS history, for example, in 2:48. That’s 672 commits a second, for those of you in the cheap seats.
But the real news this time is a Python wrapper called ‘cvsconvert’ that takes a CVS repository, runs a conversion to Git using cvs-fast-export, and then – using CVS for checkouts – examines the CVS and git repositories side by side looking for translation glitches. It checks every branch tip and every tag.
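The side-by-side check can be pictured as comparing two checked-out directory trees. The sketch below is not cvsconvert's implementation; `manifest` and `compare_checkouts` are hypothetical helpers, and the set of metadata directories to skip is an assumption.

```python
import filecmp
import os

# Assumption: version-control bookkeeping directories to leave out.
METADATA_DIRS = {".git", "CVS"}

def manifest(root):
    """Return the set of relative paths of all files under root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in METADATA_DIRS]
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.add(os.path.relpath(full, root))
    return paths

def compare_checkouts(cvs_dir, git_dir):
    """Return (only_in_cvs, only_in_git, differing) as sorted lists."""
    cvs_files = manifest(cvs_dir)
    git_files = manifest(git_dir)
    differing = sorted(
        path for path in cvs_files & git_files
        # shallow=False forces a byte-by-byte content comparison.
        if not filecmp.cmp(os.path.join(cvs_dir, path),
                           os.path.join(git_dir, path),
                           shallow=False)
    )
    return (sorted(cvs_files - git_files),
            sorted(git_files - cvs_files),
            differing)
```

Running this once per branch tip and once per tag would yield the two kinds of report the post mentions: manifest mismatches (the first two lists) and file differences (the third).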
Running this on several of my test repos I’ve discovered some interesting things. One such discovery is of a bug in CVS. (Yeah, I know, what a shock…)
CVS uses the RCS state field value of “dead” to mark files that have been deleted. I found a case in the CVS repo of a project called “timidity” where a file had somehow ended up with state dead at rev 1.2, Exp (the default live state) at 1.3, and dead again at its final revision of 1.4. This confused CVS badly; a checkout keyed to a tag made after 1.3 but before 1.4 should have included the file but did not.
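Taking the state field at face value, the expected behavior for that file is easy to state in code. This is only an illustration of the reasoning: `included_in_checkout` is a hypothetical helper, and the dictionary simply restates the three revisions described above.

```python
def included_in_checkout(states, tagged_revision):
    """True if the revision a tag points at is live rather than dead.

    `states` maps revision ids to the RCS state field value,
    "Exp" for the default live state or "dead" for deleted.
    """
    return states[tagged_revision] != "dead"

# The odd history described for the file in the timidity repository.
history = {"1.2": "dead", "1.3": "Exp", "1.4": "dead"}

# A tag made after 1.3 but before 1.4 points at revision 1.3.
tag_points_at = "1.3"
file_expected = included_in_checkout(history, tag_points_at)
```

Here `file_expected` is true, which is why a checkout keyed to that tag should have contained the file.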
This showed up as a defect (mismatched file manifests) in cvsconvert. I spent half a day looking for where cvs-fast-export had gone wrong before I figured out that cvs-fast-export was doing what the metadata in the master said it should – it was CVS that had screwed the pooch. Annoying, but not very surprising.
This was an example of the most common kind of defect – files that had been deleted in CVS showing up under tags in gitspace when they didn’t under the corresponding CVS tags. Maybe eventually I’ll figure out how to perfectly match CVS’s behavior here, but it’s not really a big deal – there tend to be only a few of these per CVS repository and a few minutes’ work with reposurgeon will snip them off nicely.
Reassuringly, I found no cases anywhere of manifest mismatches or file differences at master or any other branch tip. Well, other than some trivial file differences due to CVS keyword expansion, and those can be suppressed.
The design approach of cvsconvert seems quite successful. I may try writing something parallel to it to sanity-check Subversion lifts.
