ifdex: a tool for code archeologists
I’ve written a tool to assist intrepid code archeologists trying to comprehend the structure of ancient codebases. It’s called ifdex, and it comes with a backstory. Grab your fedora and your bullwhip; we’re going in…
One of the earliest decisions we made on NTPsec was to replace its build system. It had become so difficult to understand and modify that we knew it would be a significant drag on development.
Ancient autoconf builds tend to be crawling horrors, and NTP’s is an extreme case: 31KLOC of kludgy macrology that defines so many configuration symbols that getting a grasp on its interface with the codebase is nigh impossible, even when you have a config.h to look at. And that’s a problem when you’re planning large changes!
One of our guys, Amar Takhar, is an expert on the waf build system. When he tentatively suggested moving to it, I cheered the idea resoundingly. Months later he was able to land a waf recipe which, while not complete, would at least produce binaries that could be live-tested.
When I say “not complete” I mean that I could tell that there were configuration #defines in the codebase that the waf build never set. Quite a few of them – in some cases fossils that the autoconf build didn’t touch either, but in others … not. And these unreached configuration knobs tended to get lost amidst a bunch of conditional guards looking at #defines set by system headers and the compiler.
And we’re not talking a handful or even dozens. I eventually counted over 670 distinct #defines being used in #if/#ifdef/#ifndef/#elif guards, and, as A&D regular John D. Bell pointed out in a comment on my last post, those guards number 2430. I needed some way to examine these symbols and sort them into groups: this one is from a system header, that one is a configuration knob, and over there is something else…
So I wrote an analyzer. It parses every compile-time conditional in a code tree for symbols, then reports them either as a bare list or as GCC-style file:line error messages that you can step through with Emacs compilation mode.
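To make the idea concrete, here is a minimal Python sketch of that kind of scan. It is not ifdex’s actual code; the function names `guard_symbols`, `scan`, and `report` are hypothetical. It assumes sources end in `.c` or `.h`, ignores backslash-continued directives and comments that span lines, and treats every identifier in an `#if`/`#elif` expression except the `defined` operator as a guard symbol.

```python
import os
import re

# Conditional directives whose conditions we inspect. #ifdef/#ifndef name a
# single symbol; #if/#elif take an expression that may mention several.
DIRECTIVE = re.compile(r'^\s*#\s*(if|ifdef|ifndef|elif)\b(.*)$')
# A C identifier. \b keeps us from matching suffixes such as the UL in 1UL.
IDENT = re.compile(r'\b[A-Za-z_][A-Za-z0-9_]*\b')


def guard_symbols(line):
    """Return the identifiers tested by a conditional directive, else []."""
    m = DIRECTIVE.match(line)
    if not m:
        return []
    expr = re.sub(r'/\*.*?\*/', ' ', m.group(2))  # drop /* ... */ comments
    expr = expr.split('//', 1)[0]                  # drop a // comment
    # 'defined' is an operator in #if expressions, not a guard symbol.
    return [w for w in IDENT.findall(expr) if w != 'defined']


def scan(roots):
    """Yield (path, lineno, symbol) for each guard symbol under the roots."""
    for root in roots:
        for dirpath, _, files in os.walk(root):
            for name in sorted(files):
                if not name.endswith(('.c', '.h')):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, encoding='utf-8', errors='replace') as f:
                    for lineno, line in enumerate(f, start=1):
                        for sym in guard_symbols(line):
                            yield path, lineno, sym


def report(hits, as_errors=False):
    """Print 'file:line: symbol' messages, or a bare sorted symbol list."""
    if as_errors:
        for path, lineno, sym in hits:
            print(f"{path}:{lineno}: {sym}")
    else:
        for sym in sorted({sym for _, _, sym in hits}):
            print(sym)
```

Calling `report(scan(["."]), as_errors=True)` from the top of a source tree emits one line per guard symbol in the `file:line:` form that Emacs compilation mode already knows how to step through; the default bare list deduplicates, which is how a count of distinct symbols falls out.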
To reduce noise, it knows about a long list of guard symbols (almost 200 of them) that it should normally ignore – things like the __GNUC__ symbol that GCC predefines, or the O_NONBLOCK macro used by various system calls.
The symbols are divided into groups that you can choose to ignore individually with a command-line option. So, if you want to ignore all standardized POSIX macros in the list but see anything OS-dependent, you can do that.
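One way to model grouped suppression is a dictionary from group name to symbol set, with all groups hidden by default and a caller-chosen subset hidden otherwise. This is a sketch under assumptions: the group names and their tiny sample memberships below are hypothetical, not ifdex’s real list of almost 200 symbols or its actual option syntax.

```python
# Hypothetical group names with a small sample of members each.
EXCLUSION_GROUPS = {
    "compiler": {"__GNUC__", "__clang__", "__STDC__"},
    "posix": {"O_NONBLOCK", "_POSIX_C_SOURCE"},
    "os": {"__linux__", "__APPLE__", "__FreeBSD__"},
}


def suppressed(selected=None):
    """Symbols to hide: every group by default, else only the named groups."""
    names = EXCLUSION_GROUPS if selected is None else selected
    hide = set()
    for name in names:
        hide |= EXCLUSION_GROUPS[name]
    return hide


def unknown(symbols, selected=None):
    """Return the distinct guard symbols not hidden by the chosen groups."""
    hide = suppressed(selected)
    return sorted(s for s in set(symbols) if s not in hide)
```

With `selected={"posix", "compiler"}`, POSIX and compiler macros drop out of the report while OS-dependent ones such as `__linux__` still show up, which is the kind of selective view the paragraph above describes.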
Another important feature is that you can build your own exclusion lists, with comments. The way I’m exploring the jungle of NTP conditionals is by building a bigger and bigger exclusion list describing the conditional symbols I understand. Eventually (I hope) the report of unknown symbols will shrink to empty. At that point I’ll know what all the configuration knobs are with certainty.
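A commented exclusion list can be as simple as one symbol per line with `#` starting a comment. The file format and the names `load_exclusions` and `still_unknown` are assumptions for illustration, not ifdex’s documented format. Keeping each comment attached to its symbol means the notes recording what a knob does survive alongside it.

```python
def load_exclusions(lines):
    """Parse an exclusion list: symbols separated by whitespace, '#' starts
    a comment. Returns {symbol: comment} so each note stays with its symbol."""
    notes = {}
    for raw in lines:
        text, _, comment = raw.partition('#')
        text = text.strip()
        if not text:
            continue  # blank line or whole-line comment
        for sym in text.split():
            notes[sym] = comment.strip()
    return notes


def still_unknown(found, notes):
    """Guard symbols not yet explained by any exclusion-list entry."""
    return sorted(set(found) - set(notes))
```

Passing an open file, as in `load_exclusions(open("ntp.excl"))` (a hypothetical filename), works because iterating a file yields its lines. As entries accumulate, `still_unknown` shrinks toward the empty list that signals every knob has been accounted for.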
As of now I have knocked out about 300 of them and have 373 to go. That’s most of a week’s work, even with my spiffy new tool. Oh well, nobody ever said code archeology was easy.
Eric S. Raymond's Blog
