This understanding has in turn led to a second revelation about human memory. If these tradeoffs really are unavoidable, and the brain appears to be optimally tuned to the world around it, then what we refer to as the inevitable “cognitive decline” that comes with age may in fact be something else.
Considered this way, it’s a wonder that the two of us—or anyone—can mentally keep up at all. What’s surprising is not memory’s slowdown, but the fact that the mind can possibly stay afloat and responsive as so much data accumulates.
Recent work by a team of psychologists and linguists led by Michael Ramscar at the University of Tübingen has suggested that what we call “cognitive decline”—lags and retrieval errors—may not be about the search process slowing or deteriorating, but (at least partly) an unavoidable consequence of the amount of information we have to navigate getting bigger and bigger.
Through a series of simulations, the researchers showed that simply knowing more makes things harder when it comes to recognizing words, names, and even letters. No matter how good your organization scheme is, having to search through more things will inevitably take longer. It’s not that we’re forgetting; it’s that we’re remembering. We’re becoming archives.
So as you age, and begin to experience these sporadic latencies, take heart: the length of a delay is partly an indicator of the extent of your experience. The effort of retrieval is a testament to how much you know. And the rarity of those lags is a testament to how well you’ve arranged it: keeping the most important things closest to hand.
You might already be using Earliest Due Date to tackle your workload, in which case you probably don’t need computer science to tell you that it’s a sensible strategy. What you may not have known, though, is that it’s the optimal strategy. More precisely, it is optimal assuming that you’re only interested in one metric in particular: reducing your maximum lateness.
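A minimal sketch of that claim, with invented durations and due dates: Earliest Due Date is just a sort on the due-date field, and no reshuffling of the tasks can beat its maximum lateness.

```python
# Earliest Due Date: sort tasks by due date, then measure maximum lateness.
# All task numbers below are made up for illustration.

def max_lateness(schedule):
    """Largest amount by which any task finishes past its due date."""
    elapsed, worst = 0, 0
    for duration, due in schedule:
        elapsed += duration
        worst = max(worst, elapsed - due)
    return worst

tasks = [(3, 4), (2, 6), (1, 3)]                 # (duration, due date)
edd = sorted(tasks, key=lambda task: task[1])    # Earliest Due Date order

print(max_lateness(edd))     # 0: nothing is late under EDD here
print(max_lateness(tasks))   # 3: the arbitrary order does worse
```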
Moore’s Algorithm minimizes the number of items you’ll need to throw away. Of course, you’re also welcome to compost the food, donate it to the local food bank, or give it to your neighbor. In an industrial or bureaucratic context where you can’t simply discard a project, but in which the number—rather than the severity—of late projects is still your biggest concern, Moore’s Algorithm is just as indifferent about how those late tasks are handled.
Again, there’s no way to change the total amount of time your work will take you, but Shortest Processing Time may ease your mind by shrinking the number of outstanding tasks as quickly as possible. Its sum-of-completion-times metric can be expressed another way: it’s like focusing above all on reducing the length of your to-do list.
The optimal strategy for that goal is a simple modification of Shortest Processing Time: divide the weight of each task by how long it will take to finish, and then work in order from the highest resulting importance-per-unit-time (call it “density” if you like, to continue the weight metaphor) to the lowest.
And while it might be hard to assign a degree of importance to each one of your daily tasks, this strategy nonetheless offers a nice rule of thumb: only prioritize a task that takes twice as long if it’s twice as important.
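Sketched in code, with invented weights and hours, the rule is a one-line sort on density:

```python
# Invented tasks; "density" is importance per unit time.
tasks = [
    {"name": "report", "weight": 8, "hours": 4},   # density 2.0
    {"name": "emails", "weight": 3, "hours": 2},   # density 1.5
    {"name": "errand", "weight": 1, "hours": 1},   # density 1.0
]

by_density = sorted(tasks, key=lambda t: t["weight"] / t["hours"],
                    reverse=True)
print([t["name"] for t in by_density])   # ['report', 'emails', 'errand']
```

With every weight set to 1, this collapses back into plain Shortest Processing Time; and a task twice as long ties exactly when it is also twice as important, which is the rule of thumb above.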
Putting off work on a major project by attending instead to various trivial matters can likewise be seen as “the hastening of subgoal completion”—which is another way of saying that procrastinators are acting (optimally!) to reduce as quickly as possible the number of outstanding tasks on their minds. It’s not that they have a bad strategy for getting things done; they have a great strategy for the wrong metric.
If it’s an email inbox blaring the figure of unread messages, then all messages are implicitly being given equal weight. Can we be blamed, then, for applying the unweighted Shortest Processing Time algorithm to the problem—dealing with all of the easiest emails first and deferring the hardest ones till last—to lower this numeral as quickly as possible?
What happens in a priority inversion is that a low-priority task takes possession of a system resource (access to a database, let's say) to do some work, but is then interrupted partway through that work by a timer, which pauses it and invokes the system scheduler. The scheduler tees up a high-priority task, but it can't run because the database is occupied. And so the scheduler moves down the priority list, running various unblocked medium-priority tasks instead—rather than the high-priority one (which is blocked), or the low-priority one that's blocking it (which is stuck in line behind all those medium-priority tasks).
Priority inheritance. If a low-priority task is found to be blocking a high-priority task, well, then all of a sudden that low-priority task should momentarily become the highest-priority thing on the system, “inheriting” the priority of the thing it’s blocking.
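A toy model of the idea (nothing like a real kernel scheduler; the class and task names are invented): when a higher-priority task blocks on a lock, the lock's holder borrows that priority until it releases the lock.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.base_priority = priority  # remembered so we can restore it

class Lock:
    def __init__(self):
        self.holder = None

    def acquire(self, task):
        if self.holder is None:
            self.holder = task
        elif task.priority > self.holder.priority:
            # Priority inheritance: boost the holder past the waiters,
            # so medium-priority tasks can no longer starve it.
            self.holder.priority = task.priority

    def release(self):
        self.holder.priority = self.holder.base_priority
        self.holder = None

db = Lock()
low, high = Task("logger", priority=1), Task("ui", priority=10)
db.acquire(low)       # the low-priority task grabs the database
db.acquire(high)      # the high-priority task blocks on it...
print(low.priority)   # ...and 'logger' now runs at priority 10
db.release()
print(low.priority)   # back to 1 once the lock is released
```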
The Shortest Processing Time algorithm, as we saw, is the optimal policy if you want to cross off as many items as quickly as possible from your to-do list. But if some of your tasks have precedence constraints, there isn’t a simple or obvious tweak to Shortest Processing Time to adjust for that.
The study of scheduling problems continues to this day. A recent survey showed that the status of about 7% of all problems is still unknown, scheduling’s terra incognita. Of the 93% of the problems that we do understand, however, the news isn’t great: only 9% can be solved efficiently, and the other 84% have been proven intractable.*
In fact, the weighted version of Shortest Processing Time is a pretty good candidate for best general-purpose scheduling strategy in the face of uncertainty. It offers a simple prescription for time management: each time a new piece of work comes in, divide its importance by the amount of time it will take to complete. If that figure is higher than for the task you’re currently doing, switch to the new one; otherwise stick with the current task.
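As a sketch, the whole prescription fits in one comparison (the weights and hours here are invented):

```python
def should_switch(current, new):
    """Switch tasks when the newcomer's importance-per-hour is higher.
    Each task is a (weight, hours_remaining) pair."""
    return new[0] / new[1] > current[0] / current[1]

print(should_switch(current=(6, 3), new=(5, 2)))   # 2.5 > 2.0 -> True
print(should_switch(current=(6, 3), new=(3, 2)))   # 1.5 < 2.0 -> False
```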
So considering the impact of uncertainty in scheduling reveals something counterintuitive: there are cases where clairvoyance is a burden. Even with complete foreknowledge, finding the perfect schedule might be practically impossible. In contrast, thinking on your feet and reacting as jobs come in won’t give you as perfect a schedule as if you’d seen into the future—but the best you can do is much easier to compute.
Personally, we have found that both programming and writing require keeping in mind the state of the entire system, and thus carry inordinately large context-switching costs.
Brian, for his part, thinks of writing as a kind of blacksmithing, where it takes a while just to heat up the metal before it’s malleable.
At the extreme, a program may run just long enough to swap its needed items into memory, before giving way to another program that runs just long enough to overwrite them in turn.
This is thrashing: a system running full-tilt and accomplishing nothing at all. Denning first diagnosed this phenomenon in a memory-management context, but computer scientists now use the term “thrashing” to refer to pretty much any situation where the system grinds to a halt because it’s entirely preoccupied with metawork.
In a thrashing state, you’re making essentially no progress, so even doing tasks in the wrong order is better than doing nothing at all. Instead of answering the most important emails first—which requires an assessment of the whole picture that may take longer than the work itself—maybe you should sidestep that quadratic-time quicksand by just answering the emails in random order, or in whatever order they happen to appear on-screen.
Thinking along the same lines, the Linux kernel team, several years ago, replaced their scheduler with one that was less “smart” about calculating process priorities but more than made up for it by taking less time to calculate them.
Establishing a minimum amount of time to spend on any one task helps to prevent a commitment to responsiveness from obliterating throughput entirely: if the minimum slice is longer than the time it takes to context-switch, then the system can never get into a state where context switching is the only thing it’s doing. It’s also a principle that is easy to translate into a recommendation for human lives. Methods such as “timeboxing” or “pomodoros,” where you literally set a kitchen timer and commit to doing a single task until it runs out, are one embodiment of this idea.
And again, this is a principle that can be transferred to human lives. The moral is that you should try to stay on a single task as long as possible without decreasing your responsiveness below the minimum acceptable limit. Decide how responsive you need to be—and then, if you want to get things done, be no more responsive than that.
If you find yourself doing a lot of context switching because you’re tackling a heterogeneous collection of short tasks, you can also employ another idea from computer science: “interrupt coalescing.” If you have five credit card bills, for instance, don’t pay them as they arrive; take care of them all in one go when the fifth bill comes.
And in the private sector, interrupt coalescing offers a redemptive view of one of the most maligned office rituals: the weekly meeting. Whatever their drawbacks, regularly scheduled meetings are one of our best defenses against the spontaneous interruption and the unplanned context switch.
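A sketch of the mechanism itself, with the batch size of five echoing the credit-card example (the class and names are invented):

```python
# Interrupt coalescing: instead of handling each event as it arrives,
# buffer events and handle the whole batch at once at a threshold.

class Coalescer:
    def __init__(self, batch_size, handler):
        self.batch_size = batch_size
        self.handler = handler
        self.pending = []

    def interrupt(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.handler(self.pending)   # one context switch, five bills
            self.pending = []

bills = Coalescer(batch_size=5, handler=lambda batch: print("paying", batch))
for n in range(1, 8):
    bills.interrupt(f"bill #{n}")   # prints once, when bill #5 arrives
```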
Bayes’s critical insight was that trying to use the winning and losing tickets we see to figure out the overall ticket pool that they came from is essentially reasoning backward. And to do that, he argued, we need to first reason forward from hypotheticals. In other words, we need to first determine how probable it is that we would have drawn the tickets we did if various scenarios were true. This probability—known to modern statisticians as the “likelihood”—gives us the information we need to solve the problem.
This is the crux of Bayes’s argument. Reasoning forward from hypothetical pasts lays the foundation for us to then work backward to the most probable one.
In fact, for any possible drawing of w winning tickets in n attempts, the expectation is simply the number of wins plus one, divided by the number of attempts plus two: (w+1)⁄(n+2).
Laplace’s Law offers us the first simple rule of thumb for confronting small data in the real world. Even when we’ve made only a few observations—or only one—it offers us practical guidance. Want to calculate the chance your bus is late? The chance your softball team will win? Count the number of times it has happened in the past plus one, then divide by the number of opportunities plus two.
The mathematical formula that describes this relationship, tying together our previously held ideas and the evidence before our eyes, has come to be known—ironically, as the real heavy lifting was done by Laplace—as Bayes’s Rule. And it gives a remarkably straightforward solution to the problem of how to combine preexisting beliefs with observed evidence: multiply their probabilities together.
This sense of what was “in the bag” before the coin flip—the chances for each hypothesis to have been true before you saw any data—is known as the prior probability distribution, or “prior” for short. And Bayes’s Rule always needs some prior from you, even if it’s only a guess. How many two-headed coins exist? How easy are they to get? How much of a trickster is your friend, anyway?
When you do have some estimate of prior probabilities, meanwhile, Bayes’s Rule applies to a wide range of prediction problems, be they of the big-data variety or the more common small-data sort. Computing the probability of winning a raffle or tossing heads is only the beginning. The methods developed by Bayes and Laplace can offer help any time you have uncertainty and a bit of data to work with.
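A sketch of the multiplication at work on the two-headed-coin question above; the one-in-a-thousand prior and the ten consecutive heads are invented numbers, there only to show the mechanics:

```python
# Bayes's Rule: multiply prior by likelihood, then normalize.
prior = {"two-headed": 0.001, "fair": 0.999}       # assumed prior
likelihood_heads = {"two-headed": 1.0, "fair": 0.5}

flips = 10  # suppose your friend flips ten heads in a row
unnormalized = {h: prior[h] * likelihood_heads[h] ** flips for h in prior}
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}
print(posterior)   # the two-headed hypothesis is now roughly even odds
```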
More generally, unless we know better we can expect to have shown up precisely halfway into the duration of any given phenomenon.* And if we assume that we’re arriving precisely halfway into something’s duration, the best guess we can make for how long it will last into the future becomes obvious: exactly as long as it’s lasted already.
When Bayes’s Rule combines all these probabilities—the more-probable short time spans pushing down the average forecast, the less-probable yet still possible long ones pushing it up—the Copernican Principle emerges: if we want to predict how long something will last, and have no other knowledge about it whatsoever, the best guess we can make is that it will continue just as long as it’s gone on so far.
Purely mathematical estimates based on captured tanks’ serial numbers predicted that the Germans were producing 246 tanks every month, while estimates obtained by extensive (and highly risky) aerial reconnaissance suggested the figure was more like 1,400. After the war, German records revealed the true figure: 245.
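A sketch of the serial-number logic; the estimator shown is the standard minimum-variance unbiased one for this problem (which may or may not be exactly what the wartime statisticians used), and the sample size of fifteen is invented:

```python
import random

def tank_estimate(serials):
    """Estimate the total count from observed serial numbers:
    max serial + max/k - 1, where k is the number captured."""
    m, k = max(serials), len(serials)
    return m + m / k - 1

random.seed(0)
captured = random.sample(range(1, 246), 15)   # 245 tanks truly exist
print(round(tank_estimate(captured)))         # typically lands near 245
```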
Recognizing that the Copernican Principle is just Bayes’s Rule with an uninformative prior answers a lot of questions about its validity. The Copernican Principle seems reasonable exactly in those situations where we know nothing at all—such as looking at the Berlin Wall in 1969, when we’re not even sure what timescale is appropriate.
This kind of pattern typifies what are called “power-law distributions.” These are also known as “scale-free distributions” because they characterize quantities that can plausibly range over many scales: a town can have tens, hundreds, thousands, tens of thousands, hundreds of thousands, or millions of residents, so we can’t pin down a single value for how big a “normal” town should be.
In fact, money in general is a domain full of power laws. Power-law distributions characterize both people’s wealth and people’s incomes. The mean income in America, for instance, is $55,688—but because income is roughly power-law distributed, we know, again, that many more people will be below this mean than above it, while those who are above might be practically off the charts. So it is: two-thirds of the US population make less than the mean income, but the top 1% make almost ten times the mean.
Bayes’s Rule tells us that when it comes to making predictions based on limited evidence, few things are as important as having good priors—that is, a sense of the distribution from which we expect that evidence to have come. Good predictions thus begin with having good instincts about when we’re dealing with a normal distribution and when with a power-law distribution.
In fact, the uninformative prior, with its wildly varying possible scales—the wall that might last for months or for millennia—is a power-law distribution. And for any power-law distribution, Bayes’s Rule indicates that the appropriate prediction strategy is a Multiplicative Rule: multiply the quantity observed so far by some constant factor. For an uninformative prior, that constant factor happens to be 2, hence the Copernican prediction; in other power-law cases, the multiplier will depend on the exact distribution you’re working with.
When we apply Bayes’s Rule with a normal distribution as a prior, on the other hand, we obtain a very different kind of guidance. Instead of a multiplicative rule, we get an Average Rule: use the distribution’s “natural” average—its single, specific scale—as your guide. For instance, if somebody is younger than the average life span, then simply predict the average; as their age gets close to and then exceeds the average, predict that they’ll live a few years more.
Something normally distributed that’s gone on seemingly too long is bound to end shortly; but the longer something in a power-law distribution has gone on, the longer you can expect it to keep going.
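The two rules, side by side as code; the average life span and the few-years margin are invented placeholders:

```python
AVERAGE_LIFESPAN = 76   # an assumed average, purely for illustration

def predict_power_law(observed_so_far, multiplier=2.0):
    """Multiplicative Rule: scale up what you've seen (2.0 = Copernican)."""
    return observed_so_far * multiplier

def predict_normal(age, average=AVERAGE_LIFESPAN, margin=4):
    """Average Rule: predict the average; near or past it, a few years more."""
    return max(average, age + margin)

print(predict_power_law(10))   # 20: a 10-year-old phenomenon, doubled
print(predict_normal(6))       # 76: a child is simply predicted the average
print(predict_normal(90))      # 94: having beaten the average, a few more
```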
The shape of this curve differs from both the normal and the power-law: it has a winglike contour, rising to a gentle hump, with a tail that falls off faster than a power-law but more slowly than a normal distribution. Erlang himself, working for the Copenhagen Telephone Company in the early twentieth century, used it to model how much time could be expected to pass between successive calls on a phone network. Since then, the Erlang distribution has also been used by urban planners and architects to model car and pedestrian traffic, and by networking engineers designing infrastructure for the Internet.
The Erlang distribution gives us a third kind of prediction rule, the Additive Rule: always predict that things will go on just a constant amount longer.
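Sketched in code, with an invented constant:

```python
def predict_erlang(elapsed, window=5.0):
    """Additive Rule: always predict a constant amount more. The window
    value here is a placeholder; it depends on the exact distribution."""
    return elapsed + window

for elapsed in (1, 10, 100):
    print(predict_erlang(elapsed))   # 6.0, 15.0, 105.0: always 5 more
```

The prediction stays a fixed amount ahead no matter how long things have run, which is exactly the memoryless quality described below.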
In a power-law distribution, the longer something has gone on, the longer we expect it to continue going on.
In a normal distribution, events are surprising when they’re early—since we expected them to reach the average—but not when they’re late.
And in an Erlang distribution, events by definition are never any more or less surprising no matter when they occur. Any state of affairs is always equally likely to end regardless of how long it’s lasted.