Algorithms to Live By: The Computer Science of Human Decisions
That is, it’s an attempt to formulate a theory that will account for the experiences you’ve had to date and say something about the future ones you’re guessing at.
The lesson is this: it is indeed true that including more factors in a model will always, by definition, make it a better fit for the data we have already. But a better fit for the available data does not necessarily mean a better prediction.
Granted, a model that’s too simple—for instance, the straight line of the one-factor formula—can fail to capture the essential pattern in the data.
On the other hand, a model that’s too complicated, such as our nine-factor model here, becomes oversensitive to the particular data points that we happened to observe.
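The difference is easy to see in a few lines of code. Here is a minimal sketch (mine, not the book's), assuming only NumPy: nine noisy points drawn from a straight line, fit once with the one-factor line and once with a nine-coefficient polynomial. The polynomial passes through every observed point, yet extrapolates badly.

```python
# Illustrative sketch (not from the book): one-factor vs. nine-factor fits.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 9)
y = 2 * x + rng.normal(0, 0.1, size=9)  # truly linear data plus noise

simple = np.polyfit(x, y, deg=1)    # two coefficients: slope and intercept
complex_ = np.polyfit(x, y, deg=8)  # nine coefficients, one per data point

x_new = 1.2                          # a point beyond the observed data
print(np.polyval(simple, x_new))     # near the true value, 2.4
print(np.polyval(complex_, x_new))   # typically far off the mark
```

The degree-8 fit has essentially zero error on the nine points it saw, which is exactly the trap the passage describes: a perfect fit to the past, a poor guide to the future.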
Fundamentally, overfitting is a kind of idolatry of data, a consequence of focusing on what we’ve been able to measure rather than what matters. This gap between the data we have and the predictions we want is virtually everywhere. When making a big decision, we can only guess at what will please us later by thinking about the factors important to us right now.
Even in our small daily acts this pattern holds: writing an email, we use our own read-through of the text to predict that of the recipient.
The answer is that taste is our body’s proxy metric for health.
But being able to modify the foods available to us broke that relationship.
In other words, we can overfit taste. And the more skillfully we can manipulate food (and the more our lifestyles diverge from those of our ancestors), the more imperfect a metric taste becomes.
Overfitting the signals—adopting an extreme diet to lower body fat and taking steroids to build muscle, perhaps—can make you the picture of good health, but only the picture.
It’s as exciting a sport as ever, but as athletes overfit their tactics to the quirks of scorekeeping, it becomes less useful in instilling the skills of real-world swordsmanship.
In the military and in law enforcement, for example, repetitive, rote training is considered a key means for instilling line-of-fire skills. The goal is to drill certain motions and tactics to the point that they become totally automatic. But when overfitting creeps in, it can prove disastrous.
Mistakes like these are known in law enforcement and the military as “training scars,” and they reflect the fact that it’s possible to overfit one’s own preparation.
Simply put, Cross-Validation means assessing not only how well a model fits the data it’s given, but how well it generalizes to data it hasn’t seen. Paradoxically, this may involve using less data.
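One way to make this concrete: withhold a few of the data points during fitting and score each candidate model on the points it never saw. A minimal sketch, again assuming only NumPy and echoing the toy example above:

```python
# Illustrative hold-out cross-validation: judge models on unseen points.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(0, 0.1, size=20)

idx = rng.permutation(20)
train, test = idx[:15], idx[15:]     # withhold five points entirely

for degree in (1, 8):
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    train_err = np.mean((np.polyval(coeffs, x[train]) - y[train]) ** 2)
    test_err = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    # The complex model usually fits the training points better
    # but scores worse on the withheld ones.
    print(degree, train_err, test_err)
```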
Aside from withholding some of the available data points, it is also useful to consider testing the model with data derived from some other form of evaluation entirely.
In these cases, we’ll need to cross-validate the primary performance measure we’re using against other possible measures.
Alongside such tests, however, schools could randomly assess some small fraction of the students—one per class, say, or one in a hundred—using a different evaluation method, perhaps something like an essay or an oral exam. (Since only a few students would be tested this way, having this secondary method scale well is not a big concern.) The standardized tests would provide immediate feedback—you could have students take a short computerized exam every week and chart the class’s progress almost in real time, for instance—while the secondary data points would serve to cross-validate: to make …
Just as essays and oral exams can cross-validate standardized tests, so occasional unfamiliar “cross-training” assessments might be used to measure whether reaction time and shooting accuracy are generalizing to unfamiliar tasks.
If you can’t explain it simply, you don’t understand it well enough. —ANONYMOUS
One way to choose among several competing models is the Occam’s razor principle, which suggests that, all things being equal, the simplest possible hypothesis is probably the correct one.
If we introduce a complexity penalty, then more complex models need to do not merely a better job but a significantly better job of explaining the data to justify their greater complexity. Computer scientists refer to this principle—using constraints that penalize models for their complexity—as Regularization.
One algorithm, discovered in 1996 by biostatistician Robert Tibshirani, is called the Lasso and uses as its penalty the total weight of the different factors in the model.* By putting this downward pressure on the weights of the factors, the Lasso drives as many of them as possible completely to zero. Only the factors that have a big impact on the results remain in the equation—thus potentially transforming, say, an overfitted nine-factor model into a simpler, more robust formula with just a couple of the most critical factors.
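In code, the Lasso is one import away in a library like scikit-learn. A minimal sketch (my example, not the book's), with nine candidate factors of which only two actually matter:

```python
# Illustrative Lasso sketch: the L1 penalty zeroes out irrelevant factors.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 9))                    # nine candidate factors
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=100)

model = Lasso(alpha=0.1).fit(X, y)               # alpha sets penalty strength
print(model.coef_)   # roughly [3, -2, 0, 0, 0, 0, 0, 0, 0]
```

Raising alpha drives still more weights to zero; shrinking it toward zero lets all nine factors creep back into the equation.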
Living organisms get a certain push toward simplicity almost automatically, thanks to the constraints of time, memory, energy, and attention.
On the other hand, we can also infer that a substantially more complex brain probably didn’t provide sufficient dividends, evolutionarily speaking. We’re as brainy as we have needed to be, but not extravagantly more so.
Actual, biological neural networks sidestep some of this problem because they need to trade off their performance against the costs of maintaining it. Neuroscientists have suggested, for instance, that brains try to minimize the number of neurons that are firing at any given moment—implementing the same kind of downward pressure on complexity as the Lasso. Language forms yet another natural Lasso: complexity is punished by the labor of speaking at greater length and the taxing of our listener’s attention span.
And anything that needs to be remembered has to pass through the inherent Lasso of memory.
I should have computed the historical covariances of the asset classes and drawn an efficient frontier. Instead, I visualized my grief if the stock market went way up and I wasn’t in it—or if it went way down and I was completely in it. My intention was to minimize my future regret. So I split my contributions fifty-fifty between bonds and equities.
When it comes to portfolio management, it turns out that unless you’re highly confident in the information you have about the markets, you may actually be better off ignoring that information altogether. Applying Markowitz’s optimal portfolio allocation scheme requires having good estimates of the statistical properties of different investments.
But when the odds of estimating them all correctly are low, and the weight that the model puts on those untrustworthy quantities is high, then an alarm should be going off in the decision-making process: it’s time to regularize.
Imposing penalties on the ultimate complexity of a model is not the only way to alleviate overfitting, however. You can also nudge a model toward simplicity by controlling the speed with which you allow it to adapt to incoming data.
As the New York Times put it, “coconut water seems to have jumped from invisible to unavoidable without a pause in the realm of the vaguely familiar.”
The biggest purchaser of kale the year before had been Pizza Hut, which put it in their salad bars—as decoration.
In contrast, if we look at the way organisms—including humans—evolve, we notice something intriguing: change happens slowly.
Though crossed-over nerve fibers and repurposed jawbones may seem like suboptimal arrangements, we don’t necessarily want evolution to fully optimize an organism to every shift in its environmental niche—or, at least, we should recognize that doing so would make it extremely sensitive to further environmental changes.
A bit of conservatism, a certain bias in favor of history, can buffer us against the boom-and-bust cycle of fads.
In machine learning, the advantages of moving slowly emerge most concretely in a regularization technique known as Early Stopping.
Their models can therefore be kept from becoming overly complex simply by stopping the process short, before overfitting has had a chance to creep in.
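The mechanism, in miniature and with my own variable names: take small gradient steps toward fitting the training data, watch the error on held-out data, and keep the weights from the step where that error was lowest.

```python
# Illustrative Early Stopping: halt training when validation error stalls.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 9))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=60)
X_tr, y_tr = X[:40], y[:40]          # data used for training
X_val, y_val = X[40:], y[40:]        # data used only for monitoring

w = np.zeros(9)
best_err, best_w, stalled = np.inf, w.copy(), 0
for step in range(10_000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.01 * grad                 # one small step toward fitting the data
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_err:
        best_err, best_w, stalled = val_err, w.copy(), 0
    else:
        stalled += 1
        if stalled >= 50:            # no improvement for 50 steps: stop early
            break
print(step, best_err)                # best_w is the model we actually keep
```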
Those extra hours, it turned out, had been spent nailing down nitty-gritty details that only confused the students, and wound up getting cut from the lectures the next time Tom taught the class. The underlying issue, Tom eventually realized, was that he’d been using his own taste and judgment as a kind of proxy metric for his students’. This proxy metric worked reasonably well as an approximation, but it wasn’t worth overfitting—which explained why spending extra hours painstakingly “perfecting” all the slides had been counterproductive.
If the factors we come up with first are likely to be the most important ones, then beyond a certain point thinking more about a problem is not only going to be a waste of time and effort—it will lead us to worse solutions. Early Stopping provides the foundation for a reasoned argument against reasoning, the thinking person’s case against thought.
The greater the uncertainty, the bigger the gap between what you can measure and what matters, the more you should watch out for overfitting—that is, the more you should prefer simplicity, and the earlier you should stop.
To return to Darwin, his problem of deciding whether to propose could probably have been resolved based on just the first few pros and cons he identified, with the subsequent ones adding to the time and anxiety expended on the decision without necessarily aiding its resolution (and in all likelihood impeding it). What seemed to make up his mind was the thought that “it is intolerable to think of spending one’s whole life like a neuter bee, working, working, & nothing after all.” Children and companionship—the very first points he mentioned—were precisely those that ultimately swayed him in …
Darwin was no Franklin, adding assorted considerations for days. Despite the seriousness with which he approached this life-changing choice, Darwin made up his mind exactly when his notes reached the bottom of the diary sheet. He was regularizing to the page. This is reminiscent of both Early Stopping and the Lasso: anything that doesn’t make the page doesn’t make the decision.
In fact, no one understands as well as a computer scientist that in the face of a seemingly unmanageable challenge, you should neither toil forever nor give up, but—as we’ll see—try a third thing entirely.
If it had been studied in the nineteenth century it might have become forever known as “the prairie lawyer problem,” and if it had first come up in the twenty-first century it might have been nicknamed “the delivery drone problem.” But like the secretary problem, it emerged in the mid-twentieth century, a period unmistakably evoked by its canonical name: “the traveling salesman problem.”
Hassler Whitney posed the problem in a 1934 talk at Princeton, where it lodged firmly in the brain of fellow mathematician Merrill Flood (who, you might recall from chapter 1, is also credited with circulating the first solution to the secretary problem).
They asserted what’s now known as the Cobham–Edmonds thesis: an algorithm should be considered “efficient” if it runs in what’s called “polynomial time”—that is, O(n²), O(n³), or in fact n to the power of any number at all. A problem, in turn, is considered “tractable” if we know how to solve it using an efficient algorithm.
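A back-of-the-envelope comparison shows why the line is drawn at polynomials:

```python
# Polynomial step counts stay manageable as n grows; exponential ones explode.
for n in (10, 20, 30, 40):
    print(n, n**2, n**3, 2**n)
# At n = 40: n^2 is 1,600 and n^3 is 64,000, but 2^n is over a trillion.
```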
One of the simplest forms of relaxation in computer science is known as Constraint Relaxation. In this technique, researchers remove some of the problem’s constraints and set about solving the problem they wish they had. Then, after they’ve made a certain amount of headway, they try to add the constraints back in.
For instance, you can relax the traveling salesman problem by letting the salesman visit the same town more than once, and letting him retrace his steps for free. Finding the shortest route under these looser rules produces what’s called the “minimum spanning tree.” (If you prefer, you can also think of the minimum spanning tree as the fewest miles of road needed to connect every town to at least one other town.)
For one thing, the spanning tree, with its free backtracking, will never be any longer than the real solution, which has to follow all the rules. Therefore, we can use the relaxed problem—the fantasy—as a lower bound on the reality.
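That bound is simple to verify on a small instance. A sketch, assuming NumPy, with my own implementation of Prim's algorithm for the spanning tree and a brute-force optimal tour for comparison (the seven random towns are arbitrary):

```python
# Illustrative check: the minimum spanning tree never exceeds the optimal tour.
import itertools
import numpy as np

rng = np.random.default_rng(4)
pts = rng.random((7, 2))                      # seven towns in the unit square
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
n = len(pts)

# Prim's algorithm: grow the tree by always adding the cheapest outgoing edge.
in_tree, mst = {0}, 0.0
while len(in_tree) < n:
    i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
               key=lambda e: dist[e])
    mst += dist[i, j]
    in_tree.add(j)

# Brute-force the true shortest tour, fixing town 0 as the starting point.
best = min(
    sum(dist[t[k], t[k + 1]] for k in range(n - 1)) + dist[t[-1], t[0]]
    for t in ((0,) + p for p in itertools.permutations(range(1, n)))
)
print(mst, best)                              # mst <= best, always
```

The relaxed answer comes cheap (the tree is found in a handful of greedy steps) while the exact tour takes a factorial-size search, which is precisely what makes the fantasy useful as a bound on the reality.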
The message is simple but profound: if we’re willing to accept solutions that are close enough, then even some of the hairiest problems around can be tamed with the right techniques.