The Signal and the Noise: Why So Many Predictions Fail-but Some Don't
Academic experts like the ones that Tetlock studied can suffer from the same problem. In fact, a little knowledge may be a dangerous thing in the hands of a hedgehog with a Ph.D. One of Tetlock’s more remarkable findings is that, while foxes tend to get better at forecasting with experience, the opposite is true of hedgehogs: their performance tends to worsen as they pick up additional credentials. Tetlock believes the more facts hedgehogs have at their command, the more opportunities they have to permute and manipulate them in ways that confirm their biases. The situation is analogous to what ...more
Hedgehogs, by contrast, have more trouble distinguishing their rooting interest from their analysis. Instead, in Tetlock’s words, they create “a blurry fusion between facts and values all lumped together.” They take a prejudicial view toward the evidence, seeing what they want to see and not what is really there.
Hedgehogs who have lots of information construct stories—stories that are neater and tidier than the real world, with protagonists and villains, winners and losers, climaxes and dénouements—and, usually, a happy ending for the home team.
You can get lost in the narrative. Politics may be especially susceptible to poor predictions precisely because of its human elements: a good election engages our dramatic sensibilities. This does not mean that you must feel totally dispassionate about a political event in order to make a good prediction about it.
Despite the election being many months away, commentary focused on the inevitability of Clinton’s nomination, ignoring the uncertainty intrinsic to such early polls. There seemed to be too much focus on Clinton’s gender and Obama’s race.24 There was an obsession with determining which candidate had “won the day” by making some clever quip at a press conference or getting some no-name senator to endorse them—things that 99 percent of voters did not care about. Political news, and especially the important news that really affects the campaign, proceeds at an irregular pace. But news coverage is ...more
The FiveThirtyEight forecasting model started out pretty simple—basically, it took an average of polls but weighted them according to their past accuracy—then gradually became more intricate. But it abided by three broad principles, all of which are very fox-like.
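The weighting idea behind that starting point can be sketched in a few lines. This is a toy illustration, not the actual FiveThirtyEight formula: each poll's margin is weighted in inverse proportion to its pollster's historical average error, and the poll data below is made up.

```python
# Toy sketch of a weighted poll average (not the actual FiveThirtyEight model):
# each poll is weighted in inverse proportion to its pollster's past error.

def weighted_poll_average(polls):
    """polls: list of (candidate_margin, pollster_historical_error) pairs."""
    weights = [1.0 / err for _, err in polls]
    weighted_sum = sum(margin * w for (margin, _), w in zip(polls, weights))
    return weighted_sum / sum(weights)

# Three hypothetical polls of one race: margin in points, historical error.
polls = [(5.0, 2.0), (3.0, 4.0), (8.0, 8.0)]
print(round(weighted_poll_average(polls), 2))
```

A simple mean of the three margins would be 5.33; the weighting pulls the estimate toward the historically accurate pollsters instead of treating every poll as equally informative.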
Principle 1: Think Probabilistically

Almost all the forecasts that I publish, in politics and other fields, are probabilistic. Instead of spitting out just one number and claiming to know exactly what will happen, I instead articulate a range of possible outcomes.
The wide distribution of outcomes represented the most honest expression of the uncertainty in the real world.
How likely is a candidate to win, for instance, if he’s ahead by five points in the polls? This is the sort of question that FiveThirtyEight’s models are trying to address.
The answer depends significantly on the type of race that he’s involved in. The further down the ballot you go, the more volatile the polls tend to be: polls of House races are less accurate than polls of Senate races, which are in turn less accurate than polls of presidential races. Polls of primaries, also, are considerably less accurate than general election polls. During the 2008 Democratic primaries, the average poll missed by about eight points, far more than implied by its margin of error. The problems in polls of the Republican primaries of 2012 may have been even worse.26 In many of ...more
A Senate candidate with a five-point lead on the day before the election, for instance, has historically won his race about 95 percent of the time—almost a sure thing, even though news accounts are sure to describe the race as “too close to call.” By contrast, a five-point lead a year before the election translates to just a 59 percent chance of winning—barely better than a coin flip.
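That historical relationship can be mimicked with a toy normal model: treat the final margin as the polled lead plus an error whose spread grows with time to the election. The sigma values below are assumptions chosen for illustration, not historical estimates.

```python
from math import erf, sqrt

def win_probability(lead, sigma):
    """P(final margin > 0), assuming the final margin is normally
    distributed around the current polled lead with std. dev. sigma."""
    return 0.5 * (1 + erf(lead / (sigma * sqrt(2))))

# Hypothetical polling-error spreads: tight the day before the election,
# wide a year out (values chosen for illustration, not fitted to history).
print(round(win_probability(5.0, 3.0), 2))   # day before: near-certain
print(round(win_probability(5.0, 22.0), 2))  # a year out: near a coin flip
```

With a small error spread, a five-point lead is close to decisive; with a large one, the same lead conveys little, which is the pattern the historical win rates show.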
The FiveThirtyEight models provide much of their value in this way. It’s very easy to look at an election, see that one candidate is ahead in all or most of the polls, and determine that he’s the favorite to win. (With some exceptions, this assumption will be correct.) What becomes much trickier is determining exactly how much of a favorite he is. Our brains, wired to detect patterns, are always looking for a signal, when instead we should appreciate how noisy the data is.
In 2010, a Democratic congressman called me a few weeks in advance of the election. He represented a safely Democratic district on the West Coast. But given how well Republicans were doing that year, he was nevertheless concerned about losing his seat. What he wanted to know was exactly how much uncertainty there was in our forecast. Our numbers gave him, to the nearest approximation, a 100 percent chance of winning. But did 100 percent really mean 99 percent, or 99.99 percent, or 99.9999 percent? If the latter—a 1 in 100,000 chance of losing—he was prepared to donate his campaign funds to ...more
Principle 2: Today’s Forecast Is the First Forecast of the Rest of Your Life

Another misconception is that a good prediction shouldn’t change. Certainly, if there are wild gyrations in your forecast from day to day, that may be a bad sign—either of a badly designed model, or that the phenomenon you are attempting to predict isn’t very predictable at all.
By the end of a presidential race, as many as thirty or forty polls might be released every day from different states, and some of these results will inevitably fall outside the margin of error. Candidates, strategists, and television commentators—who have some vested interest in making the race seem closer than it really is—might focus on these outlier polls, but the FiveThirtyEight model found that they usually didn’t make much difference. Ultimately, the right attitude is that you should make the best forecast possible today—regardless of what you said last week, last month, or last year. ...more
It seems like cheating to change your mind—the equivalent of sticking your finger out and seeing which way the wind is blowing.29 The critiques usually rely, implicitly or explicitly, on the notion that politics is analogous to something like physics or biology, abiding by fundamental laws that are intrinsically knowable and predictable. (One of my most frequent critics is a professor of neuroscience at Princeton.30) Under those circumstances, new information doesn’t matter very much; elections should follow a predictable orbit, like a comet hurtling toward Earth. Instead of physics or biology, ...more
Principle 3: Look for Consensus
The expert consensus can be wrong—someone who had forecasted the collapse of the Soviet Union would have deserved most of the kudos that came to him. But the fantasy scenario is hugely unlikely. Even though foxes, myself included, aren’t really a conformist lot, we get worried anytime our forecasts differ radically from those being produced by our competitors. Quite a lot of evidence suggests that aggregate or group forecasts are more accurate than individual ones, often somewhere between 15 and 20 percent more accurate depending on the discipline. That doesn’t necessarily mean the group ...more
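The accuracy gain from aggregation is easy to demonstrate with a simulation. This sketch assumes fully independent forecaster errors, which is why the gain it shows is much larger than the 15 to 20 percent quoted above; real forecasters share data and biases, so their errors are correlated and averaging helps less.

```python
import random

random.seed(0)
TRUTH = 50.0
N_FORECASTERS, N_TRIALS = 10, 10_000

indiv_err = group_err = 0.0
for _ in range(N_TRIALS):
    # Each forecaster sees the truth plus independent noise.
    forecasts = [TRUTH + random.gauss(0, 5) for _ in range(N_FORECASTERS)]
    indiv_err += abs(forecasts[0] - TRUTH)                    # one forecaster
    group_err += abs(sum(forecasts) / N_FORECASTERS - TRUTH)  # the average

print(round(indiv_err / N_TRIALS, 2), round(group_err / N_TRIALS, 2))
```

The group's average error comes out far below a typical individual's, because independent errors partially cancel when averaged.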
Models that take a more fox-like approach, combining economic data with polling data and other types of information, have produced more reliable results.
The failure of these magic-bullet forecasting models came even though they were quantitative, relying on published economic statistics. In fact, some of the very worst forecasts that I document in this book are quantitative.
But hedgehogs can take any type of information and have it reinforce their biases, while foxes who have practice in weighing different types of information together can sometimes benefit from accounting for qualitative along with quantitative factors.
House races are another matter, however. The candidates often rise from relative obscurity—city councilmen or small-business owners who decide to take their shot at national politics—and in some cases are barely known to voters until just days before the election. Congressional districts, meanwhile, are spread throughout literally every corner of the country, giving rise to any number of demographic idiosyncrasies.
Other types of information are more qualitative, but are nonetheless potentially useful. Is the candidate a good public speaker? How in tune is her platform with the peculiarities of the district? What type of ads is she running? A political campaign is essentially a small business: How well does she manage people?
Wasserman’s knowledge of the nooks and crannies of political geography can make him seem like a local, and Kapanke was happy to talk shop about the intricacies of his district—just how many voters he needed to win in La Crosse to make up for the ones he’d lose in Eau Claire. But he stumbled over a series of questions on allegations that he had used contributions from lobbyists to buy a new set of lights for the Loggers’ ballpark.40 It was small-bore stuff; it wasn’t like Kapanke had been accused of cheating on his wife or his taxes. But it was enough to dissuade Wasserman from changing the ...more
Wasserman instead considers everything in the broader political context. A terrific Democratic candidate who aces her interview might not stand a chance in a district that the Republican normally wins by twenty points. So why bother with the candidate interviews at all? Mostly, Wasserman is looking for red flags—like the time when the Democratic congressman Eric Massa (who would later abruptly resign from Congress after accusations that he sexually harassed a male staffer) kept asking Wasserman how old he was. The psychologist Paul Meehl called these “broken leg” cases—situations where there ...more
In this book, I use the terms objective and subjective carefully. The word objective is sometimes taken to be synonymous with quantitative, but it isn’t. Instead it means seeing beyond our personal biases and prejudices and toward the truth of a problem.
When we make a forecast, we have a choice from among many different methods. Some of these might rely solely on quantitative variables like polls, while approaches like Wasserman’s may consider qualitative factors as well. All of them, however, introduce decisions and assumptions that have to be made by the forecaster. Wherever there is human judgment there is the potential for bias. The way to become more objective is to recognize the influence that our assumptions play in our forecasts and to question ourselves about them. In politics, between our ideological predispositions and our ...more
You will need to learn how to express—and quantify—the uncertainty in your predictions. You will need to update your forecast as facts and circumstances change. You will need to recognize that there is wisdom in seeing the world from a different viewpoint. The more you are willing to do these things, the more capable you will be of evaluating a wide variety of information without abusing it. In short, you will need to learn how to think like a fox. The foxy forecaster recognizes the limitations that human judgment imposes in predicting the world’s course. Knowing those limits can help her to ...more
By looking at statistics for thousands of players, James had discovered that the typical player9 continues to improve until he is in his late twenties, at which point his skills usually begin to atrophy, especially once he reaches his midthirties.10 This gave James one of his most important inventions: the aging curve. Olympic gymnasts peak in their teens; poets in their twenties; chess players in their thirties11; applied economists in their forties,12 and the average age of a Fortune 500 CEO is 55.13 A baseball player, James found, peaks at age twenty-seven. Of the fifty MVP winners between ...more
But James’s aging curve painted too smooth a picture. Sure, the average player might peak at age twenty-seven. As anyone who has paid his dues staring at the backs of baseball cards can tell you, however, players age at different paces.
Although Horner and Martinez may be exceptional cases, it is quite rare for players to follow the smooth patterns of development that the aging curve implies; instead, a sort of punctuated equilibrium of jagged peaks and valleys is the norm. Real aging curves are noisy—very noisy (figure 3-2). On average, they form a smooth-looking pattern. But the average, like the family with 1.7 children, is just a statistical abstraction.
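A quick simulation makes the point about averages concrete: give every simulated player the same smooth underlying curve plus heavy year-to-year noise, and each individual career looks jagged while the average across careers looks smooth. All numbers here are arbitrary illustration values, not fitted to baseball data.

```python
import random

random.seed(1)
AGES = range(22, 37)
# Smooth underlying curve peaking at age 27, in arbitrary "skill" units.
true_curve = {age: 10.0 - 0.3 * (age - 27) ** 2 for age in AGES}

# 2,000 simulated careers: the smooth curve plus large year-to-year noise,
# so any single career shows jagged peaks and valleys.
players = [{age: true_curve[age] + random.gauss(0, 3) for age in AGES}
           for _ in range(2000)]

avg_curve = {age: sum(p[age] for p in players) / len(players) for age in AGES}
peak_age = max(avg_curve, key=avg_curve.get)
print(peak_age)  # the average career recovers the smooth peak near 27
```

Plot any single entry of `players` and it zigzags; only the average, a statistical abstraction like the family with 1.7 children, traces the clean curve.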
The organization still very much believes in rigorous analysis. The rigor and discipline is applied, however, in the way the organization processes the information it collects, and not in declaring certain types of information off-limits. “The proportion of objective versus subjective analysis is weighted more in some organizations than in others,” he explained. “From our standpoint in Oakland, we’re sort of forced into making objective decisions versus gut-feel decisions. If we in Oakland happen to be right on a gut-feel decision one time, my guess is it would be random. And we’re not in a ...more
The idea takes on various forms, but no one took it further than Pierre-Simon Laplace, a French astronomer and mathematician. In 1814, Laplace made the following postulate, which later came to be known as Laplace’s Demon: We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of ...more
At loggerheads with the determinists are the probabilists, who believe that the conditions of the universe are knowable only with some degree of uncertainty.* Probabilism was, at first, mostly an epistemological paradigm: it avowed that there were limits on man’s ability to come to grips with the universe. More recently, with the discovery of quantum mechanics, scientists and philosophers have asked whether the universe itself behaves probabilistically.
But weather forecasters have a much better theoretical understanding of the earth’s atmosphere than seismologists do of the earth’s crust. They know, more or less, how weather works, right down to the molecular level. Seismologists don’t have that advantage. “It’s easy for climate systems,” Bowman reflected. “If they want to see what’s happening in the atmosphere, they just have to look up. We’re looking at rock. Most events occur at a depth of fifteen kilometers underground. We don’t have a hope of drilling down there, realistically—sci-fi movies aside. That’s the fundamental problem. There’s ...more
What happens in systems with noisy data and underdeveloped theory—like earthquake prediction and parts of economics and political science—is a two-step process. First, people start to mistake the noise for a signal. Second, this noise pollutes journals, blogs, and news accounts with false alarms, undermining good science and setting back our ability to understand how the system really works.
In almost all real-world applications, however, we have to work by induction, inferring the structure from the available evidence. You are most likely to overfit a model when the data is limited and noisy and when your understanding of the fundamental relationships is poor; both circumstances apply in earthquake forecasting. If we either don’t know or don’t care about the truth of the relationship, there are lots of reasons why we may be prone to overfitting the model.
One is that the overfit model will score better according to most of the statistical tests that forecasters use.
As obvious as this might seem when explained in this way, many forecasters completely ignore this problem. The wide array of statistical methods available to researchers enables them to be no less fanciful—and no more scientific—than a child finding animal patterns in clouds.
Overfitting represents a double whammy: it makes our model look better on paper but perform worse in the real world. Because of the latter trait, an overfit model eventually will get its comeuppance if and when it is used to make real predictions. Because of the former, it may look superficially more impressive until then, claiming to make very accurate and newsworthy predictions and to represent an advance over previously applied techniques. This may make it easier to get the model published in an academic journal or to sell to a client, crowding out more honest models from the marketplace. ...more
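The double whammy is easy to reproduce. In the sketch below (toy data, not an earthquake model), an honest straight-line fit competes with an overfit model that passes through every training point exactly: the overfit model scores a perfect zero on the data it was fit to, and a worse score on fresh data.

```python
import random

random.seed(2)

def make_data(n):
    """Noisy observations of a true straight-line relationship y = 2x."""
    return [(x, 2 * x + random.gauss(0, 2))
            for x in (random.uniform(0, 10) for _ in range(n))]

train = sorted(make_data(20))
test = make_data(200)

# Honest model: least-squares straight line through the training data.
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def line(x):
    return slope * x + intercept

# Overfit model: connect the dots, passing through every training point.
def connect_the_dots(x):
    for (x0, y0), (x1, y1) in zip(train, train[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return train[0][1] if x < train[0][0] else train[-1][1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# In-sample, the overfit model looks perfect; out-of-sample it should lose.
print(mse(connect_the_dots, train), mse(line, train))
print(mse(connect_the_dots, test), mse(line, test))
```

The connect-the-dots model has memorized the noise along with the signal, which is exactly why it looks better on paper and performs worse in the real world.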
The Fukushima nuclear reactor was built to withstand a magnitude 8.6 earthquake,61 but not a 9.1. Archaeological evidence62 is suggestive of historic tsunamis on the scale of the 130-foot waves that the 2011 earthquake produced, but these cases were apparently forgotten or ignored. A magnitude 9.1 earthquake is an incredibly rare event in any part of the world: nobody should have been predicting it to the exact decade, let alone the exact date. In Japan, however, some scientists and central planners dismissed the possibility out of hand. This may reflect a case of overfitting.
The data includes everything up through but not including the magnitude 9.1 earthquake on March 11. You’ll see that the relationship almost follows the straight-line pattern that Gutenberg and Richter’s method predicts. However, at about magnitude 7.5, there is a kink in the graph. There had been no earthquakes as large as a magnitude 8.0 in the region since 1964, and so the curve seems to bend down accordingly. So how to connect the dots? If you go strictly by the Gutenberg–Richter law, ignoring the kink in the graph, you should still follow the straight line, as in figure 5-7b. ...more
Here is another example where an innocuous-seeming choice of assumptions will yield radically distinct conclusions—in this case, about the probability of a magnitude 9 earthquake in this part of Japan. The characteristic fit suggests that such an earthquake was nearly impossible—it implies that one might occur about every 13,000 years. The Gutenberg–Richter estimate, on the other hand, was that you’d get one such earthquake every three hundred years. That’s infrequent but hardly impossible—a tangible enough risk that a wealthy nation like Japan might be able to prepare for it.
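The mechanics of the straight-line extrapolation are simple: the Gutenberg–Richter law says the annual frequency of earthquakes of at least magnitude M falls by a factor of ten for each whole-magnitude step (when b = 1). The a-value below is chosen so the line roughly matches the 300-year figure in the passage; it is not a fit to real Japanese data.

```python
# Gutenberg-Richter law: log10(N) = a - b * M, where N is the annual number
# of earthquakes of magnitude >= M. With b = 1, each whole-magnitude step
# makes quakes 10x rarer. The a-value is illustrative only, chosen so that
# a magnitude 9 recurs roughly every 300 years, as in the passage.
A, B = 6.52, 1.0

def annual_rate(magnitude):
    return 10 ** (A - B * magnitude)

for m in (7.5, 8.0, 9.0):
    years = 1 / annual_rate(m)
    print(f"magnitude {m}: about one every {years:,.0f} years")
```

Compare the characteristic fit's implied 13,000-year recurrence: the two curves barely disagree near magnitude 7.5, yet diverge by a factor of roughly forty at magnitude 9, which is why the "innocuous-seeming" choice between them mattered so much.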
Because they occur so rarely, it will take centuries to know what the true rate of magnitude 9 earthquakes is. It will take even longer to know whether earthquakes larger than magnitude 9.5 are possible. Hough told me that there may be some fundamental constraints on earthquake size from the geography of fault systems. If the largest continuous string of faults in the world ruptured together—everything from Tierra del Fuego at the southern tip of South America all the way up through the Aleutians in Alaska—a magnitude 10 is about what you’d get, she said. But it is hard to know for sure. Even ...more
And yet complex processes produce order and beauty when you zoom out and look at them from enough distance. I use the terms signal and noise very loosely in this book, but they originally come from electrical engineering. There are different types of noise that engineers recognize—all of them are random, but they follow different underlying probability distributions. If you listen to true white noise, which is produced by random bursts of sound over a uniform distribution of frequencies, it is sibilant and somewhat abrasive. The type of noise associated with complex systems, called Brownian ...more
A prediction of a forty-nine-foot crest in the river, expressed without any reservation, seemed to imply that the flood would hit forty-nine feet exactly; the fifty-one-foot levees would be just enough to keep them safe. Some residents even interpreted the forecast of forty-nine feet as representing the maximum possible extent of the flood.11 An oft-told joke: a statistician drowned crossing a river that was only three feet deep on average. On average, the flood might be forty-nine feet in the Weather Service’s forecast model, but just a little bit higher and the town would be inundated. The ...more
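The levee arithmetic can be made explicit. If the forty-nine-foot point forecast is treated as the center of a distribution of possible crests, the chance of topping fifty-one-foot levees depends entirely on the spread of historical forecast errors. The sigma values below are hypothetical illustrations, not the Weather Service's actual error statistics.

```python
from math import erf, sqrt

def prob_exceeds(forecast, threshold, sigma):
    """P(true crest > threshold), assuming the crest is normally
    distributed around the point forecast with std. dev. sigma."""
    z = (threshold - forecast) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))

# Forecast crest: 49 ft; levees: 51 ft. sigma stands in for the historical
# forecast error (hypothetical values). A point forecast of "49" hides
# entirely different flood risks depending on that spread.
for sigma in (1.0, 4.0, 9.0):
    print(sigma, round(prob_exceeds(49, 51, sigma), 2))
```

With a one-foot error spread the levees are almost certainly safe; with a nine-foot spread the flood is closer to a coin flip than a remote possibility, yet both scenarios produce the same headline number.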
This was a very bad forecast: GDP actually shrank by 3.3 percent once the financial crisis hit. What may be worse is that the economists were extremely confident in their bad prediction. They assigned only a 3 percent chance to the economy’s shrinking by any margin over the whole of 2008.15 And they gave it only about a 1-in-500 chance of shrinking by at least 2 percent, as it did.16 Indeed, economists have for a long time been much too confident in their ability to predict the direction of the economy. In figure 6-4, I’ve plotted the forecasts of GDP growth from the Survey of Professional ...more
In the year after the stimulus package was passed in 2009, for instance, GDP was growing fast enough to create about two million jobs according to Okun’s law.41 Instead, an additional 3.5 million jobs were lost during the period. Economists often debate about what the change means. The most pessimistic interpretation, advanced by economists including Jeffrey Sachs of Columbia University, is that the pattern reflects profound structural problems in the American economy: among them, increasing competition from other countries, an imbalance between the service and manufacturing sectors, an aging ...more
If you make the same plot for the years 1986 through 2006 (as in figure 6-5b), you’ll find just the reverse. Most of the data points—both the forecasted values for GDP and the actual ones—are bunched closely together in a narrow range between about 2 percent and 5 percent annual growth. Because there was so little volatility during this time, the average error in the forecast was less than in the previous period.* However, to the extent there was any variability in the economy, like the mild recessions of 1990–91 or in 2001, the forecasts weren’t doing a very good job of capturing it—in fact, ...more
The challenge to economists might be compared to the one faced by weather forecasters. They face two of the same fundamental problems. First, the economy, like the atmosphere, is a dynamic system: everything affects everything else and the systems are perpetually in motion. In meteorology, this problem is quite literal, since the weather is subject to chaos theory—a butterfly flapping its wings in Brazil can theoretically cause a tornado in Texas. But in loosely the same way, a tsunami in Japan or a longshoreman’s strike in Long Beach can affect whether someone in Texas finds a job. Second, ...more