The Signal and the Noise: Why So Many Predictions Fail—but Some Don't
Hatzius wrote on November 15, 2007: “The likely mortgage credit losses pose a significantly bigger macroeconomic risk than generally recognized. . . . The macroeconomic consequences could be quite dramatic. If leveraged investors see $200 [billion in] aggregate credit loss, they might need to scale back their lending by $2 trillion. This is a large shock. . . . It is easy to see how such a shock could produce a substantial recession or a long period of very sluggish growth.” Consumers had been extended too much credit, Hatzius wrote, to pay for homes that the housing bubble had made ...more
H1N1 had been responsible for the worst pandemic in modern history: the Spanish flu of 1918–20, which afflicted a third of humanity and killed 50 million,3 including 675,000 in the United States. For reasons of both science and superstition, the disclosure sent a chill through the nation’s epidemiological community. The 1918 outbreak’s earliest manifestations had also come at a military base, Fort Riley in Kansas, where soldiers were busy preparing to enter World War I.4 Moreover, there was a belief at that time—based on somewhat flimsy scientific evidence—that a major flu epidemic manifested ...more
The press portrayed the mass vaccination program as a gamble.9 But Ford thought of it as a gamble between money and lives, and one that he was on the right side of. Overwhelming majorities in both houses of Congress approved his plans at a cost of $180 million.10 By summer, however, there were serious doubts about the government’s plans. Although summer is the natural low season for the flu in the United States,11 it was winter in the Southern Hemisphere, when flu is normally at its peak. And nowhere, from Auckland to Argentina, were there any signs of H1N1; instead, the mild and common ...more
Their fear, however, manifested itself as much toward the vaccine as toward the disease itself. Throughout American history, the notion of the government poking needles into everyone’s arm has always provoked more than its fair share of anxiety. But this time there was a more tangible basis for public doubt. In August of that year, under pressure from the drug companies, Congress and the White House had agreed to indemnify them from legal liability in the event of manufacturing defects. This was widely read as a vote of no-confidence; the vaccine looked as though it was being rushed out ...more
By late fall, another problem had emerged, this one far more serious. About five hundred patients, after receiving their shots, had begun to exhibit the symptoms of a rare neurological condition known as Guillain–Barré syndrome, an autoimmune disorder that can cause paralysis. This time, the statistical evidence was far more convincing: the usual incidence of Guillain–Barré in the general population is only about one case per million persons.22 In contrast, the rate in the vaccinated population had been ten times that—five hundred cases out of the roughly fifty million people who had been ...more
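The rate comparison in this passage checks out; a quick sketch using its own figures:

```python
# Background incidence of Guillain-Barré versus the rate observed
# among the vaccinated population, per the passage's numbers.
baseline = 1 / 1_000_000        # ~1 case per million in the general population
observed = 500 / 50_000_000     # ~500 cases among roughly 50 million vaccinees
ratio = observed / baseline
print(ratio)  # ten times the background rate
```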
Within a couple of years, the number of Americans willing to take flu shots dwindled to only about one million,29 potentially putting the nation in grave danger had a severe strain hit in 1978 or 1979.30 Ford’s handling of H1N1 was irresponsible on a number of levels. By invoking the likelihood of a 1918-type pandemic, he had gone against the advice of medical experts, who believed at the time that the chance of such a worst-case outcome was no higher than 35 percent and perhaps as low as 2 percent.31 Still, it was not clear what had caused H1N1 to disappear just as suddenly as it emerged. And ...more
The perfect incubator for the swine flu, then, would be a region in which each of three conditions held: It would be a place where humans and pigs lived in close proximity—that is, somewhere where pork was a staple of the diet. It would be a place near the ocean where pigs and seafaring birds might intermingle. And it would probably be somewhere in the developing world, where poverty produced lower levels of hygiene and sanitation, allowing animal viruses to be transmitted to humans more easily. This mix almost perfectly describes the conditions found in Southeast Asian countries like China, ...more
Kenneth Bernoska
But. It can start anywhere. At any time.
These circumstances are not exclusive to Asia, however. The Mexican state of Veracruz, for instance, provides similarly fertile conditions for the flu. Veracruz has a coastline on the Gulf of Mexico, and Mexico is a developing country with a culinary tradition that heavily features pork.35 It was in Veracruz—where very few scientists were looking for the flu36—that the 2009 outbreak of H1N1 began.
There were reports of about 1,900 cases of H1N1 in Mexico and some 150 deaths. The ratio of these two quantities is known as the case fatality rate and it was seemingly very high—about 8 percent of the people who had acquired the flu had apparently died from it, which exceeded the rate during the Spanish flu epidemic.38 Many of the dead, moreover, were relatively young and healthy adults, another characteristic of severe outbreaks. And the virus was clearly quite good at reproducing itself; cases had already been detected in Canada, Spain, Peru, the United Kingdom, Israel, New Zealand, ...more
Swine flu had indeed spread extremely rapidly in the United States—from twenty confirmed cases on April 26 to 2,618 some fifteen days later.41 But most cases were surprisingly mild, with just three deaths confirmed in the United States, a fatality rate comparable to the seasonal flu. Just a week after the swine flu had seemed to have boundless destructive potential, the CDC recommended that closed schools be reopened. The disease had continued to spread across the globe, however, and by June 2009 the WHO had declared it a level 6 pandemic, its highest classification. Scientists feared the ...more
Eventually, the government reported that a total of about fifty-five million Americans had become infected with H1N1 in 2009—about one sixth of the U.S. population rather than one half—and 11,000 had died from it.43 Rather than being an unusually severe strain of the virus, H1N1 had in fact been exceptionally mild, with a fatality rate of just 0.02 percent. Indeed, there were slightly fewer deaths from the flu in 2009–10 than in a typical season.44 It hadn’t quite been the epic embarrassment of 1976, but there had been failures of prediction from start to finish. There are no guarantees that ...more
Extrapolation tends to cause its greatest problems in fields—including population growth and disease—where the quantity that you want to study is growing exponentially.
Perhaps the bigger problem from a statistical standpoint, however, is that precise predictions aren’t really possible to begin with when you are extrapolating on an exponential scale. A properly applied version53 of this method, which accounted for its margin of error, would have implied that there could be as few as 35,000 AIDS cases through 1995 or as many as 1.8 million. That’s much too broad a range to provide for much in the way of predictive insight.
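The sensitivity being described is easy to demonstrate. In this sketch the numbers are made up for illustration (they are not the AIDS study's figures): small differences in an estimated exponential growth factor compound into an enormous range of projections.

```python
# Project a hypothetical case count forward ten periods under three
# plausible estimates of the per-period growth factor.
initial_cases = 50_000  # assumed starting count, purely illustrative
projections = {g: initial_cases * g ** 10 for g in (1.7, 2.0, 2.3)}
low, high = projections[1.7], projections[2.3]
print(f"range: {low:,.0f} to {high:,.0f} ({high / low:.0f}x spread)")
```

A modest band of growth estimates yields roughly a twenty-fold spread after ten periods, which is why extrapolated point forecasts on an exponential scale are so fragile.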
Although the statistical methods that epidemiologists use when a flu outbreak is first detected are not quite as simple as the preceding examples, they still face the challenge of making extrapolations from a small number of potentially dubious data points. One of the most useful quantities for predicting disease spread is a variable called the basic reproduction number. Usually designated as R0, it measures the number of uninfected people who can be expected to catch a disease from a single infected individual. An R0 of 4, for instance, means that—in the absence of vaccines or other preventative ...more
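The arithmetic behind R0 can be sketched as follows (idealized assumptions: a fully susceptible, well-mixed population with no vaccines or other interventions):

```python
# With basic reproduction number R0, each generation of an outbreak
# multiplies the case count by R0 (idealized, fully susceptible population).
def cases_in_generation(r0: float, generation: int, initial: int = 1) -> float:
    return initial * r0 ** generation

print(cases_in_generation(r0=4, generation=5))  # 4**5 = 1024 cases
# The classic herd-immunity threshold implied by R0: the fraction that
# must be immune to push the effective reproduction number below 1.
print(1 - 1 / 4)  # 0.75 for R0 = 4
```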
The problem is that reliable estimates of R0 usually cannot be formulated until well after a disease has swept through a community and there has been sufficient time to scrutinize the statistics. So epidemiologists are forced to make extrapolations about it from a few early data points. The other key statistical measure of a disease, the fatality rate, can similarly be difficult to measure accurately in the early going. It is a catch-22: a disease cannot be predicted very accurately without this information, but reliable estimates of these quantities are usually not available until the ...more
The case fatality rate is a simple ratio: the number of deaths caused by a disease divided by the number of cases attributed to it. But there are uncertainties on both sides of the equation.
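That ratio, computed with the chapter's own numbers, shows how far apart the early and final estimates landed:

```python
# Case fatality rate = deaths attributed to a disease / cases attributed to it.
def case_fatality_rate(deaths: int, cases: int) -> float:
    return deaths / cases

print(f"{case_fatality_rate(150, 1_900):.1%}")          # early Mexico reports: ~7.9%
print(f"{case_fatality_rate(11_000, 55_000_000):.2%}")  # final U.S. figures: 0.02%
```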
Diseases and other medical conditions can also have this self-fulfilling property. When medical conditions are widely discussed in the media, people are more likely to identify their symptoms, and doctors are more likely to diagnose (or misdiagnose) them. The best-known case of this in recent years is autism. If you compare the number of children who are diagnosed as autistic64 to the frequency with which the term autism has been used in American newspapers,65 you’ll find that there is an almost perfect one-to-one correspondence (figure 7-4), with both having increased markedly in recent ...more
The Finnish scientist Hanna Kokko likens building a statistical or predictive model to drawing a map.68 It needs to contain enough detail to be helpful and do an honest job of representing the underlying landscape—you don’t want to leave out large cities, prominent rivers and mountain ranges, or major highways. Too much detail, however, can be overwhelming to the traveler, causing him to lose his way. As we saw in chapter 5, these problems are not purely aesthetic. Needlessly complicated models may fit the noise in a problem rather than the signal, doing a poor job of replicating its underlying ...more
The most basic mathematical treatment of infectious disease is called the SIR model (figure 7-5). The model, which was formulated in 1927,69 posits that there are three “compartments” in which any given person might reside at any given time: S stands for being susceptible to a disease, I for being infected by it, and R for being recovered from it. For simple diseases like the flu, the movement from compartment to compartment is entirely in one direction: from S to I to R. In this model, a vaccination essentially serves as a shortcut,* allowing a person to progress from S to R without getting ...more
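A minimal discrete-time version of the S → I → R flow can be sketched like this. The transmission and recovery rates below are illustrative placeholders, not values fitted to any real outbreak:

```python
# Discrete-time SIR sketch: susceptible -> infected -> recovered.
def sir(s: float, i: float, r: float,
        beta: float = 0.3,   # assumed transmission rate, illustrative only
        gamma: float = 0.1,  # assumed recovery rate, illustrative only
        days: int = 150) -> tuple[float, float, float]:
    n = s + i + r
    for _ in range(days):
        new_infections = beta * s * i / n   # well-mixed population assumption
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r

s, i, r = sir(s=999.0, i=1.0, r=0.0)
print(f"still susceptible: {s:.0f}, recovered: {r:.0f}")
```

Note how the one-directional S-to-I-to-R movement the passage describes appears directly in the update step, and how vaccination would simply move people from `s` to `r` before the loop runs.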
The problem is that the model requires a lot of assumptions to work properly, some of which are not very realistic in practice. In particular, the model assumes that everybody in a given population behaves the same way—that they are equally susceptible to a disease, equally likely to be vaccinated for it, and that they intermingle with one another at random. There are no dividing lines by race, gender, age, religion, sexual orientation, or creed; everybody behaves in more or less the same way.
The last two major flu scares in the United States proved not to live up to the hype. In 1976, there was literally no outbreak of H1N1 beyond the cases at Fort Dix; Ford’s mass vaccination program had been a gross overreaction. In 2009, the swine flu infected quite a number of people but killed very few of them. In both instances, government predictions about the magnitude of the outbreak had missed to the high side. But there are no guarantees the error will be in the same direction the next time the flu comes along. A human-adapted strain of avian flu, H5N1, could kill hundreds of millions of ...more
“It’s stupid to predict based on three data points,” Marc Lipsitch told me, referring to the flu pandemics in 1918, 1957, and 1968. “All you can do is plan for different scenarios.” If you can’t make a good prediction, it is very often harmful to pretend that you can. I suspect that epidemiologists, and others in the medical community, understand this because of their adherence to the Hippocratic oath. Primum non nocere: First, do no harm. Much of the most thoughtful work on the use and abuse of statistical models and the proper role of prediction comes from people in the medical profession.
But because of medicine’s intimate connection with life and death, doctors tend to be appropriately cautious. In their field, stupid models kill people. It has a sobering effect.
George E. P. Box wrote, “All models are wrong, but some models are useful.”90 What he meant by that is that all models are simplifications of the universe, as they must necessarily be. As another mathematician said, “The best model of a cat is a cat.”91 Everything else is leaving out some sort of detail. How pertinent that detail might be will depend on exactly what problem we’re trying to solve and on how precise an answer we require.
Language, for instance, is a type of model, an approximation that we use to communicate with one another. All languages contain words that have no direct cognate in other languages, even though they are both trying to explain the same universe. Technical subfields have their own specialized language. To you or me, the color of some of the text on the front cover of this book is yellow. To a graphic designer, that term is too approximate—instead, it’s Pantone 109. But, Box wrote, some models are useful. It seems to me that the work the Chicago or Pittsburgh teams are doing with their ...more
Some neuroscientists, like MIT’s Tomaso Poggio, think of the entire way our brains process information as being through a series of approximations. This is why it is so crucial to develop a better understanding of ourselves, and the way we distort and interpret the signals we receive, if we want to make better predictions. The first half of this book has largely been concerned with where these approximations have been serving us well and where they’ve been failing us.
How did Voulgaris know that his Lakers bet would come through? He didn’t. Successful gamblers—and successful forecasters of any kind—do not think of the future in terms of no-lose bets, unimpeachable theories, and infinitely precise measurements. These are the illusions of the sucker, the sirens of his overconfidence. Successful gamblers, instead, think of the future as speckles of probability, flickering upward and downward like a stock market ticker to every new jolt of information. When their estimates of these probabilities diverge by a sufficient margin from the odds on offer, they may ...more
Thomas Bayes was an English minister who was probably born in 1701—although it may have been 1702. Very little is certain about Bayes’s life, even though he lent his name to an entire branch of statistics and perhaps its most famous theorem.
Bayes considered the age-old theological question of how there could be suffering and evil in the world if God was truly benevolent. Bayes’s answer, in essence, was that we should not mistake our human imperfections for imperfections on the part of God, whose designs for the universe we might not fully understand. “Strange therefore . . . because he only sees the lowest part of this scale, [he] should from hence infer a defect of happiness in the whole,” Bayes wrote in response to another theologian.23 Bayes’s much more famous work, “An Essay toward Solving a Problem in the Doctrine of ...more
Bayes’s essay gives the example of a person who emerges into the world (perhaps he is Adam, or perhaps he came from Plato’s cave) and sees the sun rise for the first time. At first, he does not know whether this is typical or some sort of freak occurrence. However, each day that he survives and the sun rises again, his confidence increases that it is a permanent feature of nature. Gradually, through this purely statistical form of inference, the probability he assigns to his prediction that the sun will rise again tomorrow approaches (although never exactly reaches) 100 percent. The argument ...more
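The sunrise argument was later formalized by Laplace as the "rule of succession": after n consecutive sunrises with no failures (and a uniform prior), the probability of one more is (n + 1)/(n + 2), which approaches but never exactly reaches 1. A sketch:

```python
# Laplace's rule of succession: posterior probability of another success
# after n successes in n trials, starting from a uniform prior.
from fractions import Fraction

def prob_next_sunrise(n: int) -> Fraction:
    return Fraction(n + 1, n + 2)

for n in (1, 100, 1_000_000):
    print(n, float(prob_next_sunrise(n)))
```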
The Scottish philosopher David Hume argued that since we could not be certain that the sun would rise again, a prediction that it would was inherently no more rational than one that it wouldn’t.26 The Bayesian viewpoint, instead, regards rationality as a probabilistic matter. In essence, Bayes and Price are telling Hume, don’t blame nature because you are too daft to understand it: if you step out of your skeptical shell and make some predictions about its behavior, perhaps you will get a little closer to the truth.
We might notice how similar this claim is to the one that Bayes made in “Divine Benevolence,” in which he argued that we should not confuse our own fallibility for the failures of God. Admitting to our own imperfections is a necessary step on the way to redemption.
The intimate connection between probability, prediction, and scientific progress was thus well understood by Bayes and Laplace in the eighteenth century—the period when human societies were beginning to take the explosion of information that had become available with the invention of the printing press several centuries earlier, and finally translate it into sustained scientific, technological, and economic progress. The connection is essential—equally to predicting the orbits of the planets and the winner of the Lakers’ game. As we will see, science may have stumbled later when a different ...more
When we fail to think like Bayesians, false positives are a problem not just for mammograms but for all of science. In the introduction to this book, I noted the work of the medical researcher John P. A. Ioannidis. In 2005, Ioannidis published an influential paper, “Why Most Published Research Findings Are False,”40 in which he cited a variety of statistical and theoretical arguments to claim that (as his title implies) the majority of hypotheses deemed to be true in journals in medicine and most other academic and scientific professions are, in fact, false. Ioannidis’s hypothesis, as we ...more
Most of the data is just noise, as most of the universe is filled with empty space. Meanwhile, as we know from Bayes’s theorem, when the underlying incidence of something in a population is low (breast cancer in young women; truth in the sea of data), false positives can dominate the results if we are not careful. Figure 8-6 represents this graphically. In the figure, 80 percent of true scientific hypotheses are correctly deemed to be true, and about 90 percent of false hypotheses are correctly rejected. And yet, because true findings are so rare, about two-thirds of the findings deemed to be ...more
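The figure-8-6 arithmetic follows directly from Bayes’s theorem. In this sketch, the 80 percent and 90 percent rates come from the text; the 1-in-20 prior is an assumption chosen for illustration, since the text only says true findings are rare:

```python
# Share of "positive" findings that are actually false, via Bayes' theorem.
p_true = 0.05               # assumed prior: 1 in 20 tested hypotheses is true
true_positive_rate = 0.80   # from the text: 80% of true hypotheses confirmed
false_positive_rate = 0.10  # from the text: 90% of false hypotheses rejected
true_positives = true_positive_rate * p_true
false_positives = false_positive_rate * (1 - p_true)
share_false = false_positives / (true_positives + false_positives)
print(f"{share_false:.0%} of positive findings are false")
```

Even with a fairly accurate test, the rarity of true hypotheses means false positives swamp the results, which is exactly the passage's point.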
But perhaps the bigger problem is the way that Fisher’s statistical philosophy tends to conceive of the world. It emphasizes the objective purity of the experiment—every hypothesis could be tested to a perfect conclusion if only enough data were collected. However, in order to achieve that purity, it denies the need for Bayesian priors or any other sort of messy real-world context. These methods neither require nor encourage us to think about the plausibility of our hypothesis: the idea that cigarettes cause lung cancer competes on a level playing field with the idea that toads predict ...more
As an empirical matter, we all have beliefs and biases, forged from some combination of our experiences, our values, our knowledge, and perhaps our political or professional agenda. One of the nice characteristics of the Bayesian perspective is that, in explicitly acknowledging that we have prior beliefs that affect how we interpret new evidence, it provides for a very good description of how we react to the changes in our world. For instance, if Fisher’s prior belief was that there was just a 0.00001 percent chance that cigarettes cause lung cancer, that helps explain why all the evidence to ...more
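To see how a prior that extreme swamps the evidence, here is a sketch of Bayesian updating in odds form. The prior is the one the text attributes to Fisher; the 10-to-1 likelihood ratio per study is an assumption for illustration:

```python
# Bayesian updating in odds form: posterior odds = prior odds x likelihood ratio.
prior = 1e-7          # the text's "0.00001 percent" prior probability
likelihood_ratio = 10  # assumed: each study favors the hypothesis 10-to-1
odds = prior / (1 - prior)
for _ in range(5):     # five consecutive strong studies
    odds *= likelihood_ratio
posterior = odds / (1 + odds)
print(f"posterior after five studies: {posterior:.4f}")  # still below 1%
```

Five strong studies in a row move the posterior from one in ten million to barely one percent, which helps explain why evidence accumulated so slowly against such a prior.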
Absolutely nothing useful is realized when one person who holds that there is a 0 percent probability of something argues against another person who holds that the probability is 100 percent. Many wars—like the sectarian wars in Europe in the early days of the printing press—probably result from something like this premise.
This does not imply that all prior beliefs are equally correct or equally valid. But I’m of the view that we can never achieve perfect objectivity, rationality, or accuracy in our beliefs. Instead, we can striv...
In theory, science should work this way. The notion of scientific consensus is tricky, but the idea is that the opinion of the scientific community converges toward the truth as ideas are debated and new evidence is uncovered. Just as in the stock market, the steps are not always forward or smooth. The scientific community is often too conservative about adapting its paradigms to new evidence,64 although there have certainly also been times when it was too quick to jump on the bandwagon. Still, provided that everyone is on the Bayesian train,* even incorrect beliefs and quite wrong priors are ...more
In the beginning of a chess game the center of the board is void, with pawns, rooks, and bishops neatly aligned in the first two rows awaiting instructions from their masters. The possibilities are almost infinite. White can open the game in any of twenty different ways, and black can respond with twenty of its own moves, creating 400 possible sequences after the first full turn. After the second full turn, there are 71,852 possibilities; after the third, there are 9,132,484. The number of possibilities in an entire chess game, played to completion, is so large that it is a significant ...more
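The first-turn branching factor is simple arithmetic (the later counts, 71,852 and 9,132,484, are empirical tallies of distinct positions rather than simple products):

```python
# Every White opening move paired with every Black reply.
white_openings = 20
black_replies = 20
print(white_openings * black_replies)  # 400 sequences after one full turn
```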
Moreover, because the opening moves are more routine to players than positions they may encounter later on, humans can rely on centuries’ worth of experience to pick the best moves. Although there are theoretically twenty moves that white might play to open the game, more than 98 percent of competitive chess games begin with one of the best four.
Chess databases contain the results of literally hundreds of thousands of games and, like any other database, can be mined for predictive insight. IBM’s programmers studied things like how often each sequence of opening moves had been played and how strong the players were who played them, as well as how often each series of moves resulted in wins, losses, and draws for their respective sides.20 The computer’s heuristics for analyzing these statistics could potentially put it on a level playing field with human intuition and experience, if not well ahead of it. “Kasparov isn’t playing a ...more
Even when very common chess moves are played, there are so many possible branches on the tree that databases are useless after perhaps ten or fifteen moves. In any long game of chess, it is quite likely that you and your opponent will eventually reach some positions that literally no two players in the history of humanity have encountered before. But Kasparov had taken the database out after just three moves. As we have learned throughout this book, purely statistical approaches toward forecasting are ineffective at best when there is not a sufficient sample of data to work with.
This is what separates elite players from amateurs. In his famous study of chess players, the Dutch psychologist Adriaan de Groot found that amateur players, when presented with a chess problem, often frustrated themselves by looking for the perfect move, rendering themselves incapable of making any move at all.26 Chess masters, by contrast, are looking for a good move—and certainly if at all possible the best move in a given position—but they are forecasting how the move might favorably dispose their position rather than trying to enumerate every possibility. It is “pure fantasy,” the American ...more
Chess players learn through memory and experience where to concentrate their thinking. Sometimes this involves probing many branches of the tree but just a couple of moves down the line; at other times, they focus on just one branch but carry out the calculation to a much greater depth. This type of trade-off between breadth and depth is common anytime that we face a complicated problem. The Defense Department and the CIA, for instance, must decide whether to follow up on a broader array of signals in predicting and preventing potential terrorist attacks, or instead to focus on those ...more
Computer chess machines, to some extent, get to have it both ways. They use heuristics to prune their search trees, focusing more of their processing power on more promising branches rather than calculating every one to the same degree of depth. But because they are so much faster at calculation, they don’t have to compromise as much, evaluating all the possibilities a little bit and the most important-seeming ones in greater detail. But computer chess programs can’t always see the bigger picture and think strategically. They are very good at calculating the tactics to achieve some near-term ...more
Literally all positions in which there are six or fewer pieces on the board have been solved to completion. Work on seven-piece positions is mostly complete—some of the solutions are intricate enough to require as many as 517 moves—but computers have memorized exactly which are the winning, losing, and drawing ones.
To see twenty moves ahead in a game as complex as chess was once thought to be impossible for both human beings and computers. Kasparov’s proudest moment, he once claimed, had come in a match in the Netherlands in 1999, when he had visualized a winning position some fifteen moves in advance.34 Deep Blue was thought to be limited to a range of six to eight moves ahead in most cases. Kasparov and Friedel were not exactly sure what was going on, but what had seemed to casual observers like a random and inexplicable blunder instead seemed to them to reveal great wisdom. Kasparov would never defeat ...more
Kasparov, likewise, seemed to think Deep Blue’s circuitry had been supplemented with a superior intelligence. Kasparov’s two theories about Deep Blue’s behavior were, of course, mutually contradictory—as Edgar Allan Poe’s conceptions of the Mechanical Turk had been. The machine was playing much too well to possibly be a computer—or the machine had an intelligence so vast that no human had any hope of comprehending it.