Superforecasting: The Art and Science of Prediction
Read between November 12 - November 28, 2021
3%
In a world where a butterfly in Brazil can make the difference between just another sunny day in Texas and a tornado tearing through a town, it’s misguided to think anyone can see very far into the future.
4%
separating the predictable from the unpredictable is difficult work. There’s no way around it. Meteorologists know that better than anyone. They make large numbers of forecasts and routinely check their accuracy—which is why we know that one- and two-day forecasts are typically quite accurate while eight-day forecasts are not. With these analyses, meteorologists are able to sharpen their understanding of how weather works and tweak their models. Then they try again. Forecast, measure, revise. Repeat. It’s a never-ending process of incremental improvement that explains why weather forecasts are ...more
8%
Galen never conducted anything resembling a modern experiment. Why should he? Experiments are what people do when they aren’t sure what the truth is. And Galen was untroubled by doubt. Each outcome confirmed he was right, no matter how equivocal the evidence might look to someone less wise than the master. “All who drink of this treatment recover in a short time, except those whom it does not help, who all die,” he wrote. “It is obvious, therefore, that it fails only in incurable cases.”
8%
Not until the twentieth century did the idea of randomized trial experiments, careful measurement, and statistical power take hold. “Is the application of the numerical method to the subject-matter of medicine a trivial and time-wasting ingenuity as some hold, or is it an important stage in the development of our art, as others proclaim it?” the Lancet asked in 1921. The British statistician Austin Bradford Hill responded emphatically that it was the latter, and laid out a template for modern medical investigation.
9%
The idea of randomized controlled trials was painfully slow to catch on and it was only after World War II that the first serious trials were attempted.
9%
Cochrane got his trial: some patients, randomly selected, were sent to the cardiac care units while others were sent home for monitoring and bed rest. Partway through the trial, Cochrane met with a group of the cardiologists who had tried to stop his experiment. He told them that he had preliminary results. The difference in outcomes between the two treatments was not statistically significant, he emphasized, but it appeared that patients might do slightly better in the cardiac care units. “They were vociferous in their abuse: ‘Archie,’ they said, ‘we always thought you were unethical. You ...more
9%
Cochrane cited the Thatcher government’s “short, sharp, shock” approach to young offenders, which called for brief incarceration in spartan jails governed by strict rules. Did it work? The government had simply implemented it throughout the justice system, making it impossible to answer. If the policy was introduced and crime went down, that might mean the policy worked, or perhaps crime went down for any of a hundred other possible reasons. If crime went up, that might show the policy was useless or even harmful, or it might mean crime would have risen even more but for the beneficial effects ...more
10%
It is natural to identify our thinking with the ideas, images, plans, and feelings that flow through consciousness. What else could it be? If I ask, “Why did you buy that car?” you can trot out reasons: “Good mileage. Cute style. Great price.” But you can only share thoughts by introspecting; that is, by turning your attention inward and examining the contents of your mind. And introspection can only capture a tiny fraction of the complex processes whirling inside your head—and behind your decisions.
10%
A defining feature of intuitive judgment is its insensitivity to the quality of the evidence on which the judgment is based. It has to be that way. System 1 can only do its job of delivering strong conclusions at lightning speed if it never pauses to wonder whether the evidence at hand is flawed or inadequate, or if there is better evidence elsewhere. It must treat the available evidence as reliable and sufficient. These tacit assumptions are so vital to System 1 that Kahneman gave them an ungainly but oddly memorable label: WYSIATI (What You See Is All There Is).
11%
This compulsion to explain arises with clocklike regularity every time a stock market closes and a journalist says something like “The Dow rose ninety-five points today on news that …” A quick check will often reveal that the news that supposedly drove the market came out well after the market had risen. But that minimal level of scrutiny is seldom applied. It’s a rare day when a journalist says, “The market rose today for any one of a hundred different reasons, or a mix of them, so no one knows.”
11%
The explanatory urge is mostly a good thing. Indeed, it is the propulsive force behind all human efforts to comprehend reality. The problem is that we move too fast from confusion and uncertainty (“I have no idea why my hand is pointed at a picture of a shovel”) to a clear and confident conclusion (“Oh, that’s simple”) without spending any time in between (“This is one possible explanation but there are others”).
12%
Formally, it’s called attribute substitution, but I call it bait and switch: when faced with a hard question, we often surreptitiously replace it with an easy one. “Should I worry about the shadow in the long grass?” is a hard question. Without more data, it may be unanswerable. So we substitute an easier question: “Can I easily recall a lion attacking someone from the long grass?” That question becomes a proxy for the original question and if the answer is yes to the second question, the answer to the first also becomes yes.
12%
If something doesn’t fit a pattern—like a kitchen fire giving off more heat than a kitchen fire should—a competent expert senses it immediately. But as we see every time someone spots the Virgin Mary in burnt toast or in mold on a church wall, our pattern-recognition ability comes at the cost of susceptibility to false positives.
13%
All too often, forecasting in the twenty-first century looks too much like nineteenth-century medicine. There are theories, assertions, and arguments. There are famous figures, as confident as they are well compensated. But there is little experimentation, or anything that could be called science, so we know much less than most people realize. And we pay the price. Although bad forecasting rarely leads as obviously to harm as does bad medicine, it steers us subtly toward bad decisions and all that flows from them—including monetary losses, missed opportunities, unnecessary suffering, even war ...more
15%
No one had ever seriously tested the forecasting accuracy of political experts and the more I pondered the challenge, the more I realized why. Take the problem of timelines. Obviously, a forecast without a time frame is absurd. And yet, forecasters routinely make them, as they did in that letter to Ben Bernanke. They’re not being dishonest, at least not usually. Rather, they’re relying on a shared implicit understanding, however rough, of the timeline they have in mind. That’s why forecasts without timelines don’t appear absurd when they are made. But as time passes, memories fade, and tacit ...more
15%
Similarly, forecasts often rely on implicit understandings of key terms rather than explicit definitions—like “significant market share” in Steve Ballmer’s forecast. This sort of vague verbiage is more the rule than the exception.
16%
Kent was chatting with a senior State Department official who casually asked, “By the way, what did you people mean by the expression ‘serious possibility’? What kind of odds did you have in mind?” Kent said he was pessimistic. He felt the odds were about 65 to 35 in favor of an attack. The official was startled. He and his colleagues had taken “serious possibility” to mean much lower odds.10 Disturbed, Kent went back to his team. They had all agreed to use “serious possibility” in the NIE so Kent asked each person, in turn, what he thought it meant. One analyst said it meant odds of about 80 ...more
16%
In 1961, when the CIA was planning to topple the Castro government by landing a small army of Cuban expatriates at the Bay of Pigs, President John F. Kennedy turned to the military for an unbiased assessment. The Joint Chiefs of Staff concluded that the plan had a “fair chance” of success. The man who wrote the words “fair chance” later said he had in mind odds of 3 to 1 against success. But Kennedy was never told precisely what “fair chance” meant and, not unreasonably, he took it to be a much more positive assessment.
16%
A more serious objection—then and now—is that expressing a probability estimate with a number may imply to the reader that it is an objective fact, not the subjective judgment it is. That is a danger. But the answer is not to do away with numbers. It’s to inform readers that numbers, just like words, only express estimates—opinions—and nothing more. Similarly, it might be argued that the precision of a number implicitly says “the forecaster knows with exactitude that this number is right.” But that’s not intended and shouldn’t be inferred.
16%
a more fundamental obstacle to adopting numbers relates to accountability and what I call the wrong-side-of-maybe fallacy. If a meteorologist says there is a 70% chance of rain and it doesn’t rain, is she wrong? Not necessarily. Implicitly, her forecast also says there is a 30% chance it will not rain. So if it doesn’t rain, her forecast may have been off, or she may have been exactly right. It’s not possible to judge with only that one forecast in hand. The only way to know for sure would be to rerun the day hundreds of times. If it rained in 70% of those reruns, and didn’t rain in 30%, she ...more
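The rerun-the-day thought experiment is easy to sketch in code. The snippet below is a minimal illustration, not anything from the book: it assumes the true chance of rain really is 70% and treats each rerun as an independent draw, showing that only the long-run frequency, never a single day, can tell us whether the 70% forecast was right.

```python
import random

# Hypothetical "rerun the day" experiment: assume the true chance of rain
# is 70% and draw each rerun independently. Any one rerun can land on
# either side of maybe; only the frequency across many reruns speaks to
# whether the 70% forecast was accurate.
random.seed(0)
true_chance_of_rain = 0.70
reruns = 1000

rainy = sum(random.random() < true_chance_of_rain for _ in range(reruns))
print(f"Rained in {rainy / reruns:.1%} of {reruns} reruns")  # close to 70%
```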
17%
The prevalence of this elementary error has a terrible consequence. Consider that if an intelligence agency says there is a 65% chance that an event will happen, it risks being pilloried if it does not—and because the forecast itself says there is a 35% chance it will not happen, that’s a big risk. So what’s the safe thing to do? Stick with elastic language. Forecasters who use “a fair chance” and “a serious possibility” can even make the wrong-side-of-maybe fallacy work for them: If the event happens, “a fair chance” can retroactively be stretched to mean something considerably bigger than ...more
17%
When CIA analysts told President Obama they were “70%” or “90%” sure the mystery man in a Pakistani compound was Osama bin Laden, it was a small, posthumous triumph for Sherman Kent.
17%
We cannot rerun history so we cannot judge one probabilistic forecast—but everything changes when we have many probabilistic forecasts. If a meteorologist says there is a 70% chance of rain tomorrow, that forecast cannot be judged, but if she predicts the weather tomorrow, and the day after, and the day after that, for months, her forecasts can be tabulated and her track record determined. If her forecasting is perfect, rain happens 70% of the time when she says there is a 70% chance of rain, 30% of the time when she says there is a 30% chance of rain, and so on. This is called calibration. It can ...more
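Tabulating a track record for calibration can be sketched in a few lines. The forecasts below are invented for illustration; the idea is simply to group past forecasts by the probability quoted and compare that stated chance with how often the event actually occurred.

```python
from collections import defaultdict

# Invented track record: (stated probability of rain, whether it rained).
track_record = [
    (0.7, True), (0.7, True), (0.7, False), (0.7, True),
    (0.3, False), (0.3, False), (0.3, True),
    (0.9, True), (0.9, True),
]

# Group forecasts by the probability quoted, then compare that stated
# chance with the observed frequency of rain within each group.
by_probability = defaultdict(list)
for stated, rained in track_record:
    by_probability[stated].append(rained)

for stated in sorted(by_probability):
    outcomes = by_probability[stated]
    observed = sum(outcomes) / len(outcomes)
    print(f"said {stated:.0%} -> rained {observed:.0%} of {len(outcomes)} days")
```

Perfect calibration would mean the observed column matches the stated column at every probability level, which takes far more forecasts than this toy record to establish.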
17%
Important as calibration is, it’s not the whole story because “perfect calibration” isn’t what we think of when we imagine perfect forecasting accuracy. Perfection is godlike omniscience. It’s saying “this will happen” and it does, or “this won’t happen” and it doesn’t. The technical term for this is “resolution.” The two figures here show how calibration and resolution capture distinct facets of good judgment. The figure on top represents perfect calibration but poor resolution. It’s perfect calibration because when the forecaster says there is a 40% chance something will happen, it happens ...more
18%
The math behind this system was developed by Glenn W. Brier in 1950, hence results are called Brier scores. In effect, Brier scores measure the distance between what you forecast and what actually happened. So Brier scores are like golf scores: lower is better. Perfection is 0. A hedged fifty-fifty call, or random guessing in the aggregate, will produce a Brier score of 0.5. A forecast that is wrong to the greatest possible extent—saying there is a 100% chance that something will happen and it doesn’t, every time—scores a disastrous 2.0, as far from The Truth as it is possible to get.17
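Brier’s original formulation sums the squared error over both outcome categories, which is what produces the 0-to-2 range and the 0.5 score for a constant fifty-fifty hedge. The helper below is a minimal sketch of that formula; the function name and example forecasts are illustrative, not from the book.

```python
def brier_score(forecasts):
    """Two-category Brier score: squared error for 'it happens' plus squared
    error for 'it does not', averaged over all forecasts. 0 is perfect, 0.5 is
    a constant fifty-fifty hedge, 2.0 is maximally wrong."""
    total = 0.0
    for stated_probability, happened in forecasts:
        outcome = 1.0 if happened else 0.0
        total += (stated_probability - outcome) ** 2 \
               + ((1 - stated_probability) - (1 - outcome)) ** 2
    return total / len(forecasts)

print(brier_score([(1.0, True)]))   # 0.0 -- perfection
print(brier_score([(0.5, True)]))   # 0.5 -- the hedged fifty-fifty call
print(brier_score([(1.0, False)]))  # 2.0 -- wrong to the greatest possible extent
```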
18%
Let’s suppose we discover that you have a Brier score of 0.2. That’s far from godlike omniscience (0) but a lot better than chimp-like guessing (0.5), so it falls in the range of what one might expect from, say, a human being. But we can say much more than that. What a Brier score means depends on what’s being forecast. For instance, it’s quite easy to imagine circumstances where a Brier score of 0.2 would be disappointing. Consider the weather in Phoenix, Arizona. Each June, it gets very hot and sunny. A forecaster who followed a mindless rule like, “always assign 100% to hot and sunny” could ...more
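To make the benchmark point concrete, here is a hypothetical Phoenix June scored with the same two-category formula: twenty-nine hot-and-sunny days plus one invented surprise storm. The numbers are illustrative, but they show how a mindless rule can still beat a 0.2 forecaster on an easy question.

```python
# Hypothetical June in Phoenix: 29 hot-and-sunny days and one invented
# surprise storm. The mindless rule "always assign 100% to hot and sunny"
# scores 0 on the 29 easy days and the maximum 2.0 on the surprise day.
days_in_june = 30
surprise_days = 1
mindless_rule_score = (surprise_days * 2.0 + (days_in_june - surprise_days) * 0.0) / days_in_june
print(round(mindless_rule_score, 3))  # 0.067 -- better (lower) than a 0.2 forecaster
```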
18%
Another key benchmark is other forecasters. Who can beat everyone else? Who can beat the consensus forecast? How do they pull it off? Answering these questions requires comparing Brier scores, which, in turn, requires a level playing field.
19%
I began the experiment when Mikhail Gorbachev and the Soviet Politburo were key players shaping the fate of the world; by the time I started to write up the results, the USSR existed only on historical maps and Gorbachev was doing commercials for Pizza Hut. The final results appeared in 2005—twenty-one years, six presidential elections, and three wars after I sat on the National Research Council panel that got me thinking about forecasting.
31%
In many physics and engineering faculties, Fermi estimates or Fermi problems—strange tests like “estimate the number of square inches of pizza consumed by all the students at the University of Maryland during one semester”—are part of the curriculum. I shared Levitin’s discussion of Fermi estimation with a group of superforecasters and it drew a chorus of approval. Sandy Sillman told me Fermi estimation was so critical to his job as a scientist working with atmospheric models that
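For readers who have not seen one, a Fermi estimate of the pizza question might run like the sketch below. Every input is an illustrative order-of-magnitude guess, not a real figure about the University of Maryland; the point of the exercise is the decomposition, not the precision.

```python
# Fermi-style back-of-the-envelope for the pizza question. Every number is
# an illustrative guess; the goal is the order of magnitude, not precision.
students = 40_000                 # rough guess at enrollment
slices_per_student_per_week = 2   # a couple of slices a week, on average
weeks_per_semester = 15
square_inches_per_slice = 20      # roughly one eighth of a 14-inch pie

total_square_inches = (students * slices_per_student_per_week
                       * weeks_per_semester * square_inches_per_slice)
print(f"{total_square_inches:,} square inches")  # ~24,000,000
```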
35%
For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded. It would be facile to reduce superforecasting to a bumper-sticker slogan, but if I had to, that would be it.
36%
Even if they all looked at the same evidence—and there’s likely to be some variation—it is unlikely they would all reach precisely the same conclusion. They are different people. They have different educations, training, experiences, and personalities. A smart executive will not expect universal agreement, and will treat its appearance as a warning flag that groupthink has taken hold. An array of judgments is welcome proof that the people around the table are actually thinking for themselves and offering their unique perspectives.
37%
Bowden’s account reminded me of an offhand remark that Amos Tversky made some thirty years ago, when we served on that National Research Council committee charged with preventing nuclear war. In dealing with probabilities, he said, most people only have three settings: “gonna happen,” “not gonna happen,” and “maybe.” Amos had an impish sense of humor. He also appreciated the absurdity of an academic committee on a mission to save the world. So I am 98% sure he was joking. And 99% sure his joke captures a basic truth about human judgment.
37%
Human beings have coped with uncertainty for as long as we have been recognizably human. And for almost all that time we didn’t have access to statistical models of uncertainty because they didn’t exist. It was remarkably late in history—arguably as late as the 1713 publication of Jakob Bernoulli’s Ars Conjectandi—before the best minds started to think seriously about probability.
37%
If the response is strong enough, it can produce a binary conclusion: “Yes, it’s a lion,” or “No, it’s not a lion.” But if it’s weaker, it can produce an unsettling middle possibility: “Maybe it’s a lion.” What the tip-of-your-nose perspective will not deliver is a judgment so fine grained that it can distinguish between, say, a 60% chance that it is a lion and an 80% chance. That takes slow, conscious, careful thought.
37%
Both 0% and 100% weigh far more heavily in our minds than the mathematical models of economists say they should.8 Again, this is not surprising if you think about the world in which our brain evolved. There was always at least a tiny chance a lion was lurking in the vicinity.
37%
But our ancestors couldn’t maintain a state of constant alert. The cognitive cost would have been too great. They needed worry-free zones. The solution? Ignore small chances and use the two-setting dial as much as possible. Either it is a lion or it isn’t. Only when something undeniably falls between those two settings—only when we are compelled—do we turn the mental dial to maybe.
38%
outside of a classroom, away from abstractions, when dealing with real issues, these educated, accomplished people reverted to the intuitive. Only when the probabilities were closer to even did they easily grasp that the outcome may or may not happen, Rubin said. “If you say something is 60/40, people kind of get the idea.”
39%
we took advantage of a distinction that philosophers have proposed between “epistemic” and “aleatory” uncertainty. Epistemic uncertainty is something you don’t know but is, at least in theory, knowable. If you wanted to predict the workings of a mystery machine, skilled engineers could, in theory, pry it open and figure it out. Mastering mechanisms is a prototypical clocklike forecasting challenge. Aleatory uncertainty is something you not only don’t know; it is unknowable. No matter how much you want to know whether it will rain in Philadelphia one year from now, no matter how many great ...more
39%
Superforecasters grasp this deep truth better than most. When they sense that a question is loaded with irreducible uncertainty—say, a currency-market question—they have learned to be cautious, keeping their initial estimates inside the shades-of-maybe zone between 35% and 65% and moving out tentatively. They know the “cloudier” the outlook, the harder it is to beat that dart-throwing chimpanzee.
40%
As the legendary investor Charlie Munger sagely observed, “If you don’t get this elementary, but mildly unnatural, mathematics of elementary probability into your repertoire, then you go through a long life like a one-legged man in an ass-kicking contest.”
43%
there is a subtler explanation for Bill Flack’s underreaction to a Japanese official saying Shinzo Abe would visit the Yasukuni Shrine. The political costs of visiting Yasukuni were steep. And Abe had no pressing need to placate his conservative constituents by going, so the benefit was negligible. Conclusion? Not going looked like the rational decision. But Bill ignored Abe’s own feelings. Abe is a conservative nationalist. He had visited Yasukuni before, although not as prime minister. He wanted to go again. Reflecting on his mistake, Bill told me, “I think that the question I was really ...more
64%
One study informed people that each year 2,000 migratory birds drown in oil ponds. How much would you be willing to pay to stop that? Other subjects were told 20,000 birds died each year. A third group was told it was 200,000 birds. And yet in each case, the average amount people said they would be willing to pay was around $80. What people were responding to, Kahneman later wrote, was the prototypical image that came to them—a polluted lake or a drowning, oil-soaked duck. “The prototype automatically evokes an affective response, and the intensity of that emotion is then mapped onto the ...more
64%
Suppose you weigh the strength of these arguments, they feel roughly equal, and you settle on a probability of roughly 50%. But notice what’s missing? The time frame. It obviously matters. To use an extreme illustration, the probability of the regime falling in the next twenty-four hours must be less—likely a lot less—than the probability that it will fall in the next twenty-four months. To put this in Kahneman’s terms, the time frame is the “scope” of the forecast.
64%
Mellers ran several studies and found that, exactly as Kahneman expected, the vast majority of forecasters were scope insensitive. Regular forecasters said there was a 40% chance Assad’s regime would fall over three months and a 41% chance it would fall over six months. But the superforecasters did much better: They put the probability of Assad’s fall at 15% over three months and 24% over six months. That’s not perfect scope sensitivity (a tricky thing to define), but it was good enough to surprise Kahneman. If we bear in mind that no one was asked both the three- and six-month version of the ...more
65%
If we take “highly improbable” to mean a 1% or 0.1% or 0.0001% chance of an event, it may take decades or centuries or millennia to pile up enough data. And if these events have to be not only highly improbable but also impactful, the difficulty multiplies. So the first-generation IARPA tournament tells us nothing about how good superforecasters are at spotting gray or black swans. They may be as clueless as anyone else—or astonishingly adept. We don’t know, and shouldn’t fool ourselves that we do. Now if you believe that only black swans matter in the long run, the Good Judgment Project ...more
66%
I’m not sure what 2010 will look like, but I’m sure that it will be very little like we expect, so we should plan accordingly.
Lin Wells
67%
Wells hinted at a better way in his closing comment. If you have to plan for a future beyond the forecasting horizon, plan for surprise. That means, as Danzig advises, planning for adaptability and resilience. Imagine a scenario in which reality gives you a smack in the ear and consider how you would respond. Then assume reality will give you a kick in the shin and think about dealing with that. “Plans are useless,” Eisenhower said about preparing for battle, “but planning is indispensable.”11 Taleb has taken this argument further and called for critical systems—like international banking and ...more
67%
Probability judgments should be explicit so we can consider whether they are as accurate as they can be. And if they are nothing but a guess, because that’s the best we can do, we should say so. Knowing what we don’t know is better than thinking we know what we don’t. This comes into focus if we think about forecasting carefully. But we often don’t—leading to absurdities like twenty-year geopolitical forecasts and bestsellers about the coming century. I think both Taleb and Kahneman help explain why we keep making these sorts of mistakes. Kahneman and other pioneers of modern psychology have ...more
67%
Will China become the world’s leading economic power in the mid-twenty-first century? Many are sure it will. And it might. But in the 1980s and early 1990s, there was an even more prevalent belief that Japan would soon dominate the global economy, and its subsequent decline should at least give pause to those asserting China’s ascendancy.12 But it often doesn’t because when looking back it seems strange that anyone ever thought Japan would take the lead. Of course Japan would falter! It’s obvious—in hindsight—just as the prediction that China will not seems obvious today.