Superforecasting: The Art and Science of Prediction
34%
Active open-mindedness (AOM) is a concept coined by the psychologist Jonathan Baron, who has an office next to mine at the University of Pennsylvania. Baron’s test for AOM asks whether you agree or disagree with statements like:
35%
For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded.
35%
Levine is also a superforecaster. And while he is an extreme case, he underscores a central feature of superforecasters: they have a way with numbers. Most aced a brief test of basic numeracy that asked questions like, “The chance of getting a viral infection is 0.05%. Out of 10,000 people, about how many of them are expected to get infected?” (The answer is 5.) And numeracy is just as evident on their résumés. Many have backgrounds in math, science, or computer programming.
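For anyone who wants to check the arithmetic behind that answer, it is just a matter of converting the percentage to a proportion:

```latex
0.05\% = \frac{0.05}{100} = 0.0005, \qquad 0.0005 \times 10{,}000 = 5
```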
35%
I have yet to find a superforecaster who isn’t comfortable with numbers, and most are more than capable of putting them to practical use. And they do, occasionally.
35%
There is no moat and no castle wall. While superforecasters do occasionally deploy their own explicit math models, or consult other people’s, that’s rare. The great majority of their forecasts are simply the product of careful thought and nuanced judgment. “I can think of a couple of questions where a little bit of math was useful,” Lionel Levine recalled about his own forecasting, but otherwise he relies on subjective judgment. “It’s all, you know, balancing, finding relevant information and deciding how relevant is this really? How much should it really affect my forecast?”
35%
But the fact that superforecasters are almost uniformly highly numerate people is not mere coincidence. Superior numeracy does help superforecasters, but not because it lets them tap into arcane math models that divine the future. The truth is simpler, subtler, and much more interesting.
37%
Bowden reported that Obama told him in a later interview, “In this situation, what you started to get was probabilities that disguised uncertainty as opposed to actually providing you with useful information.” Bowden then wrote that “Obama had no trouble admitting it to himself. If he acted on this, he was going to be taking a gamble, pure and simple. A big gamble.”
37%
It was remarkably late in history—arguably as late as the 1713 publication of Jakob Bernoulli’s Ars Conjectandi—before the best minds started to think seriously about probability.
37%
Both 0% and 100% weigh far more heavily in our minds than the mathematical models of economists say they should.
38%
To grasp the meaning of “a 70% chance of rain tomorrow” we have to understand that rain may or may not happen, and that over 100 days on which we forecast chances of rain, if our forecasts are good, it should rain on 70% of them and be dry on the rest. Nothing could be further removed from our natural inclination to think “It will rain” or “It won’t rain”—or, if you insist, “Maybe it will rain.”
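A minimal simulation sketch of that frequency reading (illustrative, not from the book; the 0.7 rain probability and the day count are assumptions):

```python
import random

random.seed(0)

# Assume a well-calibrated forecaster: on every day she says
# "70% chance of rain," rain really does occur with probability 0.7.
days = 100_000
rainy_days = sum(random.random() < 0.70 for _ in range(days))

# Over many such days, the observed frequency converges on the
# stated probability, which is what the forecast actually means.
print(f"rained on {rainy_days / days:.1%} of the forecast days")  # ~70%
```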
39%
Robert Rubin is a probabilistic thinker. As a student at Harvard, he heard a lecture in which a philosophy professor argued there is no provable certainty and “it just clicked with everything I’d sort of thought,” he told me. It became the axiom that guided his thinking through twenty-six years at Goldman Sachs, as an adviser to President Bill Clinton, and as secretary of the Treasury. It’s in the title of his autobiography: In an Uncertain World. Once Rubin rejected certainty, everything became a matter of probability for him, and he wanted as much precision as possible. “One of the first times I…
39%
To do that, we took advantage of a distinction that philosophers have proposed between “epistemic” and “aleatory” uncertainty. Epistemic uncertainty is something you don’t know but is, at least in theory, knowable. If you wanted to predict the workings of a mystery machine, skilled engineers could, in theory, pry it open and figure it out. Mastering mechanisms is a prototypical clocklike forecasting challenge. Aleatory uncertainty is something you not only don’t know; it is unknowable.
39%
Another nugget of evidence comes from the phrase “fifty-fifty.” To careful probabilistic thinkers, 50% is just one in a huge range of settings, so they are no likelier to use it than 49% or 51%. Forecasters who use a three-setting mental dial are much likelier to use 50% when they are asked to make probabilistic judgments because they use it as a stand-in for maybe. Hence, we should expect frequent users of 50% to be less accurate. And that’s exactly what the tournament data show.
40%
The answer lies in the tournament data. Barbara Mellers has shown that granularity predicts accuracy: the average forecaster who sticks with the tens—20%, 30%, 40%—is less accurate than the finer-grained forecaster who uses fives—20%, 25%, 30%—and still less accurate than the even finer-grained forecaster who uses ones—20%, 21%, 22%. As a further test, she rounded forecasts to make them less granular, so a forecast at the greatest granularity possible in the tournament, single percentage points, would be rounded to the nearest five, and then the nearest ten. This way, all of the forecasts were…
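Here is a minimal sketch of that rounding exercise, with simulated data rather than the tournament’s (the uniform probabilities and the sample size are assumptions). A forecast’s Brier score is its mean squared error against 0/1 outcomes, so if granularity carries real information, each coarser rounding step should nudge the score upward:

```python
import random
from statistics import mean

random.seed(1)

def brier(probs, outcomes):
    # Mean squared error between stated probabilities and 0/1 outcomes
    # (lower is better).
    return mean((p - o) ** 2 for p, o in zip(probs, outcomes))

# Simulate a well-calibrated forecaster: each stated probability p
# is the true chance of the event.
probs = [random.random() for _ in range(100_000)]
outcomes = [1 if random.random() < p else 0 for p in probs]

for step in (0.01, 0.05, 0.10):
    # Round every forecast to the nearest multiple of `step`:
    # single percentage points, fives, then tens.
    coarse = [round(p / step) * step for p in probs]
    print(f"rounded to {step:.0%} steps: Brier {brier(coarse, outcomes):.4f}")
```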
40%
Vonnegut drums this theme relentlessly. “Why me?” moans Billy Pilgrim when he is abducted by aliens. “That is a very Earthling question to ask, Mr. Pilgrim,” the aliens respond. “Why you? Why us for that matter? Why anything?”25 Only the naive ask “Why?” Those who see reality more clearly don’t bother.
42%
For both the superforecasters and the regulars, we also compared individual fate scores with Brier scores and found a significant correlation—meaning the more a forecaster inclined toward it-was-meant-to-happen thinking, the less accurate her forecasts were. Or, put more positively, the more a forecaster embraced probabilistic thinking, the more accurate she was.
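Operationally, that comparison is just a correlation between two per-forecaster numbers; a toy sketch with made-up scores (all values below are hypothetical):

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical per-forecaster scores: a higher fate score means stronger
# it-was-meant-to-happen thinking; a higher Brier score means larger error.
fate_scores = [1.2, 2.0, 2.9, 3.6, 4.8]
brier_scores = [0.09, 0.13, 0.18, 0.21, 0.29]

# A positive correlation here means fate-minded forecasters
# tend to be less accurate.
print(f"r = {correlation(fate_scores, brier_scores):.2f}")
```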
42%
SUPERFORECASTING ISN’T A paint-by-numbers method but superforecasters often tackle questions in a roughly similar way—one that any of us can follow: Unpack the question into components. Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized.
42%
In the third season alone, Devyn made 2,271 forecasts on 140 questions. That’s more than 16 forecasts, on average, for each question. “I would attribute my GJP success so far, such as it has been,” Devyn wrote, “to good luck and frequent updating.”
42%
Superforecasters do monitor the news carefully and factor it into their forecasts, which is bound to give them a big advantage over the less attentive. If that’s the decisive factor, then superforecasters’ success would tell us nothing more than “it helps to pay attention and keep your forecast up to date”—which is about as enlightening as being told that when polls show a candidate surging into a comfortable lead he is more likely to win. But that’s not the whole story. For one thing, superforecasters’ initial forecasts were at least 50% more accurate than those of regular forecasters. Even…
43%
Both under- and overreaction can diminish accuracy. Both can also, in extreme cases, destroy a perfectly good forecast.
45%
Psychologists call this the dilution effect, and given that stereotypes are themselves a source of bias we might say that diluting them is all to the good. Yes and no. Yes, it is possible to fight fire with fire, and bias with bias, but the dilution effect remains a bias. Remember what’s going on here. People base their estimate on what they think is a useful tidbit of information. Then they encounter clearly irrelevant information—meaningless noise—which they indisputably should ignore. But they don’t. They sway in the wind, at the mercy of the next random gust of irrelevant information.
46%
The average update was tiny, only 3.5%. That was critical.
46%
Why this works is no mystery. A forecaster who doesn’t adjust her views in light of new information won’t capture the value of that information, while a forecaster who is so impressed by the new information that he bases his forecast entirely on it will lose the value of the old information that underpinned his prior forecast. But the forecaster who carefully balances old and new captures the value in both—and puts it into her new forecast. The best way to do that is by updating often but bit by bit.
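One way to picture that balancing act (an illustrative sketch, not a formula from the book) is a weighted blend of the old forecast and what the new information alone would suggest:

```python
def update(prior, signal, weight):
    # weight = 0 ignores the news entirely (underreaction);
    # weight = 1 throws away the prior (overreaction).
    # Small weights give "updating often, but bit by bit."
    return (1 - weight) * prior + weight * signal

forecast = 0.40                      # initial probability
for signal in (0.55, 0.60, 0.70):    # what each news item alone suggests
    forecast = update(forecast, signal, weight=0.2)
    print(f"{forecast:.3f}")         # 0.430, 0.464, 0.511
```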
46%
In simple terms, the theorem says that your new belief should depend on two things—your prior belief (and all the knowledge that informed it) multiplied by the “diagnostic value” of the new information. That’s head-scratchingly abstract, so let’s watch Jay Ulfelder—political scientist, superforecaster, and colleague of mine—put it to concrete use.
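One standard way to write that relationship is the odds form of Bayes’ theorem, in which the likelihood ratio plays the role of the “diagnostic value” of the new data D for the hypothesis H:

```latex
\underbrace{\frac{P(H \mid D)}{P(\neg H \mid D)}}_{\text{posterior odds}}
= \underbrace{\frac{P(D \mid H)}{P(D \mid \neg H)}}_{\text{likelihood ratio}}
\times \underbrace{\frac{P(H)}{P(\neg H)}}_{\text{prior odds}}
```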
47%
The superforecasters are a numerate bunch: many know about Bayes’ theorem and could deploy it if they felt it was worth the trouble. But they rarely crunch the numbers so explicitly. What matters far more to the superforecasters than Bayes’ theorem is Bayes’ core insight of gradually getting closer to the truth by constantly updating in proportion to the weight of the evidence.
47%
But there is no magical formula, just broad principles with lots of caveats. Superforecasters understand the principles but also know that their application requires nuanced judgments. And they would rather break the rules than make a barbarous forecast.
47%
The psychologist Carol Dweck would say Simpson has a “growth mindset,” which Dweck defines as believing that your abilities are largely the product of effort—that you can “grow” to the extent that you are willing to work hard and learn.2 Some people might think that’s so obviously true it scarcely needs to be said. But as Dweck’s research has shown, the growth mindset is far from universal. Many people have what she calls a “fixed mindset”—the belief that we are who we are, and abilities can only be revealed, not created and developed.
47%
People with the fixed mindset say things like “I’m bad at math” and see that as an immutable feature of who they are, like being left-handed or female or tall. This has serious consequences.
48%
To be a top-flight forecaster, a growth mindset is essential. The best illustration is the man who is reputed to have said—but didn’t—“When the facts change, I change my mind.”
48%
When he was blindsided by the crash of 1929, he subjected his thinking to withering scrutiny. Keynes concluded that there was something wrong with one of his key theoretical assumptions. Stock prices do not always reflect the true value of companies, so an investor should study a company thoroughly and really understand its business, capital, and management when deciding whether it has sufficient underlying value to make a long-term investment worthwhile. In the United States, at about the same time, this approach was developed by Benjamin Graham, who called it “value investing.” It…
49%
We learn new skills by doing. We improve those skills by doing more. These fundamental facts are true of even the most demanding skills. Modern fighter jets are enormously complex flying computers but classroom instruction isn’t enough to produce a qualified pilot. Not even time in advanced flight simulators will do. Pilots need hours in the air, the more the better. The same is true of surgeons, bankers, and business executives.
49%
The knowledge required to ride a bicycle can’t be fully captured in words and conveyed to others. We need “tacit knowledge,” the sort we only get from bruising experience. To learn to ride a bicycle, we must try to ride one. It goes badly at first. You fall to one side, you fall to the other. But keep at it and with practice it becomes effortless—although if you then had to explain to someone else how to stay upright, so they could skip the ordeal you just went through, you would succeed no better than Polanyi.
49%
That is blindingly obvious. It should be equally obvious that learning to forecast requires trying to forecast. Reading books on forecasting is no substitute for the experience of the real thing.9
49%
But not all practice improves skill. It needs to be informed practice. You need to know which mistakes to look out for—and which best practices really are best. So don’t burn your books. As noted earlier, randomized controlled experiments have shown that mastering the contents of just one tiny booklet, our training guidelines (see the appendix), can improve your accuracy by roughly 10%.
49%
Gaps like that are far from unusual. Research on calibration—how closely your confidence matches your accuracy—routinely finds people are too confident.
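A minimal sketch of how such calibration research scores people (the records below are made up): bucket forecasts by stated confidence and compare with how often the events actually happened. Overconfidence shows up as hit rates running below the stated probabilities.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (stated probability, outcome) records for one forecaster;
# outcome is 1 if the event happened, 0 if it did not.
records = [(0.9, 1), (0.9, 1), (0.9, 0), (0.9, 0),
           (0.7, 1), (0.7, 1), (0.7, 0),
           (0.5, 1), (0.5, 0), (0.5, 0)]

bins = defaultdict(list)
for prob, outcome in records:
    bins[prob].append(outcome)

for prob in sorted(bins, reverse=True):
    hit_rate = mean(bins[prob])
    print(f"said {prob:.0%}: happened {hit_rate:.0%} of the time")
```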
50%
In this debate, each candidate heaps praise on his opponents while savaging himself—because Bush 41 was certain to crush whomever he faced. Everyone knew that. It’s why leading Democrats didn’t contest the nomination that year, clearing the way for the obscure governor of Arkansas, Bill Clinton.
50%
In 1991 the world watched in shock as the Soviet Union disintegrated. So in 1992–93 I returned to the experts, reminded them of the question in 1988, and asked them to recall their estimates. On average, the experts recalled a number 31 percentage points higher than the correct figure.
52%
She forecast all 150 questions in year 3…
52%
In philosophic outlook, they tend to be:
53%
In his 1972 classic, Victims of Groupthink, the psychologist Irving Janis—one of my PhD advisers at Yale long ago—explored the decision making that went into both the Bay of Pigs invasion and the Cuban missile crisis. Today, everyone has heard of groupthink, although few have read the book that coined the term or know that Janis meant something more precise than the vague catchphrase groupthink has become today. In Janis’s hypothesis, “members of any small cohesive group tend to maintain esprit de corps by unconsciously developing a number of shared illusions and related norms that interfere…
54%
As mentioned earlier, the term “wisdom of crowds” comes from James Surowiecki’s 2004 bestseller of the same name, but Surowiecki’s title was itself a play on the title of a classic 1841 book, Extraordinary Popular Delusions and the Madness of Crowds, which chronicled a litany of collective folly.
54%
We also gave teams a primer on teamwork based on insights gleaned from research in group dynamics. On the one hand, we warned, groupthink is a danger. Be cooperative but not deferential. Consensus is not always good; disagreement not always bad. If you do happen to agree, don’t take that agreement—in itself—as proof that you are right. Never stop doubting. Pointed questions are as essential to a team as vitamins are to a human body.
55%
“So the guerrillas have been attacked. The plan has fallen apart. They don’t have helicopters or tanks. But they have to cross eighty miles of swamp and jungle before they can begin to look for shelter in the mountains? Is that correct?” I suspect that this conversation would not have concluded “sounds good!”
56%
That sense of belonging developed in Elaine Rich. She did well, boosting her confidence, and her sense of responsibility grew with it. “I felt that I had to be really careful that I was sharing, shouldering my part of the burden, rather than being a freeloader by reading what other people wrote,” and not offering thoughts and research, “which is always a temptation.”
56%
The results speak for themselves. On average, when a forecaster did well enough in year 1 to become a superforecaster, and was put on a superforecaster team in year 2, that person became 50% more accurate. An analysis in year 3 got the same result. Given that these were collections of strangers tenuously connected in cyberspace, we found that result startling.
56%
None of this means markets are perfect, or such efficient aggregators of information that no mortal should ever be so foolish as to aspire to beat them. That’s the strong version of what economists call the efficient market hypothesis (EMH), and it’s hard to square with what we have learned from psychology and experience.
56%
The results were clear-cut each year. Teams of ordinary forecasters beat the wisdom of the crowd by about 10%. Prediction markets beat ordinary teams by about 20%. And superteams beat prediction markets by 15% to 30%.
57%
One involves the provocative phrase “diversity trumps ability,” coined by my colleague (and former competitor in the IARPA tournament) Scott Page.
58%
Fortunately, the contradiction between being a superforecaster and a superleader is more apparent than real. In fact, the superforecaster model can help make good leaders superb and the organizations they lead smart, adaptable, and effective. The key is an approach to leadership and organization first articulated by a nineteenth-century Prussian general, perfected by the German army of World War II, made foundational doctrine by the modern American military, and deployed by many successful corporations today. You might even find it at your neighborhood Walmart.
60%
Eisenhower also expected his officers to engage in open debate. He respected well-founded criticisms and readily acknowledged mistakes. In 1954, when Eisenhower was president, the army chief of staff, Matthew Ridgway, advised against intervening in Vietnam, saying it would take a massive effort of more than half a million soldiers. Eisenhower respected Ridgway’s judgment because in 1943 Ridgway had resisted Eisenhower’s order to drop an airborne division on Rome and Eisenhower later decided Ridgway had been right.