Superforecasting: The Art and Science of Prediction
Read between October 19 - November 22, 2018
38%
Of course it’s not always wrong to prefer a confident judgment. All else being equal, our answers to questions like “Does France have more people than Italy?” are likelier to be right when we are confident they are right than when we are not. Confidence and accuracy are positively correlated. But research shows we exaggerate the size of the correlation. For instance, people trust more confident financial advisers over those who are less confident even when their track records are identical. And people equate confidence and competence …
38%
All scientific knowledge is tentative. Nothing is chiseled in granite.
38%
In practice, of course, scientists do use the language of certainty, but only because it is cumbersome, whenever you assert a fact, to say “although we have a substantial body of evidence to support this conclusion, and we hold it with a high degree of confidence, it remains possible, albeit extremely improbable, that new evidence or arguments may compel us to revise our view of this matter.”
38%
If nothing is certain, it follows that the two- and three-setting mental dials are fatally flawed. Yes and no express certainty. They have to go. The only setting that remains is maybe—the one setting people intuitively want to avoid.
39%
his autobiography: In an Uncertain World.
39%
Epistemic uncertainty is something you don’t know but is, at least in theory, knowable. If you wanted to predict the workings of a mystery machine, skilled engineers could, in theory, pry it open and figure it out. Mastering mechanisms is a prototypical clocklike forecasting challenge. Aleatory uncertainty is something you not only don’t know; it is unknowable. No matter how much you want to know whether it will rain in Philadelphia one year from now, no matter how many great meteorologists you consult, you can’t outguess the seasonal averages.
39%
When they sense that a question is loaded with irreducible uncertainty—say, a currency-market question—they have learned to be cautious, keeping their initial estimates inside the shades-of-maybe zone between 35% and 65% and moving out tentatively. They know the “cloudier” the outlook, the harder it is to beat that dart-throwing chimpanzee.
39%
To careful probabilistic thinkers, 50% is just one in a huge range of settings, so they are no likelier to use it than 49% or 51%. Forecasters who use a three-setting mental dial are much likelier to use 50% when they are asked to make probabilistic judgments because they use it as a stand-in for maybe. Hence, we should expect frequent users of 50% to be less accurate. And that’s exactly what the tournament data show.
39%
I once asked Brian Labatte, a superforecaster from Montreal, what he liked to read. Both fiction and nonfiction, he said. “How much of each?” I asked. “I would say 70% …”—a long pause—“no, 65/35 nonfiction to fiction.”21 That’s remarkably precise for a casual conversation. Even when making formal forecasts in the IARPA tournament, ordinary forecasters were not usually that precise. Instead, they tended to stick to the tens, meaning they might say something was 30% likely, or 40%, but not 35%, much less 37%.
40%
It’s easy to impress people by stroking your chin and declaring “There is a 73% probability Apple’s stock will finish the year 24% above where it started.” Toss in a few technical terms most people don’t understand—“stochastic” this, “regression” that—and you can use people’s justified respect for math and science to get them nodding along. This is granularity as bafflegab. It is unfortunately common. So how can we know that the granularity we see among superforecasters is meaningful?
40%
… be rounded to the nearest five, and then the nearest ten. This way, all of the forecasts were made one level less granular. She then recalculated Brier scores and discovered that superforecasters lost accuracy in response to even the smallest-scale rounding, to the nearest 0.05 …
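That rounding exercise is easy to replay in miniature. Below is a minimal sketch in Python using simulated forecasts rather than the tournament data; the data and parameters are my assumptions for illustration. The Brier score here is the one the tournament used: squared error summed over both possible outcomes, so 0 is perfect and 2 is maximally wrong. Rounding to the nearest 0.05, then 0.10, then 0.50 (roughly the yes/maybe/no dial) throws away more and more information:

```python
import random

random.seed(0)

def brier(forecasts, outcomes):
    # Brier score summed over both outcomes of a binary event:
    # 2 * (p - y)^2, averaged across questions. 0 is best, 2 is worst.
    return sum(2 * (p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# Simulate a well-calibrated forecaster: each forecast equals the true
# probability, and each outcome is a coin flip weighted by that probability.
truths = [random.random() for _ in range(100_000)]
outcomes = [1 if random.random() < p else 0 for p in truths]

print(f"unrounded      : Brier = {brier(truths, outcomes):.4f}")
for step in (0.05, 0.10, 0.50):
    rounded = [round(p / step) * step for p in truths]
    print(f"rounded to {step:.2f}: Brier = {brier(rounded, outcomes):.4f}")
```

Accuracy degrades at every level of coarsening, which is the pattern the recalculation described above found in the superforecasters’ data.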
40%
Even very sophisticated people and organizations don’t push themselves to Brian’s level. To take just one example, the National Intelligence Council (NIC)—which produces the National Intelligence Estimates that inform ultrasensitive decisions like whether to invade Iraq or negotiate with Iran—asks its analysts to make judgments on a five- or seven-degree scale.
40%
The religious message that whatever happens, even tragedy, is a meaningful part of a divine plan is ancient, and however one feels about religion there’s no question that this sort of thinking can be consoling and help people endure the otherwise unendurable. Oprah Winfrey, a woman who overcame adversity to achieve stupendous success, personifies and promotes the idea. Using secular language, she said in a commencement address at Harvard University that “there is no such thing as failure. Failure is just life trying to move us in another direction. … Learn from every mistake because every …
40%
and a majority of atheists said they believe in fate, defined as the view that “events happen for a reason and that there is an underlying order to life that determines how events turn out.”27 Meaning is a basic human need. As much research shows, the ability to find it is a marker of a healthy, resilient mind. Among survivors of the 9/11 attacks, for example, those who saw meaning in the atrocity were less likely to suffer post-traumatic stress responses.
40%
But as psychologically beneficial as this thinking may be, it sits uneasily with a scientific worldview.
42%
So finding meaning in events is positively correlated with well-being but negatively correlated with foresight. That sets up a depressing possibility: Is misery the price of accuracy? I don’t know. But this book is not about how to be happy. It’s about how to be accurate, and the superforecasters show that probabilistic thinking is essential for that. I’ll leave the existential issues to others.
42%
SUPERFORECASTING ISN’T A paint-by-numbers method, but superforecasters often tackle questions in a roughly similar way—one that any of us can follow: Unpack the question into components. Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized. Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena. Then adopt the inside view that plays up the uniqueness of the problem. Also explore the similarities and differences between your views and …
42%
he could raise or lower his forecast as often as he liked, for any reason. So he followed the news closely and updated his forecast whenever he saw good reason to do so. This is obviously important. A forecast that is updated to reflect the latest available information is likely to be closer to the truth than a forecast that isn’t so informed.
42%
Like many superforecasters, Devyn follows developments closely by using Google alerts. If he makes a forecast about Syrian refugees, for example, the first thing he will do is set an alert for “Syrian refugees” and “UNHCR,”
43%
So there are two dangers a forecaster faces after making the initial call. One is not giving enough weight to new information. That’s underreaction. The other danger is overreacting to new information, seeing it as more meaningful than it is, and adjusting a forecast too radically. Both under- and overreaction can diminish accuracy. Both can also, in extreme cases, destroy a perfectly good forecast.
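One way to make that trade-off concrete is a toy simulation; this is my own sketch with invented parameters, not anything from the tournament. The true probability drifts slowly, the forecaster sees a noisy signal of it each day, and every update moves the forecast toward the signal by a weight w. A weight near 0 underreacts, a weight near 1 overreacts, and a moderate weight, meaning many small updates, tracks the truth best:

```python
import random

random.seed(1)

def mean_squared_error(w, days=10_000):
    # Toy model: the true probability drifts slowly; each day brings a
    # noisy signal of it; the forecast moves toward the signal by weight w.
    truth, forecast, err = 0.5, 0.5, 0.0
    for _ in range(days):
        truth = min(0.99, max(0.01, truth + random.gauss(0, 0.01)))  # slow drift
        signal = min(1.0, max(0.0, truth + random.gauss(0, 0.15)))   # noisy news
        forecast += w * (signal - forecast)                          # the update
        err += (forecast - truth) ** 2
    return err / days

for w in (0.02, 0.10, 0.30, 0.90):
    print(f"update weight {w:.2f}: mean squared error = {mean_squared_error(w):.4f}")
```

Both extremes lose: the tiny weight lags the drifting truth, while the big weight whipsaws on noise.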
43%
But Bill ignored Abe’s own feelings. Abe is a conservative nationalist. He had visited Yasukuni before, although not as prime minister. He wanted to go again. Reflecting on his mistake, Bill told me, “I think that the question I was really answering wasn’t ‘Will Abe visit Yasukuni?’ but ‘If I were PM of Japan, would I visit Yasukuni?’”3 That’s astute. And it should sound familiar: Bill recognized that he had unconsciously pulled a bait and switch on himself …
44%
This is an extreme case of what psychologists call “belief perseverance.” People can be astonishingly intransigent—and capable of rationalizing like crazy to avoid acknowledging new information that upsets their settled beliefs. Consider the 1942 argument of General John DeWitt, a strong supporter of the internment of Japanese Americans: “The very fact that no sabotage has taken place to date is a disturbing and confirming indication that such action will be taken.”
44%
Fortunately, such extreme obstinacy is rare. More commonly, when we are confronted by facts impossible to ignore, we budge, grudgingly, but the degree of change is likely to be less than it should be.
44%
I found a post by a Wall Street Journal blogger, which said that no one has ever discovered its provenance and the two leading experts on Keynes think it is apocryphal.7 In light of these facts, and in the spirit of what Keynes apparently never said, I concluded that I was wrong. And I have now confessed to the world. Was that hard? Not really. Many smart people made the same mistake, so it’s not embarrassing to own up to it. The quotation wasn’t central to my work and being right about it wasn’t part of my identity.
44%
Our beliefs about ourselves and the world are built on each other in a Jenga-like fashion. My belief that Keynes said “When the facts change, I change my mind” was a block sitting at the apex. It supported nothing else, so I could easily pick it up and toss it without disturbing other blocks. But when Jean-Pierre makes a forecast in his specialty, that block is lower in the structure, sitting next to a block of self-perception, near the tower’s core.
44%
This suggests that superforecasters may have a surprising advantage: they’re not experts or professionals, so they have little ego invested in each forecast. Except in rare circumstances—when Jean-Pierre Beugoms answers military questions, for example—they aren’t deeply committed to their judgments, which makes it easier to admit when a forecast is offtrack and adjust.
45%
But add irrelevant information and we can’t help but see Robert or David more as a person than a stereotype, which weakens the fit.10 Psychologists call this the dilution effect, and given that stereotypes are themselves a source of bias we might say that diluting them is all to the good. Yes and no.
46%
And notice how small Tim’s changes are. There are no dramatic swings of thirty or forty percentage points. The average update was tiny, only 3.5%. That was critical. A few small updates would have put Tim on a heading for underreaction. Many large updates could have tipped him toward overreaction. But with many small updates, Tim slipped safely between Scylla and Charybdis.
46%
The tournament data prove it: superforecasters not only update more often than other forecasters, they update in smaller increments.
47%
The superforecasters are a numerate bunch: many know about Bayes’ theorem and could deploy it if they felt it was worth the trouble. But they rarely crunch the numbers so explicitly.
47%
“I think it is likely that I have a better intuitive grasp of Bayes’ theorem than most people,” he said, “even though if you asked me to write it down from memory I’d probably fail.” Minto is a Bayesian who does not use Bayes’ theorem. That paradoxical description applies to most superforecasters.
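For anyone who wants to see the theorem Minto says he could not write from memory, here it is in code, applied to an invented updating problem; the scenario and every number in it are assumptions for illustration:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    # Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E), where
    # P(E) = P(E|H) * P(H) + P(E|not H) * (1 - P(H)).
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence

# A made-up scenario: a forecaster gives a treaty a 40% chance of being
# signed, then reads that talks have stalled, news she judges would
# surface 30% of the time if signing were still on track and 70% of the
# time if it were not.
posterior = bayes_update(prior=0.40,
                         p_evidence_if_true=0.30,
                         p_evidence_if_false=0.70)
print(f"updated forecast: {posterior:.0%}")  # 22%, a measured step down
```

The point is less the arithmetic than the habit it encodes: weigh new evidence by how much likelier it is under one hypothesis than the other, and move the forecast accordingly.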
47%
But there is no magical formula, just broad principles with lots of caveats. Superforecasters understand the principles but also know that their application requires nuanced judgments. And they would rather break the rules than make a barbarous forecast.
48%
Even when the fixed-minded try, they don’t get as much from the experience as those who believe they can grow. In one experiment, Dweck scanned the brains of volunteers as they answered hard questions, then were told whether their answers were right or wrong and given information that could help them improve. The scans revealed that volunteers with a fixed mindset were fully engaged when they were told whether their answers were right or wrong but that’s all they apparently cared about.
48%
“Considering that Keynes was investing during some of the worst years in history, his returns are astounding,” noted John F. Wasik, the author of a book on Keynes’s investments.5 Keynes was breathtakingly intelligent and energetic, which certainly contributed to his success, but more than that he was an insatiably curious man who loved to collect new ideas—a habit that sometimes required him to change his mind.
48%
In 1920 he was nearly wiped out when his foreign currency forecasts turned out to be horribly wrong. He found his footing again and made a fortune for himself and others in the 1920s. But just like Mary Simpson in 2008, Keynes didn’t see the disaster of 1929 coming and he again lost big. But he bounced back and did even better than before. For Keynes, failure was an opportunity to learn—to identify mistakes, spot new alternatives, and try again.
49%
The knowledge required to ride a bicycle can’t be fully captured in words and conveyed to others. We need “tacit knowledge,” the sort we only get from bruising experience. To learn to ride a bicycle, we must try to ride one. It goes badly at first. You fall to one side, you fall to the other.
49%
That is blindingly obvious. It should be equally obvious that learning to forecast requires trying to forecast. Reading books on forecasting is no substitute for the experience of the real thing.
49%
Don Moore points out that police officers spend a lot of time figuring out who is telling the truth and who is lying, but research has found they aren’t nearly as good at it as they think they are and they tend not to get better with experience. That’s because experience isn’t enough. It must be accompanied by clear feedback.
49%
That is essential. To learn from failure, we must know when we fail. The baby who flops backward does. So does the boy who skins his knee when he falls off the bike. And the accountant who puts an easy putt into a sand trap. And because they know, they can think about what went wrong, adjust, and try again. Unfortunately, most forecasters do not get the high-quality feedback that helps meteorologists and bridge players improve. There are two main reasons why.
49%
Ambiguous language is a big one. As we saw in chapter 3, vague terms like “probably” and “likely” make it impossible to judge forecasts.
49%
Consider the Forer effect, named for the psychologist Bertram Forer, who asked some students to complete a personality test, then gave them individual personality profiles based on the results and asked how well the test captured their individual personalities. People were impressed by the test, giving it an average rating of 4.2 out of 5—which was remarkable because Forer had actually taken vague statements like “you have a great need for people to like and admire you” from a book on astrology, assembled them into a profile, and given the same profile to everyone.11 Vague language is elastic …
50%
The second big barrier to feedback is time lag. When forecasts span months or years, the wait for a result allows the flaws of memory to creep in. You know how you feel now about the future.
50%
If you are old enough now to have been a sentient being in 1991, answer this question: Back then, how likely did you think it was that the incumbent president, George H. W. Bush (now known as Bush 41), would win reelection in 1992? We all know Bush 41 lost to Bill Clinton, but you may recall that he was popular after the victory in the Gulf War. So perhaps you thought his chances were pretty good, but, obviously, he also stood a pretty good chance of losing. Maybe it was fifty-fifty? Or maybe you thought the war gave him the edge, with, say, a 60% or 70% chance of winning? In fact, your memory …
50%
It gets progressively more absurd. In this debate, each candidate heaps praise on his opponents while savaging himself—because Bush 41 was certain to crush whomever he faced. Everyone knew that. It’s why leading Democrats didn’t contest the nomination that year, clearing the way for the obscure governor of Arkansas, Bill Clinton.
50%
Once we know the outcome of something, that knowledge skews our perception of what we thought before we knew the outcome: that’s hindsight bias.
50%
Baruch Fischhoff was the first to document the phenomenon in a set of elegant experiments. One had people estimate the likelihood of major world events at the time of Fischhoff’s research—Will Nixon personally meet with Mao?—then recall their estimate after the event did or did not happen. Knowing the outcome consistently slanted the estimate, even when people …
50%
A loud “clunk!” means the throw hit the rim, but did the ball roll in or out? They can’t be sure. Of course they may convince themselves they know how they are doing, but they don’t really, and if they throw balls for weeks they may become more confident—I’ve practiced so much I must be excellent!—but they won’t get better at taking free throws. Only if the lights are turned on can they get clear feedback. Only then can they learn and get better.
50%
By the way, there are no shortcuts. Bridge players may develop well-calibrated judgment when it comes to bidding on tricks, but research shows that judgment calibrated in one context transfers poorly, if at all, to another. So if you were thinking of becoming a better political or business forecaster by playing bridge, forget it. To get better at a certain type of forecasting, that is the type of forecasting you must do—over and over again …
51%
In the first two seasons of the tournament, Jean-Pierre would often look at his old forecasts and be frustrated that his comments were so sparse that “I often could not figure out why I made a certain forecast” and thus couldn’t reconstruct his thought process.12 So he started leaving more and longer comments knowing it would help him critically examine his thinking.
51%
People often assume that when a decision is followed by a good outcome, the decision was good, which isn’t always true, and can be dangerous if it blinds us to the flaws in our thinking.