Superforecasting: The Art and Science of Prediction
Let’s suppose we discover that you have a Brier score of 0.2.
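The Brier score isn’t defined in this excerpt. As a rough sketch, in the original Brier (1950) form that forecasting tournaments typically use, it is the squared gap between your probability forecasts and what actually happened, summed over the possible outcomes of each question and averaged across questions: 0 is perfect, 2 is maximally wrong, and always hedging at 50/50 on binary questions scores 0.5.

```python
# Minimal sketch of the Brier score in its original (Brier 1950) form.
# For each question, sum the squared gaps between the forecast probabilities
# and the actual outcomes (1 for what happened, 0 for what didn't),
# then average across questions.

def brier_score(forecasts, outcomes):
    """forecasts: list of dicts mapping outcome label -> probability.
    outcomes: list of the labels that actually occurred."""
    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        total += sum((p - (1.0 if label == actual else 0.0)) ** 2
                     for label, p in probs.items())
    return total / len(forecasts)

# Illustrative (made-up) forecasts on two binary questions:
forecasts = [{"yes": 0.7, "no": 0.3}, {"yes": 0.2, "no": 0.8}]
outcomes = ["yes", "no"]
print(brier_score(forecasts, outcomes))  # (0.18 + 0.08) / 2 = 0.13
```

On that scale, 0.2 beats the 0.5 you would earn by always hedging at 50%, though how impressive it is depends on how hard the questions are.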
Nonetheless, I managed to recruit 284 serious professionals, card-carrying experts whose livelihoods involved analyzing political and economic trends and events. Some were academics working in universities or think tanks. Others worked for branches of the US government, or for international organizations such as the World Bank and the International Monetary Fund, or for the media. A small number were quite famous, others well known in their professional communities, some early in their careers and at that point quite obscure. But I still had to guarantee anonymity because even experts who were …
The first questions put to the experts were about themselves. Age? (The average was forty-three.) Relevant work experience? (The average was 12.2 years.) Education? (Almost all had postgraduate training; half had PhDs.) We also asked about their ideological leanings and preferred approaches to solving political problems.
If you didn’t know the punch line of EPJ before you read this book, you do now: the average expert was roughly as accurate as a dart-throwing chimpanzee. But as students are warned in introductory statistics classes, averages can obscure. Hence the old joke about statisticians sleeping with their feet in an oven and their head in a freezer because the average temperature is comfortable.
In the EPJ results, there were two statistically distinguishable groups of experts. The first failed to do better than random guessing, and in their longer-range forecasts even managed to lose to the chimp. The second group beat the chimp, though not by a wide margin, and they still had plenty of reason to be humble. Indeed, they only barely beat simple algorithms like “always predict no change” or “predict the recent rate of change.” Still, however modest their foresight was, they had some.
The other group consisted of more pragmatic experts who drew on many analytical tools, with the choice of tool hinging on the particular problem they faced.
Decades ago, the philosopher Isaiah Berlin wrote a much-acclaimed but rarely read essay that compared the styles of thinking of great authors through the ages. To organize his observations, he drew on a scrap of 2,500-year-old Greek poetry attributed to the warrior-poet Archilochus: “The fox knows many things but the hedgehog knows one big thing.” No one will ever know whether Archilochus was on the side of the fox or the hedgehog but Berlin favored foxes. I felt no need to take sides. I just liked the metaphor because it captured something deep in my data. I dubbed the Big Idea experts “hedgehogs” and the more eclectic experts “foxes.”
Foxes beat hedgehogs. And the foxes didn’t just win by acting like chickens, playing it safe with 60% and 70% forecasts where hedgehogs boldly went with 90% and 100%. Foxes beat hedgehogs on both calibration and resolution. Foxes had real foresight. Hedgehogs didn’t.
The key is recognizing that useful information is often dispersed widely, with one person possessing a scrap, another holding a more important piece, a third having a few bits, and so on.
A fox with the bulging eyes of a dragonfly is an ugly mixed metaphor but it captures a key reason why the foresight of foxes is superior to that of hedgehogs with their green-tinted glasses. Foxes aggregate perspectives.
Unfortunately, aggregation doesn’t come to us naturally. The tip-of-your-nose perspective insists that it sees reality objectively and correctly, so there is no need to consult other perspectives. All too often we agree. We don’t consider alternative views—even when it’s clear that we should.
Forget the dart-throwing-chimp punch line. What matters is that EPJ found modest but real foresight, and the critical ingredient was the style of thinking. The next step was figuring out how to advance that insight.
The exact numbers are classified, but by one rough estimate the IC has a budget of more than $50 billion and employs one hundred thousand people. Of these, twenty thousand are intelligence analysts, whose job is not to collect information but to make sense of what is collected and judge its implications for national security.
What went wrong? One explanation was that the IC had caved to White House bullying. The intelligence had been politicized. But official investigations rejected that claim.
This particular bait and switch—replacing “Was it a good decision?” with “Did it have a good outcome?”—is both popular and pernicious.
In 2006 the Intelligence Advanced Research Projects Activity (IARPA) was created. Its mission is to fund cutting-edge research with the potential to make the intelligence community smarter and more effective.
It’s a simple idea but, as was true in medicine for so long, it is routinely overlooked. For example, the CIA gives its analysts a manual written by Richards Heuer, a former analyst, that lays out relevant insights from psychology, including biases that can trip up an analyst’s thinking. It’s fine work. And it makes sense that giving analysts a basic grasp of psychology will help them avoid cognitive traps and thus help produce better judgments. But does it? No one knows. It has never been tested. Some analysts think the training is so intuitively compelling that it doesn’t need to be tested.
In the summer of 2010, two IARPA officials, Jason Matheny and Steve Rieber, visited Berkeley. Barbara Mellers and I met them at a hotel with a tourist-trap view of San Francisco, where they gave us news as pleasant as the vista. They planned to act on the key recommendation in the National Research Council report—a recommendation I had confidently predicted would gather dust. IARPA would sponsor a massive tournament to see who could invent the best methods of making the sorts of forecasts that intelligence analysts make every day.
IARPA was looking for questions in the Goldilocks zone of difficulty, neither so easy that any attentive reader of the New York Times could get them right nor so hard that no one on the planet could. IARPA saw this Goldilocks zone as the best place both for finding new forecasting talent and for testing new methods of cultivating talent.
And as EPJ and other studies have shown, human cognitive systems will never be able to forecast turning points in the lives of individuals or nations several years into the future—and heroic searches for superforecasters won’t change that.
In the first year, several thousand volunteers signed up and roughly 3,200 passed through our initial gauntlet of psychometric tests and started forecasting. We called our research team and program the Good Judgment Project.
Here’s one possible revelation: Imagine you get a couple of hundred ordinary people to forecast geopolitical events. You see how often they revise their forecasts and how accurate those forecasts prove to be and use that information to identify the forty or so who are the best. Then you have everyone make lots more forecasts. This time, you calculate the average forecast of the whole group—“the wisdom of the crowd”—but with extra weight given to those forty top forecasters. Then you give the forecast a final tweak: You “extremize” it, meaning you push it closer to 100% or zero. If the forecast is 70%, say, you might bump it up toward 85%; if it’s 30%, you might push it down toward 15%.
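A minimal sketch of that recipe, with made-up numbers and an assumed extremizing transform (raising the odds to a power greater than one), since the passage only says the pooled forecast gets pushed closer to 100% or zero:

```python
# Sketch: weighted "wisdom of the crowd" with extra weight on the top
# forecasters, followed by an extremizing step. The weights and the
# odds-raised-to-a-power transform are illustrative assumptions.

def aggregate(forecasts, is_top, top_weight=3.0, a=2.0):
    """forecasts: list of probabilities in [0, 1].
    is_top: parallel list of booleans marking the top forecasters."""
    weights = [top_weight if top else 1.0 for top in is_top]
    p = sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)
    odds = (p / (1.0 - p)) ** a        # push the pooled probability
    return odds / (1.0 + odds)         # toward 0% or 100%

crowd = [0.60, 0.70, 0.55, 0.80, 0.65]
top = [False, True, False, True, False]
print(round(aggregate(crowd, top), 2))  # weighted mean 0.70 -> 0.84 extremized
```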
Now imagine that the forecasts you produce this way beat those of every other group and method available, often by large margins. Your forecasts even beat those of professional intelligence analysts inside the government who have access to classified information—by margins that remain classified.
Thanks to IARPA, we now know a few hundred ordinary people and some simple math can not only compete with professionals supported by a multibillion-dollar apparatus but also beat them.
With his gray beard, thinning hair, and glasses, Doug Lorch doesn’t look like a threat to anyone. He looks like a computer programmer, which he was, for IBM. He is retired now. He lives in a quiet neighborhood in Santa Barbara with his wife, an artist who paints lovely watercolors. His Facebook avatar is a duck. Doug likes to drive his little red convertible Miata around the sunny streets, enjoying the California breeze, but that can only occupy so many hours in the day. Doug has no special expertise in international affairs, but he has a healthy curiosity about what’s happening. He reads the …
In the first year, Doug answered 104 questions like “Will Serbia be officially granted European Union candidacy by 31 December 2011?” and “Will the London Gold Market Fixing price of gold (USD per ounce) exceed $1,850 on 30 September 2011?” That’s a lot of forecasting, but it understates what Doug did.
Over four years, nearly five hundred questions about international affairs were asked of thousands of GJP’s forecasters, generating well over one million judgments about the future. But even at the individual level, the numbers quickly added up. In year 1 alone, Doug Lorch made roughly one thousand separate forecasts.
His overall Brier score was 0.22, putting him in fifth spot among the 2,800 competitors in the Good Judgment Project.
In year 2, Doug joined a superforecaster team and did even better, with a final Brier score of 0.14, making him the best forecaster of the 2,800 GJP volunteers.
He also beat, by a 40% margin, a prediction market in which traders bought and sold futures contracts on the outcomes of the same questions.
He was the only person to beat the extremizing algorithm.
Of course if Doug Lorch were a uniquely gifted oracle, he would pose little threat to the status quo. There is only so much forecasting one man can do. But Doug isn’t unique. We have already met Bill Flack, the retired Department of Agriculture employee from Nebraska. There were 58 others among the 2,800 volunteers who scored at the top of the charts in year 1. They were our first class of superforecasters. At the end of year 1, their collective Brier score was 0.25, compared with 0.37 for all the other forecasters—and that gap grew in later years so that by the end of the four-year …
Of course it would be wonderful to have a direct comparison between superforecasters and intelligence analysts, but such a thing would be closely guarded. However, in November 2013, the Washington Post editor David Ignatius reported that “a participant in the project” had told him that the superforecasters “performed about 30 percent better than the average for intelligence community analysts who could read intercepts and other secret data.”
That complexity makes it hard to figure out what to chalk up to skill and what to luck—a subject probed in depth by Michael Mauboussin, a global financial strategist, in his book The Success Equation. But as Mauboussin noted, there is an elegant rule of thumb that applies to athletes and CEOs, stock analysts and superforecasters. It involves “regression to the mean.”
So regression to the mean is an indispensable tool for testing the role of luck in performance: Mauboussin notes that slow regression is more often seen in activities dominated by skill, while faster regression is more associated with chance.
So how did superforecasters hold up across years? That’s the key question. And the answer is phenomenally well. For instance, in years 2 and 3 we saw the opposite of regression to the mean: the superforecasters as a whole, including Doug Lorch, actually increased their lead over all other forecasters.
The correlation between how well individuals do from one year to the next is about 0.65, modestly higher than that between the heights of fathers and sons.
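As a rough sketch (not from the book), the standard regression-to-the-mean estimate implied by that correlation: a forecaster’s expected standing next year, measured in standard deviations from the average, is about 0.65 times their standing this year.

```python
# Shrinkage toward the mean implied by a year-to-year correlation of 0.65.
r = 0.65
this_year_z = 2.0             # e.g. two standard deviations better than average
next_year_z = r * this_year_z
print(next_year_z)            # 1.3 -- still well above average, but closer to the mean
```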
All of this suggests two key conclusions. One, we should not treat the superstars of any given year as infallible, not even Doug Lorch. Luck plays a role and it is only to be expected that the superstars will occasionally have a bad year and produce ordinary results—just as superstar athletes occasionally look less than stellar. But more basically, and more hopefully, we can conclude that the superforecasters were not just lucky. Mostly, their results reflected skill. Which raises the big question: Why are superforecasters so good?
Before we get to the results, bear in mind that several thousand people volunteered for the GJP in the first year and the 2,800 who were motivated enough to work through all the testing and make forecasts were far from a randomly selected sample.
What did we find? Regular forecasters scored higher on intelligence and knowledge tests than about 70% of the population. Superforecasters did better, placing higher than about 80% of the population.
Note three things. First, the big jumps in intelligence and knowledge are from the public to the forecasters, not from forecasters to superforecasters. Second, although superforecasters are well above average, they did not score off-the-charts high and most fall well short of so-called genius territory, a problematic concept often arbitrarily defined as the top 1%, or an IQ of 135 and up. …
But having the requisite intelligence and knowledge is not enough. Many clever and informed forecasters in the tournament fell far short of superforecaster accuracy. And history is replete with brilliant people who made forecasts that proved considerably less than prescient.
The Italian American physicist Enrico Fermi—a central figure in the invention of the atomic bomb—concocted this little brainteaser decades before the invention of the Internet. And Fermi’s students did not have the Chicago yellow pages at hand. They had nothing. And yet Fermi expected them to come up with a reasonably accurate estimate.
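The brainteaser isn’t quoted in this excerpt, but the Chicago yellow pages reference points to Fermi’s classic question of roughly how many piano tuners work in Chicago. A rough sketch of the decomposition, with every number a deliberately crude guess:

```python
# Fermi-style decomposition: break one unknowable number into several
# crudely estimable ones. All figures below are illustrative guesses.
population        = 2_500_000   # people in Chicago
people_per_house  = 2.5         # average household size
share_with_piano  = 1 / 20      # households owning a piano
tunings_per_year  = 1           # tunings per piano per year
tunings_per_tuner = 1_000       # jobs one tuner can handle per year (~4/day, 250 days)

households = population / people_per_house
pianos = households * share_with_piano
tuners = pianos * tunings_per_year / tunings_per_tuner
print(round(tuners))            # ~50 piano tuners
```

The point is not the final number but that each intermediate guess is far easier to bound than the original question.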
I shared Levitin’s discussion of Fermi estimation with a group of superforecasters and it drew a chorus of approval. Sandy Sillman told me Fermi estimation was so critical to his job as a scientist working with atmospheric models that it became “a part of my natural way of thinking.” That’s a huge advantage for a forecaster, as we shall see.
Thinking like Fermi, Bill unpacked the question by asking himself “What would it take for the answer to be yes? What would it take for it to be no?”
But superforecasters wouldn’t bother with any of that, at least not at first. The first thing they would do is find out what percentage of American households own a pet. Statisticians call that the base rate—how common something is within a broader class.
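A minimal sketch of that outside-view-first move, with illustrative numbers: anchor on the base rate for the broader class, then adjust for what you learn about the particular case.

```python
# Start from the base rate (the outside view), then adjust for
# case-specific details (the inside view). Numbers are illustrative.
base_rate = 0.62                 # assumed share of US households with a pet
adjustment = +0.10               # e.g. this family has a house with a yard
forecast = min(max(base_rate + adjustment, 0.0), 1.0)
print(forecast)                  # 0.72
```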
And it’s astonishingly easy to settle on a bad anchor. In classic experiments, Daniel Kahneman and Amos Tversky showed you could influence people’s judgment merely by exposing them to a number—any number, even one that is obviously meaningless, like one randomly selected by the spin of a wheel.
Researchers have found that merely asking people to assume their initial judgment is wrong, to seriously consider why that might be, and then make another judgment, produces a second estimate which, when combined with the first, improves accuracy almost as much as getting a second estimate from another person.
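A tiny sketch of that procedure, with illustrative numbers:

```python
# "Crowd within": assume your first guess is wrong, generate a second,
# independent guess, and average the two. On average the blend is more
# accurate than the first guess alone. Numbers are illustrative.
first_guess  = 0.80   # initial probability estimate
second_guess = 0.60   # estimate after arguing against yourself
combined = (first_guess + second_guess) / 2
print(combined)       # 0.70
```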
The commentary that superforecasters post on GJP forums is rife with “on the one hand/on the other” dialectical banter.
Yet these are ordinary people. Forecasting is their hobby. Their only reward is a gift certificate and bragging rights on Facebook. Why do they put so much into it? One answer is it’s fun. “Need for cognition” is the psychological term for the tendency to engage in and enjoy hard mental slogs. People high in need for cognition are the sort who like crosswords and Sudoku puzzles, the harder, the better—and superforecasters score high in need-for-cognition tests.