Superforecasting: The Art and Science of Prediction
Read between October 19 - November 22, 2018
5%
“I have been struck by how important measurement is to improving the human condition,” Bill Gates wrote. “You can achieve incredible progress if you set a clear goal and find a measure that will drive progress toward that goal. … This may seem basic, but it is amazing how often it is not done and how hard it is to get right.”8
6%
Only the determined can deliver it reasonably consistently, which is why our analyses have consistently found commitment to self-improvement to be the strongest predictor of performance.
7%
Ferrucci sees light at the end of this long dark tunnel: “I think it’s going to get stranger and stranger” for people to listen to the advice of experts whose views are informed only by their subjective judgment. Human thought is beset by psychological pitfalls, a fact that has only become widely recognized in the last decade or two. “So what I want is that human expert paired with a computer to overcome the human cognitive limitations and biases.”14 If Ferrucci is right—I suspect he is—we will need to blend computer-based forecasting and subjective judgment in the future. So it’s time we got …
8%
And treatments seldom got better, no matter how much time passed. When George Washington fell ill in 1799, his esteemed physicians bled him relentlessly, dosed him with mercury to cause diarrhea, induced vomiting, and raised blood-filled blisters by applying hot cups to the old man’s skin.
8%
Why should he? Experiments are what people do when they aren’t sure what the truth is. And Galen was untroubled by doubt. Each outcome confirmed he was right, no matter how equivocal the evidence might look to someone less wise than the master. “All who drink of this treatment recover in a short time, except those whom it does not help, who all die,” he wrote. “It is obvious, therefore, that it fails only in incurable cases.”5 Galen is an extreme example but he is the sort of figure who pops up repeatedly in the history of medicine. They are men (always men) of strong conviction and a profound …
8%
If patients who were identical in every way were put into two groups, and the groups were treated differently, he wrote, we would know the treatment caused any difference in outcome. It seems simple but is impossible in practice because no two people are exactly alike, not even identical twins, so the experiment will be confounded by the differences among test subjects. The solution lay in statistics: Randomly assigning people to one group or the other would mean whatever differences there are among them should balance out if enough people participated in the experiment. Then we can …
9%
What medicine lacked was doubt. “Doubt is not a fearful thing,” Feynman observed, “but a thing of very great value.”10 It’s what propels science forward.
9%
It was the absence of doubt—and scientific rigor—that made medicine unscientific and caused it to stagnate for so long.
9%
Partway through the trial, Cochrane met with a group of the cardiologists who had tried to stop his experiment. He told them that he had preliminary results. The difference in outcomes between the two treatments was not statistically significant, he emphasized, but it appeared that patients might do slightly better in the cardiac care units. “They were vociferous in their abuse: ‘Archie,’ they said, ‘we always thought you were unethical. You must stop the trial at once.’” But then Cochrane revealed he had played a little trick. He had reversed the results: home care had done slightly better …
9%
Cochrane cited the Thatcher government’s “short, sharp, shock” approach to young offenders, which called for brief incarceration in spartan jails governed by strict rules. Did it work? The government had simply implemented it throughout the justice system, making it impossible to answer. If the policy was introduced and crime went down, that might mean the policy worked, or perhaps crime went down for any of a hundred other possible reasons. If crime went up, that might show the policy was useless or even harmful, or it might mean crime would have risen even more but for the beneficial effects …
9%
The politicians would be blind men arguing over the colors of the rainbow.
10%
The human brain demands order. The world must make sense,
10%
In celebrated research, Michael Gazzaniga designed a bizarre situation in which sane people did indeed have no idea why they were doing what they were doing. His test subjects were “split-brain” patients, meaning that the left and right hemispheres of their brains could not communicate with each other because the connection between them, the corpus callosum, had been surgically severed (typically as a treatment for severe epilepsy). These people are remarkably normal, but their condition allows researchers to communicate directly with only one hemisphere of their brain—by showing an image …
11%
As it turned out, there was only one perpetrator. His name is Anders Breivik. He isn’t Muslim. He hates Muslims. Breivik’s attacks were aimed at a government that he felt had betrayed Norway with its multiculturalist policies. After Breivik’s arrest, many people accused those who had rushed to judgment of Islamophobia, and not without reason, as some of them had seemed all too eager to blame Muslims in general. But given the few facts known at the time, and the history of mass-atrocity terrorism in the preceding decade, it was reasonable to suspect Islamist terrorists. A scientist would …
11%
In fact, in science, the best evidence that a hypothesis is true is often an experiment designed to prove the hypothesis is false, but which fails to do so. Scientists must be able to answer the question “What would convince me I am wrong?” If they can’t, it’s a sign they have grown too attached to their beliefs.
11%
That was pure confirmation bias: “If the patient is cured, it is evidence my treatment works; if the patient dies, it means nothing.” This is a poor way to build an accurate mental model of a complicated world, but it’s a superb way to satisfy the brain’s desire for order because it yields tidy explanations with no loose ends. Everything
12%
Archie Cochrane’s skeptical defenses folded because Cochrane found the specialist’s story as intuitively compelling as the specialist did. But another mental process was likely at work, as well. Formally, it’s called attribute substitution, but I call it bait and switch: when faced with a hard question, we often surreptitiously replace it with an easy one. “Should I worry about the shadow in the long grass?” is a hard question. Without more data, it may be unanswerable. So we substitute an easier question: “Can I easily recall a lion attacking someone from the long grass?” That question …
12%
So the availability heuristic—like Kahneman’s other heuristics—is essentially a bait-and-switch maneuver. And just as the availability heuristic is usually an unconscious System 1 activity, so too is bait and switch.
12%
Of course we aren’t always oblivious to the machinations of our minds. If someone asks about climate change, we may say, “I have no training in climatology and haven’t read any of the science. If I tried to answer based on what I know I’d make a mess of it. The knowledgeable people are the climatologists. So I’ll substitute ‘Do most climatologists think climate change is real?’ for ‘Is climate change real?’” An ordinary person told by an eminent cancer specialist that she has terminal cancer may engage in the same conscious bait and switch and just accept what the doctor says as true.
16%
“we believe that the extent of [Eastern European] military and propaganda preparations indicates that an attack on Yugoslavia in 1951 should be considered a serious possibility.” By most standards, that is clear, meaningful language. No one suggested otherwise when the estimate was published and read by top officials throughout the government. But a few days later, Kent was chatting with a senior State Department official who casually asked, “By the way, what did you people mean by the expression ‘serious possibility’? What kind of odds did you have in mind?” Kent said he was pessimistic. He …
16%
Disturbed, Kent went back to his team. They had all agreed to use “serious possibility” in the NIE so Kent asked each person, in turn, what he thought it meant. One analyst said it meant odds of about 80 to 20, or four times more likely than not that there would be an invasion. Another thought it meant odds of 20 to 80—exactly the opposite. Other answers were scattered between those extremes.
16%
So if the National Intelligence Estimate said something is “probable,” it would mean a 63% to 87% chance it would happen. Kent’s scheme was simple—and it greatly reduced the room for confusion. But it was never adopted. People liked clarity and precision in principle but when it came time to make clear and precise forecasts they weren’t so keen on numbers. Some said it felt unnatural or awkward, which it does when you’ve spent a lifetime using vague language, but that’s a weak argument against change.
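To make Kent's idea concrete, here is a minimal sketch in Python of a word-to-odds mapping. Only the "probable" band (63% to 87%) comes from the text; the dictionary and helper names are mine, and any further entries would be assumptions rather than Kent's actual table.

```python
# Kent's scheme, sketched: pin each estimative word to a numeric band.
# Only "probable" (63%-87%) is stated in the text; additional entries
# would be assumptions, so they are omitted here.
WORD_TO_ODDS = {
    "probable": (0.63, 0.87),
}

def interpret(word: str) -> str:
    """Translate a vague estimative word into its agreed numeric range."""
    lo, hi = WORD_TO_ODDS[word]
    return f"'{word}' means a {lo:.0%} to {hi:.0%} chance"

print(interpret("probable"))  # 'probable' means a 63% to 87% chance
```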
17%
Of course we’re not omnipotent beings, so we can’t rerun the day—and we can’t judge. But people do judge. And they always judge the same way: they look at which side of “maybe”—50%—the probability was on. If the forecast said there was a 70% chance of rain and it rains, people think the forecast was right; if it doesn’t rain, they think it was wrong.
17%
This simple mistake is extremely common. Even sophisticated thinkers fall for it. In 2012, when the Supreme Court was about to release its long-awaited decision on the constitutionality of Obamacare, prediction markets—markets that let people bet on possible outcomes—pegged the probability of the law being struck down at 75%. When the court upheld the law, the sagacious New York Times reporter David Leonhardt declared that “the market—the wisdom of the crowds—was wrong.”
17%
So what’s the safe thing to do? Stick with elastic language. Forecasters who use “a fair chance” and “a serious possibility” can even make the wrong-side-of-maybe fallacy work for them: If the event happens, “a fair chance” can retroactively be stretched to mean something considerably bigger than 50%—so the forecaster nailed it. If it doesn’t happen, it can be shrunk to something much smaller than 50%—and again the forecaster nailed it. With perverse incentives like these, it’s no wonder people prefer rubbery words over firm numbers.
17%
If we are serious about measuring and improving, this won’t do. Forecasts must have clearly defined terms and timelines. They must use numbers. And one more thing is essential: we must have lots of forecasts.
17%
The figure on top represents perfect calibration but poor resolution. It’s perfect calibration because when the forecaster says there is a 40% chance something will happen, it happens 40% of the time, and when she says there is a 60% chance something will happen, it happens 60% of the time. But it’s poor resolution because the forecaster never strays out of the minor-shades-of-maybe zone between 40% and 60%.
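A rough sketch of how one might check this from a forecasting record (the function name and bucketing are mine, not the book's): group forecasts by the stated probability, then compare each group's stated chance with how often the events actually happened.

```python
from collections import defaultdict

def calibration_report(forecasts, outcomes):
    """Compare stated probabilities with observed frequencies.

    forecasts: probabilities the forecaster stated (e.g. 0.4, 0.6)
    outcomes:  1 if the event happened, 0 if it did not

    Perfect calibration: events stated at 40% happen 40% of the time.
    Poor resolution: every stated probability huddles near 50%.
    """
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        buckets[p].append(o)
    for p in sorted(buckets):
        hits = buckets[p]
        observed = sum(hits) / len(hits)
        print(f"said {p:.0%}: happened {observed:.0%} of the time "
              f"({len(hits)} forecasts)")

# A record like the top figure: well calibrated, never strays from 40-60%
calibration_report([0.4] * 5 + [0.6] * 5, [0, 0, 0, 1, 1, 1, 1, 1, 0, 0])
```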
18%
Someone who says there is a 70% chance of X should do fairly well if X happens. But someone who says there is a 90% chance of X should do better. And someone bold enough to correctly predict X with 100% confidence gets top marks. But hubris must be punished. The forecaster who says X is a slam dunk should take a big hit if X does not happen. How big a hit is debatable, but it’s reasonable to think of it in betting terms.
18%
The math behind this system was developed by Glenn W. Brier in 1950, hence results are called Brier scores. In effect, Brier scores measure the distance between what you forecast and what actually happened. So Brier scores are like golf scores: lower is better. Perfection is 0. A hedged fifty-fifty call, or random guessing in the aggregate, will produce a Brier score of 0.5.
18%
A forecast that is wrong to the greatest possible extent—saying there is a 100% chance that something will happen and it doesn’t, every time—scores a disastrous 2.0, as far from The Truth as it is possible to get.
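The scoring the text describes can be written in a few lines. This is a minimal sketch using the standard two-outcome Brier (1950) formulation, which reproduces all three landmark values given above: 0 for perfection, 0.5 for constant fifty-fifty hedging, and 2.0 for maximal confident error.

```python
def brier_score(forecasts, outcomes):
    """Brier (1950) score over a set of binary questions.

    forecasts: probabilities assigned to "the event happens"
    outcomes:  1 (it happened) or 0 (it did not)

    Squared error is summed over both outcome branches, giving the
    0-to-2 scale the text describes; like golf, lower is better.
    """
    per_question = [
        (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
        for p, o in zip(forecasts, outcomes)
    ]
    return sum(per_question) / len(per_question)

# The three landmark values from the text:
print(brier_score([1.0, 0.0], [1, 0]))  # 0.0 -- perfect
print(brier_score([0.5, 0.5], [1, 0]))  # 0.5 -- hedged fifty-fifty
print(brier_score([1.0, 1.0], [0, 0]))  # 2.0 -- slam dunk that misses
```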
18%
underappreciated point. For example, after the 2012 presidential election, Nate Silver, Princeton’s Sam Wang, and other poll aggregators were hailed for correctly predicting all fifty state outcomes, but almost no one noted that a crude, across-the-board prediction of “no change”—if a state went Democratic or Republican in 2008, it will do the same in 2012—would have scored forty-eight out of fifty, which suggests that the many excited exclamations of “He called all fifty states!” we heard at the time were a tad overwrought.
18%
so comparing the Brier scores of a Phoenix meteorologist with those of a Springfield meteorologist would be unfair. A 0.2 Brier score in Springfield could be a sign that you are a world-class meteorologist.
19%
If you didn’t know the punch line of EPJ before you read this book, you do now: the average expert was roughly as accurate as a dart-throwing chimpanzee. But as students are warned in introductory statistics classes, averages can obscure. Hence the old joke about statisticians sleeping with their feet in an oven and their head in a freezer because the average temperature is comfortable.
19%
The second group beat the chimp, though not by a wide margin, and they still had plenty of reason to be humble. Indeed, they only barely beat simple algorithms like “always predict no change” or “predict the recent rate of change.” Still, however modest their foresight was, they had some. So why did one group do better than the other? It wasn’t whether they had PhDs or access to classified information. Nor was it what they thought—whether they were liberals or conservatives, optimists or pessimists. The critical factor was how they thought.
19%
One group tended to organize their thinking around Big Ideas, although they didn’t agree on which Big Ideas were true or false. Some were environmental doomsters (“We’re running out of everything”); others were cornucopian boomsters (“We can find cost-effective substitutes for everything”). Some were socialists (who favored state control of the commanding heights of the economy); others were free-market fundamentalists (who wanted to minimize regulation). As ideologically diverse as they were, they were united by the fact that their thinking was so ideological.
19%
These experts gathered as much information from as many sources as they could. When thinking, they often shifted mental gears, sprinkling their speech with transition markers such as “however,” “but,” “although,” and “on the other hand.” They talked about possibilities and probabilities, not certainties. And while no one likes to say “I was wrong,” these experts more readily admitted it and changed their minds.
19%
Kudlow was optimistic. “There is no recession,” he wrote. “In fact, we are about to enter the seventh consecutive year of the Bush boom.”19 The National Bureau of Economic Research later designated December 2007 as the official start of the Great Recession of 2007–9. As the months passed, the economy weakened and worries grew, but Kudlow did not budge. There is no recession and there will be no recession, he insisted.
20%
Think of that Big Idea like a pair of glasses that the hedgehog never takes off. The hedgehog sees everything through those glasses. And they aren’t ordinary glasses. They’re green-tinted glasses—like the glasses that visitors to the Emerald City were required to wear in L. Frank Baum’s The Wonderful Wizard of Oz. Now, wearing green-tinted glasses may sometimes be helpful, in that they accentuate something real that might otherwise be overlooked. Maybe there is just a trace of green in a tablecloth that a naked eye might miss, or a subtle shade of green in running water. But far more often, …
20%
It may increase the hedgehog’s confidence, but not his accuracy. That’s a bad combination. The predictable result? When hedgehogs in the EPJ research made forecasts on the subjects they knew the most about—their own specialties—their accuracy declined.
20%
Not that being wrong hurt Kudlow’s career. In January 2009, with the American economy in a crisis worse than any since the Great Depression, Kudlow’s new show, The Kudlow Report, premiered on CNBC. That too is consistent with the EPJ data, which revealed an inverse correlation between fame and accuracy: the more famous an expert was, the less accurate he was. That’s not because editors, producers, and the public go looking for bad forecasters. They go looking for hedgehogs,
20%
For many audiences, that’s satisfying. People tend to find uncertainty disturbing and “maybe” underscores uncertainty with a bright red crayon. The simplicity and confidence of the hedgehog impairs foresight, but it calms nerves—which is good for the careers of hedgehogs.
20%
Some reverently call it the miracle of aggregation but it is easy to demystify. The key is recognizing that useful information is often dispersed widely, with one person possessing a scrap, another holding a more important piece, a third having a few bits, and so on. When Galton
20%
When a butcher looked at the ox, he contributed the information he possessed thanks to years of training and experience. When a man who regularly bought meat at the butcher’s store made his guess, he added a little more. A third person, who remembered how much the ox weighed at last year’s fair, did the same. And so it went. Hundreds of people added valid information, creating a collective pool far greater than any one of them possessed.
20%
Of course they also contributed myths and mistakes, creating a pool of misleading clues as big as the pool of useful clues. But there was an important difference between the two pools. All the valid information pointed in one direction—toward 1,198 pounds—but the errors had different sources and pointed in different directions.
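A toy simulation (mine, not the book's) of why the errors wash out: model each guess as the true weight plus an idiosyncratic, zero-mean error, and the pooled judgment homes in on the truth even though no individual guess is reliable.

```python
import random

random.seed(42)
TRUE_WEIGHT = 1198  # the ox's actual weight, per the text

# Each fairgoer sees part of the truth plus an idiosyncratic error.
# Because the errors have different sources and point in different
# directions, they tend to cancel when judgments are pooled.
guesses = [TRUE_WEIGHT + random.gauss(0, 75) for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
print(round(crowd_estimate))  # typically lands within a few pounds of 1198
```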
20%
How well aggregation works depends on what you are aggregating. Aggregating the judgments of many people who know nothing produces a lot of nothing. Aggregating the judgments of people who know a little is better, and if there are enough of them, it can produce impressive results, but aggregating the judgments of an equal number of people who know lots about lots of different things is most effective because the collective pool of information becomes much bigger.
21%
And that’s just what happened—33 and 22 were the most popular answers. And because I did not think about the problem from this different perspective, and factor it into my own judgment, I was wrong. What I should have done is look at the problem from both perspectives—the perspectives of both logic and psycho-logic—and combine what I saw. And this merging of perspectives need not be limited to two.
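Assuming the game behind these numbers is the classic pick-a-number contest (choose a number from 0 to 100; the winner is whoever comes closest to two-thirds of the average guess), the 33 and 22 fall out of successive steps of pure logic: each extra level of thinking about the other players multiplies the naive answer by two-thirds.

```python
# Two-thirds-of-the-average game: each level of "what will the
# others pick?" reasoning multiplies the naive answer (an average
# of 50) by 2/3.
guess = 50.0
for level in range(1, 4):
    guess *= 2 / 3
    print(f"level-{level} answer: {guess:.0f}")
# level-1: 33, level-2: 22, level-3: 15 ... pure logic iterates to 0
```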
21%
Like us, dragonflies have two eyes, but theirs are constructed very differently. Each eye is an enormous, bulging sphere, the surface of which is covered with tiny lenses. Depending on the species, there may be as many as thirty thousand of these lenses on a single eye, each one occupying a physical space slightly different from those of the adjacent lenses, giving it a unique perspective. Information from these thousands of unique perspectives flows into the dragonfly’s brain where it is synthesized into vision so superb that the dragonfly can see in almost every direction simultaneously, …
22%
Why did you assume that an opponent who raises the bet has a strong hand if you would not raise with the same strong hand? “And it’s not until I walk them through the exercise,” Duke says, that people realize they failed to truly look at the table from the perspective of their opponent.
22%
“these are people who have played enough poker, and are passionate about the game, and consider themselves good enough, that they’re paying a thousand dollars for a seminar with me,” Duke says. “And they don’t understand this basic concept.”
22%
But remember the old reflexivity-paradox joke. There are two types of people in the world: those who think there are two types and those who don’t. I’m of the second type. My fox/hedgehog model is not a dichotomy. It is a spectrum.