Kindle Notes & Highlights
Read between June 26 and June 27, 2022
Bill is not alone. There are thousands of others answering the same questions. All are volunteers. Most aren’t as good as Bill, but about 2% are. They include engineers and lawyers, artists and scientists, Wall Streeters and Main Streeters, professors and students. We will meet many of them, including a mathematician, a filmmaker, and some retirees eager to share their underused talents. I call them superforecasters because that is what they are. Reliable evidence proves it. Explaining why they’re so good, and how others can learn to do what they do, is my goal in this book.
It turns out that forecasting is not a “you have it or you don’t” talent. It is a skill that can be cultivated. This book will show you how.
I believe it is possible to see into the future, at least in some situations and to some extent, and that any intelligent, open-minded, and hardworking person can cultivate the requisite skills. Call me an “optimistic skeptic.”
How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances.
Meteorologists know that better than anyone. They make large numbers of forecasts and routinely check their accuracy—which is why we know that one- and two-day forecasts are typically quite accurate while eight-day forecasts are not. With these analyses, meteorologists are able to sharpen their understanding of how weather works and tweak their models. Then they try again. Forecast, measure, revise. Repeat. It’s a never-ending process of incremental improvement that explains why weather forecasts are good and slowly getting better. There may be limits to such improvements, however, because…
“I have been struck by how important measurement is to improving the human condition,” Bill Gates wrote. “You can achieve incredible progress if you set a clear goal and find a measure that will drive progress toward that goal…. This may seem basic, but it is amazing how often it is not done and how hard it is to get right.”8
Foresight isn’t a mysterious gift bestowed at birth. It is the product of particular ways of thinking, of gathering information, of updating beliefs. These habits of thought can be learned and cultivated by any intelligent, thoughtful, determined person.
In 1954, a brilliant psychologist, Paul Meehl, wrote a small book that caused a big stir.12 It reviewed twenty studies showing that well-informed experts predicting outcomes—whether a student would succeed in college or a parolee would be sent back to prison—were not as accurate as simple algorithms that added up objective indicators like ability test scores and records of past conduct. Meehl’s claim upset many experts, but subsequent research—now more than two hundred studies—has shown that in most cases statistical algorithms beat subjective judgment, and in the handful of studies where they…
…harshly. It’s human nature. We have all been too quick to make up our minds and too slow to change them. And if we don’t examine how we make these mistakes, we will keep making them.
Experiments are what people do when they aren’t sure what the truth is.
The cure for this plague of certainty came tantalizingly close to discovery in 1747, when a British ship’s doctor named James Lind took twelve sailors suffering from scurvy, divided them into pairs, and gave each pair a different treatment: vinegar, cider, sulfuric acid, seawater, a bark paste, and citrus fruit. It was an experiment born of desperation. Scurvy was a mortal threat to sailors on long-distance voyages and not even the confidence of physicians could hide the futility of their treatments. So Lind took six shots in the dark—and one hit. The two sailors given the citrus recovered…
Not until the twentieth century did the idea of randomized trial experiments, careful measurement, and statistical power take hold.
It was cargo cult science, a term of mockery coined much later by the physicist Richard Feynman to describe what happened after American airbases from World War II were removed from remote South Pacific islands, ending the islanders’ only contact with the outside world. The planes had brought wondrous goods. The islanders wanted more. So they “arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he’s the controller—and they…
What medicine lacked was doubt. “Doubt is not a fearful thing,” Feynman observed, “but a thing of very great value.”10 It’s what propels science forward.
What people didn’t grasp is that the only alternative to a controlled experiment that delivers real insight is an uncontrolled experiment that produces merely the illusion of insight.
In describing how we think and decide, modern psychologists often deploy a dual-system model that partitions our mental universe into two domains. System 2 is the familiar realm of conscious thought. It consists of everything we choose to focus on. By contrast, System 1 is largely a stranger to us. It is the realm of automatic perceptual and cognitive operations—like those you are running right now to transform the print on this page into a meaningful sentence or to hold the book while reaching for a glass and taking a sip. We have no awareness of these rapid-fire processes but we could not…
Such scientific caution runs against the grain of human nature. As the post-Oslo speculation reveals, our natural inclination is to grab on to the first plausible explanation and happily gather supportive evidence without checking its reliability. That is what psychologists call confirmation bias. We rarely seek out evidence that undercuts our first explanation, and when that evidence is shoved under our noses we become motivated skeptics—finding reasons, however tenuous, to belittle it or throw it out entirely.
“It is wise to take admissions of uncertainty seriously,” Daniel Kahneman noted, “but declarations of high confidence mainly tell you that an individual has constructed a coherent story in his mind, not necessarily that the story is true.”17
Formally, it’s called attribute substitution, but I call it bait and switch: when faced with a hard question, we often surreptitiously replace it with an easy one. “Should I worry about the shadow in the long grass?” is a hard question. Without more data, it may be unanswerable. So we substitute an easier question: “Can I easily recall a lion attacking someone from the long grass?” That question becomes a proxy for the original question, and if the answer to the second is yes, the answer to the first also becomes yes.
While Daniel Kahneman and Amos Tversky were documenting System 1’s failings, another psychologist, Gary Klein, was examining decision making among professionals like the commanders of firefighting teams, and discovering that snap judgments can work astonishingly well. One commander told Klein about going to a routine kitchen fire and ordering his men to stand in the living room and hose down the flames. The fire subsided at first but roared back. The commander was baffled. He also noticed the living room was surprisingly hot given the size of the kitchen fire. And why was it so quiet? A fire…
Drawing such seemingly different conclusions about snap judgments, Kahneman and Klein could have hunkered down and fired off rival polemics. But, like good scientists, they got together to solve the puzzle. “We agree on most of the issues that matter,” they concluded in a 2009 paper.19
There is nothing mystical about an accurate intuition like the fire commander’s. ...
All too often, forecasting in the twenty-first century looks too much like nineteenth-century medicine. There are theories, assertions, and arguments. There are famous figures, as confident as they are well compensated. But there is little experimentation, or anything that could be called science, so we know much less than most people realize. And we pay the price. Although bad forecasting rarely leads as obviously to harm as does bad medicine, it steers us subtly toward bad decisions and all that flows from them—including monetary losses, missed opportunities, unnecessary suffering, even war…
The first step in learning what works in forecasting, and what doesn’t, is to judge forecasts, and to do that we can’t make assumptions about what the forecast means. We have to know. There can’t be any ambiguity about whether a forecast is accurate or not, and Ballmer’s forecast is ambiguous. Sure, it looks wrong. It feels wrong. There is a strong case to be made that it is wrong. But is it wrong beyond all reasonable doubt?
In intelligence circles, Sherman Kent is a legend. With a PhD in history, Kent left a faculty position at Yale to join the Research and Analysis Branch of the newly created Coordinator of Information (COI) in 1941. The COI became the Office of Strategic Services (OSS). The OSS became the Central Intelligence Agency (CIA). By the time Kent retired from the CIA in 1967, he had profoundly shaped how the American intelligence community does what it calls intelligence analysis—the methodical examination of the information collected by spies and surveillance to figure out what it means, and what…
The key word in Kent’s work is estimate. As Kent wrote, “estimating is what you do when you do not know.”9 And as Kent emphasized over and over, we never truly know what will happen next. Hence forecasting is all about estimating the likelihood of something happening, which Kent and his colleagues did for many years at the Office of National Estimates—an obscure but extraordinarily influential bureau whose job was to draw on all information available to the CIA, synthesize it, and forecast anything and everything that might help the top officeholders in the US government decide what to do…
Forecasters who practice get better at distinguishing finer degrees of uncertainty, just as artists get better at distinguishing subtler shades of gray.
But people do judge. And they always judge the same way: they look at which side of “maybe”—50%—the probability was on. If the forecast said there was a 70% chance of rain and it rains, people think the forecast was right; if it doesn’t rain, they think it was wrong.
Kent couldn’t overcome such political barriers, but as the years passed the case he made for using numbers only grew. Study after study showed that people attach very different meanings to probabilistic language like “could,” “might,” and “likely.” Still the intelligence community resisted. Only after the debacle over Saddam Hussein’s supposed weapons of mass destruction, and the wholesale reforms that followed, did it become more acceptable to express probabilities with numbers. When CIA analysts told President Obama they were “70%” or “90%” sure the mystery man in a Pakistani compound was…
When we combine calibration and resolution, we get a scoring system that fully captures our sense of what good forecasters should do.
The math behind this system was developed by Glenn W. Brier in 1950, hence results are called Brier scores. In effect, Brier scores measure the distance between what you forecast and what actually happened. So Brier scores are like golf scores: lower is better. Perfection is 0. A hedged fifty-fifty call, or random guessing in the aggregate, will produce a Brier score of 0.5. A forecast that is wrong to the greatest possible extent—saying there is a 100% chance that something will happen and it doesn’t, every time—scores a disastrous 2.0, as far from The Truth as it is possible to get.17
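To make the arithmetic concrete, here is a minimal sketch of the two-outcome Brier score as just described; the function and variable names are mine, not the book’s.

```python
# Two-outcome Brier score: square the error on each possible outcome
# and sum. 0.0 is perfect, 2.0 is maximally wrong, and a hedged
# fifty-fifty call always scores 0.5, matching the passage above.

def brier_score(forecast: float, outcome: int) -> float:
    """forecast: probability assigned to 'yes'; outcome: 1 if it happened, else 0."""
    p_yes, p_no = forecast, 1 - forecast
    return (p_yes - outcome) ** 2 + (p_no - (1 - outcome)) ** 2

print(brier_score(1.0, 1))  # 0.0 -> perfect confidence, vindicated
print(brier_score(0.5, 1))  # 0.5 -> the hedged fifty-fifty call
print(brier_score(1.0, 0))  # 2.0 -> wrong to the greatest possible extent
```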
Add it all up and we are ready to roll. Like Archie Cochrane and other pioneers of evidence-based medicine, we must run carefully crafted experiments. Assemble forecasters. Ask them large numbers of questions with precise time frames and unambiguous language. Require that forecasts be expressed using numerical probability scales. And wait for time to pass. If the researchers have done their jobs, the results will be clear. The data can be analyzed and the key questions—How good are the forecasters? Who are the best? What sets them apart?—can be answered.
Decades ago, the philosopher Isaiah Berlin wrote a much-acclaimed but rarely read essay that compared the styles of thinking of great authors through the ages. To organize his observations, he drew on a scrap of 2,500-year-old Greek poetry attributed to the warrior-poet Archilochus: “The fox knows many things but the hedgehog knows one big thing.” No one will ever know whether Archilochus was on the side of the fox or the hedgehog but Berlin favored foxes. I felt no need to take sides. I just liked the metaphor because it captured something deep in my data. I dubbed the Big Idea experts…
People tend to find uncertainty disturbing and “maybe” underscores uncertainty with a bright red crayon. The simplicity and confidence of the hedgehog impairs foresight, but it calms nerves—which is good for the careers of hedgehogs.
Aggregating the judgment of many consistently beats the accuracy of the average member of the group, and is often as startlingly accurate as Galton’s weight-guessers. The collective judgment isn’t always more accurate than any individual guess, however. In fact, in any group there are likely to be individuals who beat the group. But those bull’s-eye guesses typically say more about the power of luck—chimps who throw a lot of darts will get occasional bull’s-eyes—than about the skill of the guesser. That becomes clear when the exercise is repeated many times. There will be individuals who beat…
How well aggregation works depends on what you are aggregating. Aggregating the judgments of many people who know nothing produces a lot of nothing. Aggregating the judgments of people who know a little is better, and if there are enough of them, it can produce impressive results, but aggregating the judgments of an equal number of people who know lots about lots of different things is most effective because the collective pool of information becomes much bigger. Aggregations of aggregations can also yield impressive results. A well-conducted opinion survey aggregates a lot of information…
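A toy simulation in the spirit of Galton’s weight-guessing crowd shows why aggregation beats the average member; the crowd size and noise level below are invented for illustration, though the 1,198-pound ox weight is Galton’s.

```python
# Each guesser sees the true weight plus independent noise. Independent
# errors tend to cancel in the mean, so the crowd's collective estimate
# lands far closer than the average individual's guess.
import random

random.seed(42)
TRUE_WEIGHT = 1198  # pounds, the dressed weight of Galton's ox
guesses = [TRUE_WEIGHT + random.gauss(0, 75) for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
avg_individual_error = sum(abs(g - TRUE_WEIGHT) for g in guesses) / len(guesses)

print(f"crowd error:          {abs(crowd_estimate - TRUE_WEIGHT):.1f} lb")
print(f"average member error: {avg_individual_error:.1f} lb")
```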
Now look at how foxes approach forecasting. They deploy not one analytical idea but many and seek out information not from one source but many. Then they synthesize it all into a single conclusion. In a word, they aggregate. They may be individuals working alone, but what they do is, in principle, no different from what Galton’s crowd did. They integrate perspectives and the information contained within them. The only real difference is that the process occurs within one skull.
So did Robert Jervis, a fact I find more compelling because Jervis has a four-decade track record of insightful, nonpartisan scholarship about intelligence. Jervis is the author of Why Intelligence Fails, which meticulously dissects both the failure of the IC to foresee the Iranian revolution in 1979—Jervis conducted a postmortem for the CIA that was classified for decades—and the false alarm on Saddam Hussein’s WMDs. In the latter case, the IC’s conclusion was sincere, Jervis decided. And it was reasonable.
As I noted in chapter 2, a situation like that tempts us with a bait and switch: replace the tough question with the easy one, answer it, and then sincerely believe that we have answered the tough question.
Good poker players, investors, and executives all understand this. If they don’t, they can’t remain good at what they do—because they will draw false lessons from experience, making their judgment worse over time.
…wrong. Absent accuracy metrics, there is no meaningful way to hold intelligence analysts accountable for accuracy.
And as EPJ and other studies have shown, human cognitive systems will never be able to forecast turning points in the lives of individuals or nations several years into the future—and heroic searches for superforecasters won’t change that.
Quit pretending you know things you don’t and start running experiments.
IARPA knew this could happen when it bankrolled the tournament, which is why a decision like that is so unusual. Testing may obviously be in the interest of an organization, but organizations consist of people who have interests of their own, most notably preserving and enhancing a comfortable status quo. Just as famous, well-remunerated pundits are loath to put their reputations at risk by having their accuracy publicly tested, so too are the power players inside organizations unlikely to try forecasting tournaments if it means putting their own judgment to the test. Bob in the CEO suite does…
This is not complicated stuff. But it’s easy to misinterpret randomness. We don’t have an intuitive feel for it. Randomness is invisible from the tip-of-your-nose perspective. We can only see it if we step outside ourselves.
The psychologist Ellen Langer has shown how poorly we grasp randomness in a series of experiments. In one, she asked Yale students to watch someone flip a coin thirty times and predict whether it would come up heads or tails. The students could not see the actual flipping but they were told the results of each toss. The results, however, were rigged: all students got a total of fifteen right and fifteen wrong, but some students got a string of hits early while others started with a string of misses. Langer then asked the students how well they thought they would do if the experiment were…
Some statistical concepts are both easy to understand and easy to forget. Regression to the mean is one of them. Let’s say the average height of men is five feet eight inches. Now imagine a man who is six feet tall and then picture his adult son. Your initial System 1 hunch may be that the son is also six feet. That’s possible, but unlikely. To see why, we have to engage in some strenuous System 2 reasoning. Imagine that we knew everyone’s height and computed the correlation between the heights of fathers and sons. We would find a strong but imperfect relationship, a correlation of about 0.5…
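A sketch of the arithmetic this passage is setting up, assuming the standard linear model with equal height variance across generations; the 68-inch mean, 0.5 correlation, and six-foot father all come from the passage.

```python
# Regression to the mean: the best guess for the son's height is the
# population mean plus the correlation times the father's deviation
# from that mean. Heights are in inches.

MEAN = 68   # five feet eight inches, the stated average
R = 0.5     # father-son height correlation, as in the passage

def predicted_son_height(father_height: float) -> float:
    return MEAN + R * (father_height - MEAN)

print(predicted_son_height(72))  # 70.0 -> five feet ten, not six feet
```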
Ultimately, it’s not the crunching power that counts. It’s how you use it.
The Italian American physicist Enrico Fermi—a central figure in the invention of the atomic bomb—concocted this little brainteaser decades before the invention of the Internet. And Fermi’s students did not have the Chicago yellow pages at hand. They had nothing. And yet Fermi expected them to come up with a reasonably accurate estimate.
Fermi knew people could do much better, and the key was to break down the question with more questions like “What would have to be true for this to happen?” Here, we can break the question down by asking, “What information would allow me to answer the question?”
This is similar to Roger Martin’s ultimate strategy question: “What would have to be true for this option to be a great choice?”
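The brainteaser in question is Fermi’s classic “How many piano tuners are there in Chicago?” Below is a sketch of the sort of decomposition he expected; every input is a deliberately rough, invented round number, since the point is the chain of answerable sub-questions, not the inputs themselves.

```python
# Fermi estimation: replace one unanswerable question with a chain of
# crude but answerable ones. All inputs are rough guesses.

population        = 3_000_000  # people in Chicago
people_per_home   = 4
homes_with_piano  = 1 / 20     # share of households owning a piano
tunings_per_year  = 1          # per piano
tunings_per_day   = 4          # one tuner's daily workload
workdays_per_year = 250

pianos = population / people_per_home * homes_with_piano
demand = pianos * tunings_per_year            # tunings needed per year
supply = tunings_per_day * workdays_per_year  # tunings one tuner delivers
print(round(demand / supply))                 # ~38 tuners, a defensible ballpark
```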

