Ed Carmichael ’s Kindle Notes & Highlights for Superforecasting: The Art and Science of Prediction

Rate it:

More on this book

Community

Tim Moore

1 note & 35 highlights

Ben Hodgkinson

1 note

Niladri

1 note & 44 highlights

Wally Bock

4 notes & 111 highlights

Tim Murphy

1 note & 3 highlights

Clo Willaerts

2 notes & 23 highlights

Rnicholson95

1 note & 171 highlights

Mark Frauenfelder

1 note & 45 highlights

Rori Rockman

Rocky Sunico

Audrey L.P.

Jordan Andrew Bridgers

Ryan

Clarke Smith

Ned Holt

Scott Biggerstaff

Bennett

Rajkumar Dhanasekaran

Sébastien Fournier

Stewart Morris

Lauri Antalainen

Liam Delahunty

Ray Velez

Meg McLaren

David

Harald G.

Alex

Monna Nordhagen

Azriel

Daniel

Sachit Jain

Kaspar Papli

Kate

Megan Stewart

Raluca I

Rick Dronkers

Max Oss-Emer

Eric Castro

Jonathan

Ian Pitchford

Casi Graddy-Gamel

Kindle Notes & Highlights

by Ed Carmichael

See all Ed’s Notes & Highlights

Superforecasting: The Art and Science of Prediction

by Philip E. Tetlock

Read between May 11 - August 26, 2020

18%

So if the National Intelligence Estimate said something is “probable,” it would mean a 63% to 87% chance it would happen. Kent’s scheme was simple—and it greatly reduced the room for confusion.

Brilliant- forcing consensus on what certain terms mean

18%

Language has its own poetry, they felt, and it’s tacky to talk explicitly about numerical odds. It makes you sound like a bookie. Kent wasn’t impressed. “I’d rather be a bookie than a goddamn poet,” was his legendary response.

18%

another benefit: vague thoughts are easily expressed with vague language but when forecasters are forced to translate terms like “serious possibility” into numbers, they have to think carefully about how they are thinking, a process known as metacognition.

18%

But a more fundamental obstacle to adopting numbers relates to accountability and what I call the wrong-side-of-maybe fallacy.

18%

But people do judge. And they always judge the same way: they look at which side of “maybe”—50%—the probability was on. If the forecast said there was a 70% chance of rain and it rains, people think the forecast was right; if it doesn’t rain, they think it was wrong. This simple mistake is extremely common.

I made this mistake in the 2016 US presidential elections. Just because the polls gave trump a 10% chance of winning doesn’t means they were wrong

19%

So what’s the safe thing to do? Stick with elastic language. Forecasters who use “a fair chance” and “a serious possibility” can even make the wrong-side-of-maybe fallacy work for them: If the event happens, “a fair chance” can retroactively be stretched to mean something considerably bigger than 50%—so the forecaster nailed it. If it doesn’t happen, it can be shrunk to something much smaller than 50%—and again the forecaster nailed it. With perverse incentives like these, it’s no wonder people prefer rubbery words over firm numbers.

19%

When CIA analysts told President Obama they were “70%” or “90%” sure the mystery man in a Pakistani compound was Osama bin Laden, it was a small, posthumous triumph for Sherman Kent.

19%

vacuous

Mindless

19%

Almost anything “may” happen. I can confidently forecast that the Earth may be attacked by aliens tomorrow. And if it isn’t? I’m not wrong. Every “may” is accompanied by an asterisk and the words “or may not” are buried in the fine print. But the interviewer didn’t notice the fine print in Ferguson’s forecast, so he did not ask him to clarify.

Make it a practice to ask people to quantity their predictions

19%

We cannot rerun history so we cannot judge one probabilistic forecast—but everything changes when we have many probabilistic forecasts. If a meteorologist says there is a 70% chance of rain tomorrow, that forecast cannot be judged, but if she predicts the weather tomorrow, and the day after, and the day after that, for months, her forecasts can be tabulated and her track record determined. If her forecasting is perfect, rain happens 70% of the time when she says there is 70% chance of rain, 30% of the time when she says there is 30% chance of rain, and so on. This is called calibration. It can ...more

The right way to judge predictions/forecasts

19%

If the meteorologist’s curve is far above the line, she is underconfident—so things she says are 20% likely actually happen 50% of the time (see this page). If her curve is far under the line, she is overconfident—so things she says are 80% likely actually happen only 50% of the time (see this page).

19%

Two ways to be miscalibrated: underconfident (over the line) and overconfident (under the line)

It’s not better to be underconfident- it’s still just as wrong as overconfidence

20%

The math behind this system was developed by Glenn W. Brier in 1950, hence results are called Brier scores.

20%

So Brier scores are like golf scores: lower is better.

20%

A forecast that is wrong to the greatest possible extent—saying there is a 100% chance that something will happen and it doesn’t, every time—scores a disastrous 2.0, as far from The Truth as it is possible to get.

Brier scores for forecasting range from 0 - good - to 2 - dead wrong

20%

For example, after the 2012 presidential election, Nate Silver, Princeton’s Sam Wang, and other poll aggregators were hailed for correctly predicting all fifty state outcomes, but almost no one noted that a crude, across-the-board prediction of “no change”—if a state went Democratic or Republican in 2008, it will do the same in 2012—would have scored forty-eight out of fifty, which suggests that the many excited exclamations of “He called all fifty states!” we heard at the time were a tad overwrought.

21%

If you didn’t know the punch line of EPJ before you read this book, you do now: the average expert was roughly as accurate as a dart-throwing chimpanzee.

21%

Hence the old joke about statisticians sleeping with their feet in an oven and their head in a freezer because the average temperature is comfortable.

21%

As a result, they were unusually confident and likelier to declare things “impossible” or “certain.”

What bad forecasters do

21%

These experts gathered as much information from as many sources as they could. When thinking, they often shifted mental gears, sprinkling their speech with transition markers such as “however,” “but,” “although,” and “on the other hand.” They talked about possibilities and probabilities, not certainties. And while no one likes to say “I was wrong,” these experts more readily admitted it and changed their minds.

What good forecasters do

22%

Foxes beat hedgehogs on both calibration and resolution.

The distinction being that calibration requires you to be generally right more often than not and resolution means you make a bold prediction - closer to a certainty and you are right

22%

but he got his start as an economist in the Reagan administration and later worked with Art Laffer, the economist whose theories were the cornerstone of Ronald Reagan’s economic policies. Kudlow’s one Big Idea is supply-side economics. When President George W. Bush followed

22%

He dubbed it “the Bush boom.” Reality fell short: growth and job creation were positive but somewhat disappointing relative to the long-term average and particularly in comparison to that of the Clinton era, which began with a substantial tax hike.

Tax cuts don’t necessarily lead to an economic boom - just compare Clinton v bush

22%

They’re green-tinted glasses—like the glasses that visitors to the Emerald City were required to wear in L. Frank Baum’s The Wonderful Wizard of Oz.

22%

revealed an inverse correlation between fame and accuracy: the more famous an expert was, the less accurate he was.

22%

And so, as EPJ showed, hedgehogs are likelier to say something definitely will or won’t happen. For many audiences, that’s satisfying.

Be VERY wary of media experts for this reason

22%

British scientist Sir Francis Galton

22%

Their average guess—their collective judgment—was 1,197 pounds, one pound short of the correct answer, 1,198 pounds. It was the earliest demonstration of a phenomenon popularized by—and now named for—James Surowiecki’s bestseller The Wisdom of Crowds.

23%

Hundreds of people added valid information, creating a collective pool far greater than any one of them possessed.

23%

Some suggested the correct answer was higher, some lower. So they canceled each other out. With valid information piling up and errors nullifying themselves, the net result was an astonishingly accurate estimate.

The wisdom of crowds works because valid information converges wile errors amongst groups diverge and cancel each other out

23%

And I happen to be one of those highly educated people who is familiar with game theory, so I know 0 is called the Nash equilibrium solution. QED. The only question is who will come with me to London.

Quot erad demonstradum = Latin for this is my final proof; point proven

23%

And because I did not think about the problem from this different perspective, and factor it into my own judgment, I was wrong.

24%

Depending on the species, there may be as many as thirty thousand of these lenses on a single eye, each one occupying a physical space slightly different from those of the adjacent lenses, giving it a unique perspective. Information from these thousands of unique perspectives flows into the dragonfly’s brain where it is synthesized into vision so superb that the dragonfly can see in almost every direction simultaneously, with the clarity and precision it needs to pick off flying insects at high speed.

24%

Why did you assume that an opponent who raises the bet has a strong hand if you would not raise with the same strong hand? “And it’s not until I walk them through the exercise,” Duke says, that people realize they failed to truly look at the table from the perspective of their opponent.

25%

“All models are wrong,” the statistician George Box observed, “but some are useful.”

George Box’s point is that no model can accurately capture reality in all its messiness, but they CAN form a useful approximation

25%

The United States had invaded Afghanistan to oust the Taliban, who had harbored Osama bin Laden.

So the purpose of invading Afghanistan was to root out the taliban

25%

National Intelligence Estimates are the consensus view of the Central Intelligence Agency, the National Security Agency, the Defense Intelligence Agency, and thirteen other agencies. Collectively, these agencies are known as the intelligence community, or IC.

25%

3And this fantastically elaborate, expensive, and experienced intelligence apparatus concluded in October 2002 that the key claims of the Bush administration about Iraqi WMDs were correct.

So even the American intelligence community - the supposedly best and brightest from the CIA and the NSA - were wrong about Iraq having WMDs

25%

Even caustic critics of the Bush administration—people like Tom Friedman, who derisively called them “Bushies”—were convinced Saddam Hussein was hiding something, somewhere.

Even Tom Friedman - an expert on the Middle East - got this wrong it sounds like

25%

What went wrong? One explanation was that the IC had caved to White House bullying. The intelligence had been politicized. But official investigations rejected that claim.

So the intelligence community weren’t pandering to the White House - they genuinely believed that Iraq had WMDs

25%

Why Intelligence Fails,

May be worth reading

25%

So the question “Was the IC’s judgment reasonable?” is challenging. But it’s a snap to answer “Was the IC’s judgment correct?” As I noted in chapter 2, a situation like that tempts us with a bait and switch: replace the tough question with the easy one, answer it, and then sincerely believe that we have answered the tough question.

The invasion of Iraq may NOT have been a bad decision in this light - it ma have been reasonable given the evidence that was available

26%

Jervis did not let the intelligence community off the hook. “There were not only errors but correctable ones,” he wrote about the IC’s analysis. “Analysis could and should have been better.”

Jerks concluded that better analysis could have and should have been done pertaining to Iraq’s purported manufacture of WMDs

26%

So the IC still would have concluded that Saddam had WMDs, they just would have been much less confident in that conclusion. That may sound like a gentle criticism. In fact, it’s devastating, because a less-confident conclusion from the IC may have made a huge difference:

26%

At a White House briefing on December 12, 2002, the CIA director, George Tenet, used the phrase “slam dunk.”

26%

Postmortems even revealed that the IC had never seriously explored the idea that it could be wrong. “There were no ‘red teams’ to attack the prevailing views, no analyses from devil’s advocates, no papers that provided competing possibilities,”

26%

the 1979 failure to foresee the Iranian revolution—the biggest geopolitical disaster of that era—“

26%

DARPA’s work even contributed to the invention of the Internet.

28%

Think how shocking it would be to the intelligence professionals who have spent their lives forecasting geopolitical events—to be beaten by a few hundred ordinary people and some simple algorithms. It actually happened.

28%

Even the extremizing tweak is based on a pretty simple insight: When you combine the judgments of a large group of people to calculate the “wisdom of the crowd” you collect all the relevant information that is dispersed among all those people. But none of those people has access to all that information. One person knows only some of it, another knows some more, and so on. What would happen if every one of those people were given all the information? They would become more confident—raising their forecasts closer to 100% or zero. If you then calculated the “wisdom of the crowd” it too would be ...more

« Prev 1 2 3 … 6 Next »