Kindle Notes & Highlights
“We form impressions quickly and hold on to them even when contradictory information comes in. This tendency is called excessive coherence.”
There is a technical but fairly easy way to correct the error of matching predictions; we detail it in appendix C.
For example, people are more reluctant to match predictions to unfavorable than to favorable evidence.
much more sensitive to the relative value of comparable goods than to their absolute value.
The findings of Stevens’s laboratory suggest that the anchor that individuals produce should have a large effect on the absolute values of their subsequent dollar judgments but no effect whatsoever on the relative positions of the ten cases.
When do you feel confident in a judgment? Two conditions must be satisfied: the story you believe must be comprehensively coherent, and there must be no attractive alternatives.
Individual differences in the quality of judgments are another source of pattern noise. Imagine a single forecaster with crystal-ball powers that no one knows about (including herself). Her accuracy would make her deviate in many cases from the average forecast. In the absence of outcome data, these deviations would be regarded as pattern errors. When judgments are unverifiable, superior accuracy will look like pattern noise.
your stable pattern errors are unique to you.
In the conservative spirit of statistical analysis, the residual error is commonly labeled an error term and is treated as random. In other words, the default interpretation of pattern noise is that it consists entirely of occasion noise.
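A minimal numerical sketch of the decomposition behind these passages (mine, not the book's; the judgment matrix and variable names are invented for illustration): system noise splits additively into level noise plus a residual, and that residual is the pattern noise discussed above.

```python
import numpy as np

# Toy matrix of judgments: rows = judges, columns = cases (illustrative values).
judgments = np.array([
    [70.0, 55.0, 80.0, 60.0],
    [75.0, 50.0, 85.0, 58.0],
    [65.0, 60.0, 70.0, 66.0],
])

case_means = judgments.mean(axis=0)                      # consensus judgment per case
judge_effects = (judgments - case_means).mean(axis=1)    # each judge's average deviation

# Level noise: variability of the judges' overall levels (some judges run high, some low).
level_noise_var = np.var(judge_effects)

# Pattern noise: what remains after removing each case's mean and each judge's level.
# This is the residual that, by default, gets labeled an error term and treated as random.
residuals = judgments - case_means - judge_effects[:, None]
pattern_noise_var = np.var(residuals)

# System noise: overall variability of judgments around the case means.
system_noise_var = np.var(judgments - case_means)

print(f"system noise^2  = {system_noise_var:.2f}")
print(f"level noise^2   = {level_noise_var:.2f}")
print(f"pattern noise^2 = {pattern_noise_var:.2f}")
print(f"sum of components = {level_noise_var + pattern_noise_var:.2f}")  # equals system noise^2
```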
fundamental attribution error
Business school professor Phil Rosenzweig has convincingly argued that empty explanations in terms of biases are common in discussions of business outcomes. Their popularity attests to the prevalent need for causal stories that make sense of experience.
Noise is inherently statistical: it becomes visible only when we think statistically about an ensemble of similar judgments.
If the amount of system noise is worth addressing, replacing judgment with rules or algorithms is an option that you should consider, as it will eliminate noise entirely.
sequencing information. The search for coherence leads people to form early impressions based on the limited evidence available and then to confirm their emerging prejudgment.
Even a wisdom-of-crowds aggregate of judgments is likely to be better if the crowd is composed of more able people.
Judgments are both less noisy and less biased when those who make them are well trained, are more intelligent, and have the right cognitive style.
general mental ability
“GMA predicts both occupational level attained and performance within one’s chosen occupation and does so better than any other ability, trait, or disposition and better than job experience.”
It can be sensible to place your trust in people who look and sound intelligent and who can articulate a compelling rationale for their judgments, but this strategy is insufficient and may even backfire. Are there, then, other ways to identify real experts? Do people with the best judgment have other recognizable traits?
“A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?”
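For reference, the intuitive answer of 10 cents fails the stated condition; a tiny arithmetic check (a sketch, not from the book) shows why the ball costs 5 cents.

```python
# Bat-and-ball check: if the ball costs x, the bat costs x + 1.00,
# so 2x + 1.00 = 1.10 and x = 0.05.
ball = 0.05
bat = ball + 1.00
assert abs((bat + ball) - 1.10) < 1e-9   # total is $1.10
assert abs((bat - ball) - 1.00) < 1e-9   # bat costs exactly $1.00 more

# The intuitive answer (ball = $0.10, bat = $1.00) fails the second condition:
print(1.00 - 0.10)   # 0.90, not the required 1.00
```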
Adult Decision Making Competence
Uriel Haran, Ilana Ritov, and Barbara Mellers looked for the cognitive styles
Jonathan Baron to measure “actively open-minded thinking.”
Interestingly, there is some evidence that actively open-minded thinking is a teachable skill.
A political analyst may sound articulate and convincing, and a chess grandmaster may sound timid and unable to explain the reasoning behind some of his moves. Yet we probably should treat the professional judgment of the former with more skepticism than that of the latter.
The personality of people with excellent judgment may not fit the generally accepted stereotype of a decisive leader. People often tend to trust and like leaders who are firm and clear and who seem to know, immediately and deep in their bones, what is right. Such leaders inspire confidence. But the evidence suggests that if the goal is to reduce error, it is better for leaders (and others) to remain open to counterarguments and to know that they might be wrong. If they end up being decisive, it is at the end of a process, not at the start.
We suggest undertaking this search for biases neither before nor after the decision is made, but in real time.
Of course, people are rarely aware of their own biases when they are being misled by them. This lack of awareness is itself a known bias, the bias blind spot. People often recognize biases more easily in others than they do in themselves. We suggest that observers can be trained to spot, in real time, the diagnostic signs that one or several familiar biases are affecting someone else’s decisions or recommendations.
We certainly do not recommend that you make yourself a self-appointed decision observer. You will neither win friends nor influence people.
To be effective, decision observers need some training and tools. One such tool is a checklist of the biases they are attempting to detect. The case for relying on a checklist is clear: checklists have a long history of improving decisions in high-stakes contexts and are particularly well suited to preventing the repetition of past errors.
aggregating multiple independent estimates.
Averaging is mathematically guaranteed to reduce noise: specifically, it divides it by the square root of the number of judgments averaged. This means that if you average one hundred judgments, you will reduce noise by 90%, and if you average four hundred judgments, you will reduce it by 95%—essentially eliminating it. This statistical law is the engine of the wisdom-of-crowds approach, discussed in chapter 7.
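A minimal simulation (my sketch, not from the book; the true value and single-judgment noise level are illustrative assumptions) shows the square-root rule at work: averaging n independent, equally noisy judgments shrinks the noise, measured as a standard deviation, by a factor of the square root of n.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_value = 100.0          # the quantity being judged (illustrative)
single_judgment_sd = 10.0   # noise in one judgment (illustrative)

def noise_after_averaging(n_judges: int, n_trials: int = 10_000) -> float:
    """Standard deviation of the average of n independent noisy judgments."""
    judgments = true_value + rng.normal(0.0, single_judgment_sd,
                                        size=(n_trials, n_judges))
    return judgments.mean(axis=1).std()

for n in (1, 100, 400):
    observed = noise_after_averaging(n)
    predicted = single_judgment_sd / np.sqrt(n)
    reduction = 100 * (1 - 1 / np.sqrt(n))
    print(f"n={n:3d}: noise SD ~ {observed:5.2f} "
          f"(theory {predicted:5.2f}, reduction ~ {reduction:.0f}%)")
# Averaging 100 judgments cuts the noise by 90%; averaging 400 cuts it by 95%.
```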
The Delphi method has worked well in many situations, but it can be challenging to implement. A simpler version, mini-Delphi, can be deployed within a single meeting. Also called estimate-talk-estimate, it requires participants first to produce separate (and silent) estimates, then to explain and justify them, and finally to make a new estimate in response to the estimates and explanations of others. The consensus judgment is the average of the individual estimates obtained in that second round.
Consider superforecasters’ willingness and ability to structure and disaggregate problems. Rather than form a holistic judgment about a big geopolitical question (whether a nation will leave the European Union, whether a war will break out in a particular place, whether a public official will be assassinated), they break it up into its component parts. They ask, “What would it take for the answer to be yes? What would it take for the answer to be no?” Instead of offering a gut feeling or some kind of global hunch, they ask and try to answer an assortment of subsidiary questions.
To characterize the thinking style of superforecasters, Tetlock uses the phrase “perpetual beta,” a term used by computer programmers for a program that is not meant to be released in a final version but that is endlessly used, analyzed, and improved. Tetlock finds that “the strongest predictor of rising into the ranks of superforecasters is perpetual beta, the degree to which one is committed to belief updating and self-improvement.” As he puts it, “What makes them so good is less what they are than what they do—the hard work of research, the careful thought and self-criticism, the gathering
...more
“Teaming—unlike training… allows forecasters to harness the information.”
The success of the superforecasting project highlights the value of two decision hygiene strategies: selection (the superforecasters are, well, super) and aggregation (when they work in teams, forecasters perform better). The two strategies are broadly applicable in many judgments. Whenever possible, you should aim to combine the strategies, by constructing teams of judges (e.g., forecasters, investment professionals, recruiting officers) who are selected for being both good at what they do and complementary to one another.
When there is noise, one physician may be clearly right and the other may be clearly wrong (and may suffer from some kind of bias). As might be expected, skill matters a lot. A study of pneumonia diagnoses by radiologists, for instance, found significant noise. Much of it came from differences in skill. More specifically, “variation in skill can explain 44% of the variation in diagnostic decisions,” suggesting that “policies that improve skill perform better than uniform decision guidelines.” Here as elsewhere, training and selection are evidently crucial to the reduction of error, and to the
...more
An early study found that 31% of the time, physicians evaluating angiograms disagreed on whether a major vessel was more than 70% blocked. Despite
A 1964 study involving 91 patients and ten experienced psychiatrists found that the likelihood of an agreement between two opinions was just 57%. Another early study, involving 426 state hospital patients diagnosed independently by two psychiatrists, found agreement merely 50% of the time in their diagnosis of the kind of mental illness that was present. Yet another early study, involving 153 outpatients, found 54% agreement. In these studies, the source of the noise was not specified. Interestingly, however, some psychiatrists were found to be inclined to assign patients to specific
...more
Field trials for DSM-5 found “minimal agreement,” which “means that highly trained specialist psychiatrists under study conditions were only able to agree that a patient has depression between 4 and 15% of the time.” According to some field trials, DSM-5 actually made things worse, showing increased noise “in all major domains, with some diagnoses, such as mixed anxiety-depressive disorder… so unreliable as to appear useless in clinical practice.”
when “clinicians agree on the presence or absence of symptoms, they are more likely to agree on the diagnosis
There is a good chance that on some of the ratings, you and the other rater came up with different numbers. If you (and your counterpart) are willing, please discuss the reasons for the differences. You might find that the answer lies in how you used the scale—what we have called level noise. Perhaps you thought a 5 requires something truly extraordinary, whereas the other rater thought that it merely requires something unusually good.
Many executives object to the notion that nearly all employees can meet expectations. If so, they argue, the expectations must be too low, perhaps because of a culture of complacency. Admittedly this interpretation may be valid, but it is also possible that most employees really do meet high expectations. Indeed, this is exactly what we would expect to find in a high-performance organization. You would not sneer at the leniency of the National Aeronautics and Space Administration’s performance management procedures if you heard that all the astronauts on a successful space mission have fully
...more
if all you know about two candidates is that one appeared better than the other in the interview, the chances that this candidate will indeed perform better are about 56 to 61%. Somewhat better than flipping a coin, for sure, but hardly a fail-safe way to make important decisions.
This means that you and another interviewer, after seeing the same two candidates in the same panel interview, will still disagree about which of two candidates is better about one-quarter of the time.
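Both of these percentages can be read as "percent concordant" figures: assuming a bivariate normal relationship with correlation r, the probability that the candidate who ranks higher on one measure also ranks higher on the other is 0.5 + arcsin(r)/pi. The sketch below is my illustration, not the book's calculation; the correlation values (roughly 0.20 to 0.33 for interview validity, roughly 0.71 for agreement between two interviewers) are assumptions chosen to reproduce the ranges quoted in the text.

```python
import math

def percent_concordant(r: float) -> float:
    """Probability that whoever scores higher on one measure also scores higher
    on the other, assuming a bivariate normal relationship with correlation r."""
    return 0.5 + math.asin(r) / math.pi

# Predictive validity of interviews: correlations around 0.20-0.33 (assumed)
# correspond to the 56-61% chance that the better interviewee performs better.
for r in (0.20, 0.33):
    print(f"validity r = {r:.2f} -> better interviewee performs better "
          f"~ {percent_concordant(r):.0%} of the time")

# Agreement between two interviewers: a correlation around 0.71 (assumed)
# corresponds to agreeing on the better candidate about 75% of the time,
# i.e., disagreeing about one-quarter of the time.
print(f"reliability r = 0.71 -> interviewers agree "
      f"~ {percent_concordant(0.71):.0%} of the time")
```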
They are likely to ask questions that confirm an initial impression. If a candidate seems shy and reserved, for instance, the interviewer may want to ask tough questions about the candidate’s past experiences of working in teams but perhaps will neglect to ask the same questions of someone who seems cheerful and gregarious.
They then asked some of the interviewees to answer questions randomly. (The first letter of the questions as formulated determined if they should answer yes or no.) As the researchers wryly note, “Some of the interviewees were initially concerned that the random interview would break down and be revealed to be nonsense. No such problems occurred, and the interviews proceeded.” You read that right: not a single interviewer realized that the candidates were giving random answers. Worse, when asked to estimate whether they were “able to infer a lot about this person given the amount of time we
...more
relative judgments are better than absolute ones.