Kindle Notes & Highlights
In fact, clinical judgment is the process that we have described simply as judgment in this book.
This technique, called multiple regression, produces a predictive score that is a weighted average of the predictors. In the present study, it yields an optimal correlation of .32 (PC = 60%), far from impressive but substantially higher than what clinical predictions achieved.
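A note on the statistics in this highlight: PC (the percentage of concordant pairs) can be obtained from a correlation coefficient under a bivariate-normal assumption via PC = 50% + 100 × arcsin(r) / π, which gives roughly 60% for r = .32. The sketch below uses made-up data and hypothetical predictor weights; it shows both the PC conversion and the sense in which a multiple-regression score is a weighted average of the predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up predictors (e.g., ratings on several cues) and a noisy outcome.
X = rng.normal(size=(1000, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=1.8, size=1000)

# Multiple regression: the predictive score is a weighted average of the predictors.
X1 = np.column_stack([np.ones(len(X)), X])
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
score = X1 @ weights
r = np.corrcoef(score, y)[0, 1]

def percent_concordant(r):
    # PC = 50% + 100 * arcsin(r) / pi, assuming bivariate normality.
    return 100 * (0.5 + np.arcsin(r) / np.pi)

print(f"correlation {r:.2f} -> PC = {percent_concordant(r):.0f}%")
print(f"PC for r = .32: {percent_concordant(0.32):.0f}%")  # about 60%, as in the highlight
```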
Meehl discovered that clinicians and other professionals are distressingly weak in what they often see as their unique strength: the ability to integrate information.
Meehl’s results strongly suggest that any satisfaction you felt with the quality of your judgment was an illusion: the illusion of validity.
The illusion of validity is found wherever predictive judgments are made, because of a common failure to distinguish between two stages of the prediction task: evaluating cases on the evidence available and predicting actual outcomes.
The reason is straightforward: you know most of what you need to know to assess the two cases, but gazing into the future is deeply uncertain.
To support this conclusion, we turn to a different stream of research on simple models, which began in the small city of Eugene, Oregon.
For most of us, the activity of judgment is complex, rich, and interesting precisely because it does not fit simple rules.
Complexity and richness do not generally lead to more accurate predictions.
The robust finding that the model of the judge is more valid than the judge conveys an important message: the gains from subtle rules in human judgment—when they exist—are generally not sufficient to compensate for the detrimental effects of noise.
“More than sixty years after the publication of Paul Meehl’s book, the idea that mechanical prediction is superior to people is still shocking.”
Robyn Dawes was another member of the Eugene, Oregon, team of stars that studied judgment in the 1960s and 1970s.
Dawes labeled the equal-weight formula an improper linear model.
Equal-weight models do well because they are not susceptible to accidents of sampling.
The immediate implication of Dawes’s work deserves to be widely known: you can make valid statistical predictions without prior data about the outcome that you are trying to predict. All you need is a collection of predictors that you can trust to be correlated with the outcome.
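To make Dawes's point concrete, here is a minimal sketch of an "improper" (equal-weight) linear model on invented data. No outcome data are used to set the weights: each predictor is standardized and the standardized scores are simply averaged. The predictors and numbers are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented predictors believed (correctly, in this toy example) to correlate with the outcome.
X = rng.normal(size=(500, 4))
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(scale=1.0, size=500)

# Improper linear model: standardize each predictor and weight them equally.
z = (X - X.mean(axis=0)) / X.std(axis=0)
equal_weight_score = z.mean(axis=1)

print(f"equal-weight score vs. outcome: r = {np.corrcoef(equal_weight_score, y)[0, 1]:.2f}")
```

Because the weights are fixed in advance, such a model cannot overfit accidents of sampling, which is why equal-weight scores often hold up as well as optimally fitted weights when applied to new cases.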
Frugal models are models of reality that look like ridiculously simplified, back-of-the-envelope calculations.
While we must admire the power of machine learning, we should remember that it will probably take some time for an AI to understand why a person who has broken a leg will miss movie night.
This finding—that the algorithm picks up rare but decisive patterns—brings us back to the concept of broken legs.
While this sort of discrimination is certainly a risk in principle, the decisions of this algorithm are in important respects less racially biased than those of the judges, not more.
First, as described in chapter 9, all mechanical prediction techniques, not just the most recent and more sophisticated ones, represent significant improvements on human judgment.
Second, the data is sometimes rich enough for sophisticated AI techniques to detect valid patterns and go well beyond the predictive power of a simple model.
The authors concluded that the resistance of clinicians can be explained by a combination of sociopsychological factors, including their “fear of technological unemployment,” “poor education,” and a “general dislike of computers.”
More often, people are willing to give an algorithm a chance but stop trusting it as soon as they see that it makes mistakes.
We expect machines to be perfect. If this expectation is violated, we discard them.
Some of the executives in our audiences tell us proudly that they trust their gut more than any amount of analysis.
essentially ‘knowing’ but without knowing why.”
All the pieces of the jigsaw puzzle seem to fit. (We will see later that this sense of coherence is often bolstered by hiding or ignoring pieces of evidence that don’t fit.)
We take a terminological liberty here, replacing the commonly used uncertainty with ignorance. This term helps limit the risk of confusion between uncertainty, which is about the world and the future, and noise, which is variability in judgments that should be identical.
Tetlock’s key finding was that in their predictions about major political events, the supposed experts are stunningly unimpressive.
The limit on expert political judgment is set not by the cognitive limitation of forecasters but by their intractable objective ignorance of the future.
Models are consistently better than people, but not much better. There is essentially no evidence of situations in which people do very poorly and models do very well with the same information.
However, the obviousness of this fact is matched only by the regularity with which it is ignored, as the consistent findings about predictive overconfidence demonstrate.
Instead, as the previous chapter suggested, we maintain an unchastened willingness to make bold predictions about the future from little useful information.
Aggregate measures are widely known to be both more predictive and more predictable than are measures of single outcomes.
People might agree that environmental regulators should issue prudent rules to reduce greenhouse gas emissions, without defining what constitutes prudence.
Setting standards without specifying details can lead to noise, which might be controlled through some of the strategies we have discussed, such as aggregating judgments and using the mediating assessments protocol.
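A minimal simulation of why aggregating judgments reduces noise (and why aggregate measures are more predictable, as noted above): if judges' errors are independent, averaging n judgments shrinks the noise by a factor of about the square root of n. The numbers below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Many panels of 9 judges each assess the same case; each judgment is the
# true value plus independent noise (an assumption of this sketch).
true_value = 100
noise_sd = 20
judgments = true_value + rng.normal(scale=noise_sd, size=(10_000, 9))

single = judgments[:, 0]
averaged = judgments.mean(axis=1)

print(f"noise of a single judgment: {single.std():.1f}")
print(f"noise of a 9-judge average: {averaged.std():.1f}  (expected ~{noise_sd / 3:.1f})")
```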
In criminal law, jury nullification refers to situations in which juries simply refuse to follow the law, on the ground that it is senselessly rigid and harsh.
We need to monitor our rules to make sure they are operating as intended. If they are not, the existence of noise might be a clue, and the rules should be revised.
“We often use standards when we should embrace rules—simply because we don’t pay attention to noise.”
Some judgments are predictive, and some predictive judgments are verifiable; we will eventually know whether they were accurate.
Bias is the average error, as, for example, when a team of shooters consistently hits below and to the left of the target.
Noise is variability in judgments that should be identical. We use the term system noise for the noise observed in organizations that employ interchangeable professionals to make decisions, such as physicians in an emergency room, judges imposing criminal penalties, and underwriters in an insurance company.
When the bias is smaller than one standard deviation of the judgment errors (that is, smaller than the noise), noise is the bigger source of overall error.
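The "one standard deviation" claim follows from the error equation the book relies on: overall error (mean squared error) equals the square of the bias plus the square of the noise, so whichever of the two is larger contributes more. A quick numerical check with assumed values (bias 4, noise 10):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed illustration: errors with a systematic bias of 4 and noise (SD) of 10.
errors = 4 + rng.normal(scale=10, size=100_000)

bias = errors.mean()
noise = errors.std()
mse = (errors ** 2).mean()

# Error equation: MSE = bias^2 + noise^2.
print(f"bias = {bias:.2f}, noise = {noise:.2f}")
print(f"MSE  = {mse:.1f} ~= bias^2 + noise^2 = {bias**2 + noise**2:.1f}")
# Because |bias| < noise here, noise^2 (~100) dominates bias^2 (~16).
```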
Hence our conclusion: wherever there is judgment, there is noise, and more of it than you think.
In short, we can be sure that there is error if judgments vary for no good reason.
System noise can be broken down into level noise and pattern noise.
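The decomposition is quadratic, like the error equation: system noise squared equals level noise squared plus pattern noise squared. The sketch below fabricates a judges-by-cases table in which some judges are systematically harsher (level noise) and each judge also reacts idiosyncratically to particular cases (pattern noise); the variance components then approximately add up. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented judgments: rows are judges, columns are cases.
n_judges, n_cases = 50, 30
case_effects = rng.normal(scale=2.0, size=n_cases)         # real differences between cases
judge_levels = rng.normal(scale=1.0, size=n_judges)        # some judges are systematically harsher
pattern = rng.normal(scale=1.5, size=(n_judges, n_cases))  # idiosyncratic judge-by-case reactions
judgments = case_effects + judge_levels[:, None] + pattern

# System noise: variability among judges looking at the same case.
system_noise_sq = (judgments - judgments.mean(axis=0)).var()
# Level noise: variability of the judges' average levels.
level_noise_sq = judgments.mean(axis=1).var()
# Pattern noise: what remains after removing case means and judge levels.
residual = (judgments - judgments.mean(axis=1, keepdims=True)
            - judgments.mean(axis=0) + judgments.mean())
pattern_noise_sq = residual.var()

print(f"system noise^2 {system_noise_sq:.2f} ~= "
      f"level noise^2 {level_noise_sq:.2f} + pattern noise^2 {pattern_noise_sq:.2f}")
```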
People who place exaggerated confidence in their predictive judgment underestimate their objective ignorance as well as their biases.
The shared intuitive process of matching across intensity dimensions—such as when people match a high GPA to a precocious reading age—will generally produce similar judgments.
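A small illustration of intensity matching, using invented distributional assumptions: the judge translates how extreme a reading age is (in standard units of precocity) into an equally extreme GPA. Because most people perform roughly the same translation, the resulting judgments come out similar.

```python
# Invented, illustrative numbers; only the matching operation matters.
age_mean, age_sd = 6.5, 1.5    # assumed distribution of the age of fluent reading
gpa_mean, gpa_sd = 3.0, 0.4    # assumed distribution of GPAs

reading_age = 4.0              # a precocious reader
z = (age_mean - reading_age) / age_sd    # how extreme the precocity is, in standard units
matched_gpa = gpa_mean + z * gpa_sd      # an equally extreme GPA

print(f"Reading fluently at {reading_age} is {z:.1f} SDs early; "
      f"the intensity-matched GPA is about {matched_gpa:.2f}.")
```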
Large individual differences emerge when a judgment requires the weighting of multiple, conflicting cues.
Such bias-based explanations are satisfying, because the human mind craves causal explanations.