Kindle Notes & Highlights
This is a familiar phenomenon in companies and government offices, and it can lead to confidence about, and unanimous support for, a judgment that is quite wrong.
group polarization.
when people speak with one another, they often end up at a more extreme point in line with their original inclinations.
Internal discussions often create greater confidence, greater unity, and greater extremism, frequently in the form of increased enthusiasm.
deliberating juries were far noisier than statistical juries—a clear reflection of social influence noise. Deliberation had the effect of increasing noise.
Deliberating juries experienced a shift toward greater leniency (when the median member was lenient) and a shift toward greater severity (when the median member was severe).
If most people favor a severe punishment, then the group will hear many arguments in favor of severe punishment—and fewer arguments the other way. If group members are listening to one another, they will shift in the direction of the dominant tendency, rendering the group more unified, more confident, and more extreme. And if people care about their reputation within the group, they will shift in the direction of the dominant tendency, which will also produce polarization.
cascades and polarization can lead to wide disparities between groups looking at the same problem.
Since many of the most important decisions in business and government are made after some sort of deliberative process, it is especially important to be alert to this risk. Organizations and their leaders should take steps to control noise in the judgments of their individual members. They should also manage deliberating groups in a way that is likely to reduce noise, not amplify it.
noise is a major factor in the inferiority of human judgment.
the percent concordant (PC),
PC is an immediately intuitive measure of covariation, which is a large advantage, but it is not the standard measure that social scientists use. The standard measure is the correlation coefficient (r), which varies between 0 and 1 when two variables are positively related.
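A minimal sketch of the two measures on simulated data, assuming PC is read the usual way: the share of randomly drawn pairs of cases in which the case that ranks higher on one variable also ranks higher on the other. Nothing here comes from the book's own data; it is only an illustration.

```python
# Percent concordant (PC) vs. the correlation coefficient (r), on simulated data.
import itertools
import numpy as np

def percent_concordant(x, y):
    """Share of pairs where the ordering on x matches the ordering on y."""
    pairs = list(itertools.combinations(range(len(x)), 2))
    concordant = sum(
        1 for i, j in pairs
        if (x[i] - x[j]) * (y[i] - y[j]) > 0
    )
    return concordant / len(pairs)

rng = np.random.default_rng(0)
predictor = rng.normal(size=200)
outcome = 0.5 * predictor + rng.normal(size=200)   # moderately related variables

r = np.corrcoef(predictor, outcome)[0, 1]
pc = percent_concordant(predictor, outcome)
print(f"correlation r = {r:.2f}, percent concordant = {pc:.0%}")
```

With no relationship at all, PC hovers around 50% (a coin flip); with a perfect positive correlation, it reaches 100%.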
correlation coefficient.
the correlation between two variables is their percentage of shared determinants.
most judgments are made in a state of what we call objective ignorance, because many things on which the future depends can simply not be known.
objective ignorance affects not just our ability to predict events but even our capacity to understand them
clinical judgment.
multiple regression,
produces a predictive score that is a weighted average of the predictors. It finds the optimal set of weights, chosen to maximize the correlation between the composite prediction and the target variable. The optimal weights minimize the MSE (mean squared error) of the predictions.
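A minimal sketch of that idea in Python, using ordinary least squares on simulated data (the predictor names in the comments are purely illustrative):

```python
# Multiple regression: the weights that minimize mean squared error between a
# weighted average of predictors and the target. All data are simulated.
import numpy as np

rng = np.random.default_rng(1)
n = 300
predictors = rng.normal(size=(n, 3))            # e.g., test score, experience, interview rating
target = predictors @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), predictors])   # add an intercept column
weights, *_ = np.linalg.lstsq(X, target, rcond=None)   # least squares = minimum-MSE weights

prediction = X @ weights
mse = np.mean((target - prediction) ** 2)
r = np.corrcoef(prediction, target)[0, 1]
print(f"fitted weights: {weights.round(2)}, in-sample MSE: {mse:.2f}, r: {r:.2f}")
```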
The use of multiple regression is an example of mechanical prediction. There are many kinds of mechanical prediction, ranging from simple rules (“hire anyone who completed high school”) to sophisticated artificial intelligence models. But linear regression models are the most common (they have been called “the workhorse of judgment and decision-making research”). To minimize jargon, we will refer to linear models as simple models.
How good is human judgment, relative to a formula?
Meehl discovered that clinicians and other professionals are distressingly weak in what they often see as their unique strength: the ability to integrate information.
any satisfaction you felt with the quality of your judgment was an illusion: the illusion of validity.
two stages of the prediction task: evaluating cases on the evidence available and predicting actual outcomes.
You can often be quite confident in your assessment of which of two candidates looks better, but guessing which of them will actually be better is an altogether different kettle of fish.
The reason is straightforward: you know most of what you need to know to assess the two cases, but gazing into the future is deeply uncertain.
quants,
Meehl, in addition to his academic career, was a practicing psychoanalyst. A picture of Freud hung in his office. He was a polymath who taught classes not just in psychology but also in philosophy and law and who wrote about metaphysics, religion, political science, and even parapsychology.
Meehl had no ill will toward clinicians—far from it. But as he put it, the evidence for the advantage of the mechanical approach to combining inputs was “massive and consistent.”
The findings support a blunt conclusion: simple models beat humans.
When you thought clinically about Monica and Nathalie, you didn’t apply the same rule to both cases. Indeed, you did not apply any rule at all. The model of the judge is not a realistic description of how a judge actually judges.
Complexity and richness do not generally lead to more accurate predictions. Why is that so?
A statistical model of your judgments cannot possibly add anything to the information they contain. All the model can do is subtract and simplify.
Failing to reproduce your subtle rules will result in a loss of accuracy when your subtlety is valid.
complex rules will often give you only the illusion of validity and in fact harm the quality of your judgments. Some subtleties are valid, but many are not.
The effect of removing noise from your judgments will always be an improvement of your predictive accuracy.
replacing you with a model of you does two things: it eliminates your subtlety, and it eliminates your pattern noise. The robust finding that the model of the judge is more valid than the judge conveys an important message: the gains from subtle rules in human judgment—when they exist—are generally not sufficient to compensate for the detrimental effects of noise.
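A minimal sketch of that replacement on simulated data: fit a linear rule to the judge's own ratings (not to the outcomes), then compare the judge and the judge's model against the outcome. The feature set, weights, and noise levels are all made-up assumptions for illustration.

```python
# "Model of the judge": regress the judge's own past judgments on the case
# features, then replace the judge with the fitted (noise-free) linear rule.
import numpy as np

rng = np.random.default_rng(2)
n = 500
features = rng.normal(size=(n, 2))                      # two case cues

# Simulated true outcome, and a human judge who weights the cues imperfectly
# and adds judgment noise of their own.
outcome = features @ np.array([0.7, 0.3]) + rng.normal(scale=0.8, size=n)
judge = features @ np.array([0.5, 0.5]) + rng.normal(scale=0.8, size=n)

# Fit the model to the judge's ratings, not to the outcomes.
X = np.column_stack([np.ones(n), features])
w, *_ = np.linalg.lstsq(X, judge, rcond=None)
model_of_judge = X @ w                                  # the judge, minus the noise

print("judge vs outcome:          r =", round(np.corrcoef(judge, outcome)[0, 1], 2))
print("model of judge vs outcome: r =", round(np.corrcoef(model_of_judge, outcome)[0, 1], 2))
```

In this simulation the model keeps the judge's (imperfect) weighting of the cues but drops the noise, which is why its correlation with the outcome comes out higher.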
Why do complex rules of prediction harm accuracy, despite the strong feeling we have that they draw on valid insights? For one thing, many of the complex rules that people invent are not likely to be generally true. But there is another problem: even when the complex rules are valid in principle, they inevitably apply under conditions that are rarely observed.
to put it bluntly, it proved almost impossible in that study to generate a simple model that did worse than the experts did.
Of course, we should not conclude that any model beats any human.
“mindless consistency”
This quick tour has shown how noise impairs clinical judgment. In predictive judgments, human experts are easily outperformed by simple formulas—models of reality, models of a judge, or even randomly generated models. This finding argues in favor of using noise-free methods:
“People believe they capture complexity and add subtlety when they make judgments. But the complexity and the subtlety are mostly wasted—usually they do not add to the accuracy of simple models.”
all mechanical approaches are noise-free.
improper linear model.
His surprising discovery was that these equal-weight models are about as accurate as “proper” regression models, and far superior to clinical judgments.
multiple regression computes “optimal” weights that minimize squared errors. But multiple regression minimizes error in the original data. The formula therefore adjusts itself to predict every random fluke in the data.
model’s predictive accuracy is its performance in a new sample, called its cross-validated correlation.
The loss of accuracy in cross-validation is worst when the original sample is small, because flukes loom larger in small samples.
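A minimal sketch of that effect on simulated data: regression weights fitted on a small sample look impressive in-sample, lose accuracy when cross-validated on fresh cases, and an improper equal-weight model holds up roughly as well. The sample sizes, weights, and noise level are all illustrative assumptions.

```python
# Overfitting and cross-validation: "proper" regression weights vs. equal weights.
import numpy as np

rng = np.random.default_rng(3)

def simulate(n):
    """Generate n cases with 5 predictors and an outcome they partly explain."""
    x = rng.normal(size=(n, 5))
    true_w = np.array([0.4, 0.3, 0.3, 0.2, 0.2])
    y = x @ true_w + rng.normal(scale=1.5, size=n)
    return x, y

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

x_train, y_train = simulate(30)        # small original sample
x_test, y_test = simulate(10_000)      # large fresh sample for cross-validation

# "Proper" weights: least squares on the small original sample.
Xtr = np.column_stack([np.ones(len(y_train)), x_train])
Xte = np.column_stack([np.ones(len(y_test)), x_test])
w, *_ = np.linalg.lstsq(Xtr, y_train, rcond=None)

# "Improper" equal weights: just average the predictors
# (they are already on a common scale in this simulation).
equal_test = x_test.mean(axis=1)

print("regression, in-sample r:       ", round(corr(Xtr @ w, y_train), 2))
print("regression, cross-validated r: ", round(corr(Xte @ w, y_test), 2))
print("equal weights, new-sample r:   ", round(corr(equal_test, y_test), 2))
```

The fitted weights chase the flukes of the 30-case sample, so their advantage evaporates on new data, while the equal-weight composite loses almost nothing.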
As statistician Howard Wainer memorably put it in the subtitle of a scholarly article on the estimation of proper weights, “It Don’t Make No Nevermind.” Or, in Dawes’s words, “we do not need models more precise than our measurements.”