Noise: A Flaw in Human Judgment
Kindle Notes & Highlights
Read between September 16 and November 21, 2021
22%
This is a familiar phenomenon in companies and government offices, and it can lead to confidence about, and unanimous support for, a judgment that is quite wrong.
22%
group polarization.
22%
when people speak with one another, they often end up at a more extreme point in line with their original inclinations.
22%
Internal discussions often create greater confidence, greater unity, and greater extremism, frequently in the form of increased enthusiasm.
23%
deliberating juries were far noisier than statistical juries—a clear reflection of social influence noise. Deliberation had the effect of increasing noise.
23%
Deliberating juries experienced a shift toward greater leniency (when the median member was lenient) and a shift toward greater severity (when the median member was severe).
23%
If most people favor a severe punishment, then the group will hear many arguments in favor of severe punishment—and fewer arguments the other way. If group members are listening to one another, they will shift in the direction of the dominant tendency, rendering the group more unified, more confident, and more extreme. And if people care about their reputation within the group, they will shift in the direction of the dominant tendency, which will also produce polarization.
23%
cascades and polarization can lead to wide disparities between groups looking at the same problem.
23%
Since many of the most important decisions in business and government are made after some sort of deliberative process, it is especially important to be alert to this risk. Organizations and their leaders should take steps to control noise in the judgments of their individual members. They should also manage deliberating groups in a way that is likely to reduce noise, not amplify it.
23%
noise is a major factor in the inferiority of human judgment.
23%
the percent concordant (PC),
23%
PC is an immediately intuitive measure of covariation, which is a large advantage, but it is not the standard measure that social scientists use. The standard measure is the correlation coefficient (r), which varies between 0 and 1 when two variables are positively related.
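A note on the PC-r link: for two variables that are jointly normal, the concordance probability has a closed form, PC = 50 + 100 * arcsin(r) / pi (in percent). The sketch below is my own illustration, not the book's; the correlation value is invented, and the simulation just checks the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.80  # assumed correlation for the example
cov = [[1.0, r], [r, 1.0]]

# Draw two independent sets of cases from a bivariate normal with correlation r.
a = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
b = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)

# PC: how often the case that is higher on the first variable
# is also higher on the second.
pc_simulated = np.mean(np.sign(a[:, 0] - b[:, 0]) == np.sign(a[:, 1] - b[:, 1]))

# Closed-form concordance probability for bivariate normal data.
pc_formula = 0.5 + np.arcsin(r) / np.pi

print(f"simulated PC ~ {pc_simulated:.3f}, formula PC = {pc_formula:.3f}")
# Both land near .79: a correlation of .80 corresponds to a PC of about 79%.
```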
23%
correlation coefficient.
23%
the correlation between two variables is their percentage of shared determinants.
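One way to make that statement concrete (a constructed example, not the book's): let X = A + B and Y = A + C, with A, B, C independent and of equal variance. X and Y then share half of their determinants, and since Cov(X, Y) = Var(A) = 1 while Var(X) = Var(Y) = 2, the correlation is exactly .50.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# A is the shared determinant; B and C are specific to X and Y.
A, B, C = rng.standard_normal((3, n))
X = A + B
Y = A + C

print(np.corrcoef(X, Y)[0, 1])  # ~0.50: 50% shared determinants
```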
24%
most judgments are made in a state of what we call objective ignorance, because many things on which the future depends can simply not be known.
24%
objective ignorance affects not just our ability to predict events but even our capacity to understand them
24%
clinical judgment.
24%
multiple regression
24%
produces a predictive score that is a weighted average of the predictors. It finds the optimal set of weights, chosen to maximize the correlation between the composite prediction and the target variable. The optimal weights minimize the MSE (mean squared error) of the predictions
24%
The use of multiple regression is an example of mechanical prediction. There are many kinds of mechanical prediction, ranging from simple rules (“hire anyone who completed high school”) to sophisticated artificial intelligence models. But linear regression models are the most common (they have been called “the workhorse of judgment and decision-making research”). To minimize jargon, we will refer to linear models as simple models.
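A minimal sketch of a simple model fit by multiple regression, with made-up predictor names and invented effect sizes. Ordinary least squares finds the weights that minimize MSE on the data it is fit to.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors for a hiring decision (names are illustrative only).
experience, test_score, interview = rng.standard_normal((3, n))

# Simulated outcome: job performance depends on the predictors plus noise.
performance = (0.5 * experience + 0.3 * test_score + 0.1 * interview
               + rng.standard_normal(n))

# Multiple regression: weights (plus an intercept) chosen to minimize the
# mean squared error between the weighted average and the outcome.
X = np.column_stack([experience, test_score, interview, np.ones(n)])
weights, *_ = np.linalg.lstsq(X, performance, rcond=None)

prediction = X @ weights
print("fitted weights:", np.round(weights[:3], 2))
print("r(prediction, outcome):",
      np.round(np.corrcoef(prediction, performance)[0, 1], 2))
```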
24%
How good is human judgment, relative to a formula?
25%
Meehl discovered that clinicians and other professionals are distressingly weak in what they often see as their unique strength: the ability to integrate information.
25%
any satisfaction you felt with the quality of your judgment was an illusion: the illusion of validity.
25%
two stages of the prediction task: evaluating cases on the evidence available and predicting actual outcomes.
25%
You can often be quite confident in your assessment of which of two candidates looks better, but guessing which of them will actually be better is an altogether different kettle of fish.
25%
The reason is straightforward: you know most of what you need to know to assess the two cases, but gazing into the future is deeply uncertain.
25%
quants
25%
Meehl, in addition to his academic career, was a practicing psychoanalyst. A picture of Freud hung in his office. He was a polymath who taught classes not just in psychology but also in philosophy and law and who wrote about metaphysics, religion, political science, and even parapsychology.
25%
Meehl had no ill will toward clinicians—far from it. But as he put it, the evidence for the advantage of the mechanical approach to combining inputs was “massive and consistent.”
25%
The findings support a blunt conclusion: simple models beat humans.
25%
When you thought clinically about Monica and Nathalie, you didn’t apply the same rule to both cases. Indeed, you did not apply any rule at all. The model of the judge is not a realistic description of how a judge actually judges.
26%
Complexity and richness do not generally lead to more accurate predictions. Why is that so?
26%
A statistical model of your judgments cannot possibly add anything to the information they contain. All the model can do is subtract and simplify.
26%
Failing to reproduce your subtle rules will result in a loss of accuracy when your subtlety is valid.
26%
complex rules will often give you only the illusion of validity and in fact harm the quality of your judgments. Some subtleties are valid, but many are not.
26%
The effect of removing noise from your judgments will always be an improvement of your predictive accuracy.
26%
replacing you with a model of you does two things: it eliminates your subtlety, and it eliminates your pattern noise. The robust finding that the model of the judge is more valid than the judge conveys an important message: the gains from subtle rules in human judgment—when they exist—are generally not sufficient to compensate for the detrimental effects of noise.
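A sketch of how a "model of the judge" is built, on simulated data with an invented noise structure: regress the judge's own ratings on the cues, then let the fitted line judge the cases. The model keeps the judge's policy and discards the case-by-case wobble.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Two cues, and a true outcome that depends on them.
cue1, cue2 = rng.standard_normal((2, n))
outcome = 0.6 * cue1 + 0.4 * cue2 + rng.standard_normal(n)

# The judge weights the cues sensibly but adds pattern noise:
# idiosyncratic, case-by-case deviations from a consistent policy.
judge = 0.6 * cue1 + 0.4 * cue2 + 0.8 * rng.standard_normal(n)

# Model of the judge: fit the judge's own ratings, not the outcome.
X = np.column_stack([cue1, cue2, np.ones(n)])
w, *_ = np.linalg.lstsq(X, judge, rcond=None)
model_of_judge = X @ w  # the judge's policy, minus the noise

print("judge vs outcome:         ",
      np.round(np.corrcoef(judge, outcome)[0, 1], 2))
print("model of judge vs outcome:",
      np.round(np.corrcoef(model_of_judge, outcome)[0, 1], 2))
# The noise-free model typically predicts the outcome better
# than the judge it was built from.
```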
26%
Why do complex rules of prediction harm accuracy, despite the strong feeling we have that they draw on valid insights? For one thing, many of the complex rules that people invent are not likely to be generally true. But there is another problem: even when the complex rules are valid in principle, they inevitably apply under conditions that are rarely observed.
26%
to put it bluntly, it proved almost impossible in that study to generate a simple model that did worse than the experts did.
26%
Of course, we should not conclude that any model beats any human.
26%
“mindless consistency”
26%
This quick tour has shown how noise impairs clinical judgment. In predictive judgments, human experts are easily outperformed by simple formulas—models of reality, models of a judge, or even randomly generated models. This finding argues in favor of using noise-free methods.
26%
“People believe they capture complexity and add subtlety when they make judgments. But the complexity and the subtlety are mostly wasted—usually they do not add to the accuracy of simple models.”
27%
all mechanical approaches are noise-free.
27%
improper linear model.
27%
His surprising discovery was that these equal-weight models are about as accurate as “proper” regression models, and far superior to clinical judgments.
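A hedged sketch of that comparison on simulated data (sample sizes and effect sizes are invented): fit "proper" regression weights on a small sample, build an "improper" equal-weight model from standardized predictors, and score both on fresh cases.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    # Three predictors, all oriented so that higher predicts a better outcome.
    X = rng.standard_normal((n, 3))
    y = X @ np.array([0.5, 0.3, 0.2]) + rng.standard_normal(n)
    return X, y

X_train, y_train = make_data(60)      # small fitting sample
X_test, y_test = make_data(10_000)    # fresh cases

# "Proper" model: regression weights fit to the training sample.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
proper = X_test @ w

# "Improper" model: equal weights on standardized predictors.
z = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
improper = z.sum(axis=1)

print("proper r:  ", np.round(np.corrcoef(proper, y_test)[0, 1], 3))
print("improper r:", np.round(np.corrcoef(improper, y_test)[0, 1], 3))
# Out of sample, the equal-weight model is typically about as
# accurate as the regression fit to a small sample.
```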
27%
multiple regression computes “optimal” weights that minimize squared errors. But multiple regression minimizes error in the original data. The formula therefore adjusts itself to predict every random fluke in the data.
27%
model’s predictive accuracy is its performance in a new sample, called its cross-validated correlation.
27%
The loss of accuracy in cross-validation is worst when the original sample is small, because flukes loom larger in small samples.
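A small simulation of that point (all numbers invented): fit regression weights on original samples of different sizes, then compare the in-sample correlation with the correlation on fresh data. The gap is the cost of fitting flukes, and it shrinks as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(3)
true_w = np.array([0.4, 0.2, 0.1, 0.0, 0.0])  # two predictors are pure noise

def draw(n):
    X = rng.standard_normal((n, 5))
    y = X @ true_w + rng.standard_normal(n)
    return X, y

for n in (20, 50, 500):
    r_in, r_out = [], []
    for _ in range(200):
        X, y = draw(n)                 # original sample
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        r_in.append(np.corrcoef(X @ w, y)[0, 1])
        X2, y2 = draw(2_000)           # new sample for cross-validation
        r_out.append(np.corrcoef(X2 @ w, y2)[0, 1])
    print(f"n={n:3d}  in-sample r={np.mean(r_in):.2f}  "
          f"cross-validated r={np.mean(r_out):.2f}")
```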
27%
As statistician Howard Wainer memorably put it in the subtitle of a scholarly article on the estimation of proper weights, “It Don’t Make No Nevermind.” Or, in Dawes’s words, “we do not need models more precise than our measurements.”