Noise: A Flaw in Human Judgment
Read between December 12 - December 14, 2021
15%
Speaking of the Error Equation
“Oddly, reducing bias and noise by the same amount has the same effect on accuracy.”
“Reducing noise in predictive judgment is always useful, regardless of what you know about bias.”
“When judgments are split 84 to 16 between those that are above and below the true value, there is a large bias: that’s when bias and noise are equal.”
“Predictive judgments are involved in every decision, and accuracy should be their only goal. Keep your values and your facts separate.”
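The quotes above can be checked with a little arithmetic. The sketch below assumes the usual form of the error equation (overall error, measured as mean squared error, equals bias squared plus noise squared) and assumes normally distributed errors; the numbers and the function names `mse` and `share_above_truth` are made up for illustration.

```python
from statistics import NormalDist


def mse(bias: float, noise: float) -> float:
    """Overall error (mean squared error) as bias^2 + noise^2."""
    return bias ** 2 + noise ** 2


def share_above_truth(bias: float, noise: float) -> float:
    """Fraction of judgments above the true value.

    Assumes errors are normal with mean `bias` and standard deviation `noise`.
    """
    return 1.0 - NormalDist(mu=bias, sigma=noise).cdf(0.0)


# Hypothetical starting point: bias and noise are equal (10 units each).
baseline = mse(bias=10.0, noise=10.0)    # 200.0
less_bias = mse(bias=8.0, noise=10.0)    # 164.0
less_noise = mse(bias=10.0, noise=8.0)   # 164.0

# With bias equal to noise, about 84% of judgments fall above the truth.
split = share_above_truth(bias=10.0, noise=10.0)
print(baseline, less_bias, less_noise, round(split, 2))
```

In this example, starting from equal bias and noise, cutting either one by the same two units lowers the overall error by the same amount, and the 84-to-16 split appears because a bias of one standard deviation puts about 84% of a normal distribution above the true value.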
15%
To illustrate the analysis of noise with multiple cases, we turn to an exceptionally detailed noise audit of sentencing by federal judges. The analysis was published in 1981 as part of the movement toward sentencing reform that we described in chapter 1. The study narrowly focused on sentencing decisions, but the lessons it offers are general and bear on other professional judgments. The goal of the noise audit was to go beyond the vivid but anecdotal evidence of noise assembled by Judge Frankel and others and to “determine the extent of sentencing disparity” more systematically.
15%
Unfortunately, the world of federal justice is not perfect. The judges are not identical, and variability within columns is large, indicating noise in the judgments of each case. There is more variability in sentences than there should be, and the study’s aim is to analyze it.
16%
Variability in level errors will be found in any judgment task.
16%
The general conclusion is that the average level of sentencing functions like a personality trait. You could use this study to arrange judges on a scale that ranges from very harsh to very lenient, just as a personality test might measure their degree of extraversion or agreeableness.
16%
We use the term pattern noise for the variability we just identified, because that variability reflects a complex pattern in the attitudes of judges to particular cases. One judge, for instance, may be harsher than average in general but relatively more lenient toward white-collar criminals. Another may be inclined to punish lightly but more severely when the offender is a recidivist. A third may be close to the average severity but sympathetic when the offender is merely an accomplice and tough when the victim is an older person. (We use the term pattern noise in the interest of readability. ...more
16%
By the same token, the judges would not have set precisely the same sentences to the sixteen cases if they had been asked to judge them again on another occasion.
17%
Our name for the variability that is due to transient effects is occasion noise.
17%
To summarize, we discussed several types of noise. System noise is undesirable variability in the judgments of the same case by multiple individuals. We have identified its two major components, which can be separated when the same individuals evaluate multiple cases: Level noise is variability in the average level of judgments by different judges. Pattern noise is variability in judges’ responses to particular cases. In the present study, the amounts of level noise and pattern noise were approximately equal. However, the component that we identified as pattern noise certainly contains some ...more
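The split of system noise into level noise and pattern noise can be sketched on a small judges-by-cases table. This is only an illustration: the sentence values are invented, the name `decompose_noise` is hypothetical, and the sketch assumes the components are measured as variances, so that system noise squared equals level noise squared plus pattern noise squared.

```python
from statistics import fmean


def decompose_noise(sentences):
    """Split system noise into level and pattern components (as variances).

    sentences[j][c] is the sentence set by judge j for case c.
    Returns (system_var, level_var, pattern_var).
    """
    n_judges, n_cases = len(sentences), len(sentences[0])
    cells = [(j, c) for j in range(n_judges) for c in range(n_cases)]
    grand = fmean(sentences[j][c] for j, c in cells)
    judge_means = [fmean(row) for row in sentences]
    case_means = [fmean(row[c] for row in sentences) for c in range(n_cases)]

    # System noise: variability across judges within each case (column).
    system_var = fmean((sentences[j][c] - case_means[c]) ** 2 for j, c in cells)
    # Level noise: variability of the judges' average sentences.
    level_var = fmean((m - grand) ** 2 for m in judge_means)
    # Pattern noise: what remains after removing judge and case averages.
    pattern_var = fmean(
        (sentences[j][c] - judge_means[j] - case_means[c] + grand) ** 2
        for j, c in cells
    )
    return system_var, level_var, pattern_var


# Hypothetical data: 3 judges (rows) sentencing 4 cases (columns), in years.
sentences = [
    [5.0, 2.0, 8.0, 3.0],
    [7.0, 3.0, 9.0, 6.0],
    [4.0, 4.0, 6.0, 2.0],
]
system_var, level_var, pattern_var = decompose_noise(sentences)
print(system_var, level_var, pattern_var)
```

With only one judgment per judge and case, the residual term cannot separate stable judge-by-case patterns from transient effects, so any occasion noise is folded into it.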
17%
Speaking of Analyzing Noise
“Level noise is when judges show different levels of severity. Pattern noise is when they disagree with one another on which defendants deserve more severe or more lenient treatment. And part of pattern noise is occasion noise, when judges disagree with themselves.”
“In a perfect world, defendants would face justice; in our world, they face a noisy system.”
17%
We have described the process that picks an underwriter, a judge, or a doctor as a lottery that creates system noise. Occasion noise is the product of a second lottery. This lottery picks the moment when the professional makes a judgment, the professional’s mood, the sequence of cases that are fresh in mind, and countless other features of the occasion. This second lottery usually remains much more abstract than the first.
17%
Occasion noise is the variability among these unseen possibilities.
17%
Measuring occasion noise is not easy—for much the same reason that its existence, once established, often surprises us. When people form a carefully considered professional opinion, they associate it with the reasons that justify their point of view. If pressed to explain their judgment, they will usually defend it with arguments that they find convincing. And if they are presented with the same problem a second time and recognize it, they will reproduce the earlier answer both to minimize effort and maintain consistency. Consider this example from the teaching profession: if a teacher gives ...more
18%
Realistically speaking, there is no hope of discovering all the extraneous sources of occasion noise, but those that can be found illustrate the great variety of these sources. If we are to control occasion noise, we must try to understand the mechanisms that produce it.
18%
Vul and Pashler wanted to find out if the same effect extends to occasion noise: can you get closer to the truth by combining two guesses from the same person, just as you do when you combine the guesses of different people? As they discovered, the answer is yes. Vul and Pashler gave this finding an evocative name: the crowd within.
18%
First, assume that your first estimate is off the mark. Second, think about a few reasons why that could be. Which assumptions and considerations could have been wrong? Third, what do these new considerations imply? Was the first estimate rather too high or too low? Fourth, based on this new perspective, make a second, alternative estimate. Like Vul and Pashler, Herzog and Hertwig then averaged the two estimates thus produced. Their technique, which they named dialectical bootstrapping, produced larger improvements in accuracy than did a simple request for a second estimate immediately ...more
18%
The upshot for decision makers, as summarized by Herzog and Hertwig, is a simple choice between procedures: if you can get independent opinions from others, do it—this real wisdom of crowds is highly likely to improve your judgment. If you cannot, make the same judgment yourself a second time to create an “inner crowd.”
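A small simulation can illustrate why an "inner crowd" helps, and why a real crowd helps more. The model is an assumption made for this sketch, not the design of the studies cited: each estimate is the truth plus a stable personal error (shared by one person's two guesses) plus independent occasion noise. The function name and all parameter values are hypothetical.

```python
import random


def simulate_crowds(n_trials=20000, person_sd=10.0, occasion_sd=10.0, seed=0):
    """Mean squared error of one guess, an inner crowd, and an outer crowd."""
    rng = random.Random(seed)
    truth = 100.0
    se_single = se_inner = se_outer = 0.0
    for _ in range(n_trials):
        stable_a = rng.gauss(0.0, person_sd)  # person A's stable error
        stable_b = rng.gauss(0.0, person_sd)  # person B's stable error
        first = truth + stable_a + rng.gauss(0.0, occasion_sd)
        second = truth + stable_a + rng.gauss(0.0, occasion_sd)  # A, later
        other = truth + stable_b + rng.gauss(0.0, occasion_sd)   # B
        se_single += (first - truth) ** 2
        se_inner += ((first + second) / 2 - truth) ** 2
        se_outer += ((first + other) / 2 - truth) ** 2
    return se_single / n_trials, se_inner / n_trials, se_outer / n_trials


mse_single, mse_inner, mse_outer = simulate_crowds()
print(mse_single, mse_inner, mse_outer)
```

Under these assumptions, averaging two guesses from the same person cancels only part of the occasion noise, while averaging across two people also cancels part of the stable personal error. That ordering matches the advice above: seek independent opinions first, and fall back on a second guess of your own when you cannot.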
18%
Occasion noise affects all our judgments, all the time.
19%
In other words, mood has a measurable influence on what you think: what you notice in your environment, what you retrieve from your memory, how you make sense of these signals. But mood has another, more surprising effect: it also changes how you think. And here, the effects are not those you might imagine. Being in a good mood is a mixed blessing, and bad moods have a silver lining. The costs and benefits of different moods are situation-specific.
19%
People who are in a good mood are more likely to let their biases affect their thinking.
19%
(Bullshit has become something of a technical term since Harry Frankfurt, a philosopher at Princeton University, published an insightful book, On Bullshit, in which he distinguished bullshit from other types of misrepresentation.)
19%
Inducing good moods makes people more receptive to bullshit and more gullible in general; they are less apt to detect deception or identify misleading information.
19%
Conversely, eyewitnesses who are exposed to misleading information are better able to disregard it—and to avoid false testimony—when they are in a bad mood.
19%
We have described these studies of mood in some detail because we need to emphasize an important truth: you are not the same person at all times. As your mood varies (something you are, of course, aware of), some features of your cognitive machinery vary with it (something you are not fully aware of). If you are shown a complex judgment problem, your mood in the moment may influence your approach to the problem and the conclusions you reach, even when you believe that your mood has no such influence and even when you can confidently justify the answer you found. In short, you are noisy.
19%
This behavior reflects a cognitive bias known as the gambler’s fallacy: we tend to underestimate the likelihood that streaks will occur by chance.
20%
These findings suggest that memory performance is driven in large part by, in Kahana and coauthors’ words, “the efficiency of endogenous neural processes that govern memory function.” In other words, the moment-to-moment variability in the efficacy of the brain is not just driven by external influences, like the weather or a distracting intervention. It is a characteristic of the way our brain itself functions.
20%
It is very likely that intrinsic variability in the functioning of the brain also affects the quality of our judgments in ways that we cannot possibly hope to control. This variability in brain function should give pause to anyone who thinks occasion noise can be eliminated.
20%
Speaking of Occasion Noise
“Judgment is like a free throw: however hard we try to repeat it precisely, it is never exactly identical.”
“Your judgment depends on what mood you are in, what cases you have just discussed, and even what the weather is. You are not the same person at all times.”
“Although you may not be the same person you were last week, you are less different from the ‘you’ of last week than you are from someone else today. Occasion noise is not the largest source of system noise.”
20%
There are “wise crowds,” whose mean judgment is close to the correct answer, but there are also crowds that follow tyrants, that fuel market bubbles, that believe in magic, or that are under the sway of a shared illusion. Minor differences can lead one group toward a firm yes and an essentially identical group toward an emphatic no. And because of the dynamics among group members—our emphasis here—the level of noise can be high. That proposition holds whether we are speaking of noise across similar groups or of a single group whose firm judgment on an important matter should be seen as merely ...more
21%
In short, social influences create significant noise across groups.
21%
Within very large groups, popularity and unpopularity bred more of the same, even when the researchers misled people about which songs were popular. The single exception is that the very most popular song in the control group did rise in popularity over time, which means that the inverted ranking could not keep the best song down. For the most part, however, the inverted ranking helped determine the ultimate ranking.
21%
Remarkably, this effect persisted over time. After five months, a single positive initial vote artificially increased the mean rating of comments by 25%. The effect of a single positive early vote is a recipe for noise. Whatever the reason for that vote, it can produce a large-scale shift in overall popularity.
21%
This study offers a clue about how groups shift and why they are noisy (again in the sense that similar groups can make very different judgments, and single groups can make judgments that are merely one in a cloud of possibilities). Members are often in a position to offer the functional equivalent of an early up vote (or down vote) by indicating agreement, neutrality, or dissent. If a group member has given immediate approval, other members have reason to do so as well. There is no question that when groups move in the direction of some products, people, movements, and ideas, it may not be ...more
21%
But independence is a prerequisite for the wisdom of crowds. If people are not making their own judgments and are relying instead on what other people think, crowds might not be so wise after all.
21%
The irony is that while multiple independent opinions, properly aggregated, can be strikingly accurate, even a little social influence can produce a kind of herding that undermines the wisdom of crowds.
22%
This example, of course, is highly artificial. But within groups of all kinds, something like it happens all the time. People learn from others, and if early speakers seem to like something or want to do something, others might assent. At least this is so if they do not have reason to distrust them and if they lack a good reason to think that they are wrong.
Note by Wally Bock: See The Abilene Paradox by Jerry Harvey
22%
For our purposes, the most important point is that informational cascades make noise across groups possible and even likely.
22%
However, the study of juries uncovers a distinct kind of social influence that is also a source of noise: group polarization. The basic idea is that when people speak with one another, they often end up at a more extreme point in line with their original inclinations.
23%
Recall the basic finding of group polarization: after people talk with one another, they typically end up at a more extreme point in line with their original inclinations.
23%
Since many of the most important decisions in business and government are made after some sort of deliberative process, it is especially important to be alert to this risk. Organizations and their leaders should take steps to control noise in the judgments of their individual members. They should also manage deliberating groups in a way that is likely to reduce noise, not amplify it. The noise-reduction strategies we will propose aim to achieve that goal.
23%
Speaking of Group Decisions
“Everything seems to depend on early popularity. We’d better work hard to make sure that our new release has a terrific first week.”
“As I always suspected, ideas about politics and economics are a lot like movie stars. If people think that other people like them, such ideas can go far.”
“I’ve always been worried that when my team gets together, we end up confident and unified, and firmly committed to the course of action that we choose. I guess there’s something in our internal processes that isn’t going all that well!”
23%
A measure that captures this intuition is the percent concordant (PC),
23%
PC is an immediately intuitive measure of covariation, which is a large advantage, but it is not the standard measure that social scientists use. The standard measure is the correlation coefficient (r), which varies between 0 and 1 when two variables are positively related. In the preceding example, the correlation between height and foot size is about .60.
23%
The two measures of covariation we have described are directly related to each other. Table 1 presents the PC for various values of the correlation coefficient. In the rest of this book, we always present the two measures together when we discuss the performance of humans and models.
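One way to see how the two measures relate is the standard result for two jointly normal variables: the probability that a randomly chosen pair of cases is ordered the same way on both variables is 1/2 + arcsin(r)/π. The sketch below relies on that normality assumption, which the excerpt itself does not state, and `percent_concordant` is a hypothetical helper name.

```python
import math


def percent_concordant(r: float) -> float:
    """Probability that a random pair is concordant, given correlation r.

    Assumes the two variables are jointly normal, where the probability
    is 1/2 + arcsin(r) / pi.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    return 0.5 + math.asin(r) / math.pi


for r in (0.0, 0.3, 0.6, 1.0):
    print(f"r = {r:.2f} -> PC = {percent_concordant(r):.1%}")
```

Under this assumption, a correlation of .60, like the one cited for height and foot size, corresponds to a PC of roughly 70%, while a correlation of 0 gives the coin-flip value of 50%.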
24%
The informal approach you took to this problem is known as clinical judgment. You consider the information, perhaps engage in a quick computation, consult your intuition, and come up with a judgment. In fact, clinical judgment is the process that we have described simply as judgment in this book.
24%
The use of multiple regression is an example of mechanical prediction. There are many kinds of mechanical prediction, ranging from simple rules (“hire anyone who completed high school”) to sophisticated artificial intelligence models. But linear regression models are the most common (they have been called “the workhorse of judgment and decision-making research”). To minimize jargon, we will refer to linear models as simple models.
24%
The question had been asked before, but it attracted much attention only in 1954, when Paul Meehl, a professor of psychology at the University of Minnesota, published a book titled Clinical Versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence. Meehl reviewed twenty studies in which a clinical judgment was pitted against a mechanical prediction for such outcomes as academic success and psychiatric prognosis. He reached the strong conclusion that simple mechanical rules were generally superior to human judgment. Meehl discovered that clinicians and other ...more
25%
A 2000 review of 136 studies confirmed unambiguously that mechanical aggregation outperforms clinical judgment. The research surveyed in the article covered a wide variety of topics, including diagnosis of jaundice, fitness for military service, and marital satisfaction. Mechanical prediction was more accurate in 63 of the studies, a statistical tie was declared for another 65, and clinical prediction won the contest in 8 cases. These results understate the advantages of mechanical prediction, which is also faster and cheaper than clinical judgment. Moreover, human judges actually ...more
26%
in short, when we make judgments that are not reducible to a plain operation of weighted averaging. The model-of-the-judge studies reinforce Meehl’s conclusion that the subtlety is largely wasted. Complexity and richness do not generally lead to more accurate predictions.
26%
In short, replacing you with a model of you does two things: it eliminates your subtlety, and it eliminates your pattern noise. The robust finding that the model of the judge is more valid than the judge conveys an important message: the gains from subtle rules in human judgment—when they exist—are generally not sufficient to compensate for the detrimental effects of noise. You may believe that you are subtler, more insightful, and more nuanced than the linear caricature of your thinking. But in fact, you are mostly noisier.
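The model-of-the-judge idea can be sketched with simulated data. Everything here is an assumption made for illustration: two cues, an outcome that depends on them, and a judge whose ratings follow a linear policy plus noise. A linear model is then fitted to the judge's ratings (not to the outcome), and both are correlated with the outcome. The helper names `pearson` and `fit_two_cue_model` are hypothetical.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_two_cue_model(c1, c2, target):
    """Least-squares fit of target on two cues via the 2x2 normal equations."""
    m1, m2, mt = fmean(c1), fmean(c2), fmean(target)
    d1 = [v - m1 for v in c1]
    d2 = [v - m2 for v in c2]
    dt = [v - mt for v in target]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1t = sum(a * t for a, t in zip(d1, dt))
    s2t = sum(b * t for b, t in zip(d2, dt))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1t - s12 * s2t) / det
    b2 = (s11 * s2t - s12 * s1t) / det
    return mt - b1 * m1 - b2 * m2, b1, b2


rng = random.Random(1)
n = 2000
cue1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
cue2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical outcome and hypothetical judge: linear policies plus noise.
outcome = [0.6 * a + 0.4 * b + rng.gauss(0.0, 1.0) for a, b in zip(cue1, cue2)]
judge = [0.5 * a + 0.5 * b + rng.gauss(0.0, 1.0) for a, b in zip(cue1, cue2)]

# The model of the judge is fitted to the judge's ratings, never the outcome.
b0, b1, b2 = fit_two_cue_model(cue1, cue2, judge)
model = [b0 + b1 * a + b2 * b for a, b in zip(cue1, cue2)]

r_judge = pearson(judge, outcome)
r_model = pearson(model, outcome)
print(round(r_judge, 2), round(r_model, 2))
```

In this setup the model keeps the judge's average weighting of the cues and drops everything else in the ratings, so its predictions correlate more strongly with the outcome than the judge's own ratings do. The simulated judge has no valid subtlety at all, which makes this a simplified case of the finding described above.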