Noise: A Flaw in Human Judgment
Read December 12–19, 2021
The skills they bring to bear reflect the sort of cognitive style we described in chapter 18 as likely to result in better judgments, particularly a high level of “active open-mindedness.”
Tetlock finds that “the strongest predictor of rising into the ranks of superforecasters is perpetual beta, the degree to which one is committed to belief updating and self-improvement.” As he puts it, “What makes them so good is less what they are than what they do—the hard work of research, the careful thought and self-criticism, the gathering and synthesizing of other perspectives, the granular judgments and relentless updating.” They like a particular cycle of thinking: “try, fail, analyze, adjust, try again.”
A theoretically effective solution to the problem of ratings inflation is to introduce some standardization in ratings. One popular practice that aims to do this is forced ranking. In a forced ranking system, raters are not only prevented from giving everyone the highest possible rating but also forced to abide by a predetermined distribution. Forced ranking was advocated by Jack Welch when he was CEO of General Electric, as a way to stop inflation in ratings and to ensure “candor” in performance reviews. Many companies adopted it, only to abandon it later, citing undesirable side effects on …
The upshot is that a system that depends on relative evaluations is appropriate only if an organization cares about relative performance. For example, relative ratings might make sense when, regardless of people’s absolute performance, only a fixed percentage of them can be promoted—think of colonels being evaluated for promotion to general. But forcing a relative ranking on what purports to measure an absolute level of performance, as many companies do, is illogical. And mandating that a set percentage of employees be rated as failing to meet (absolute) expectations is not just cruel; it is …
The second problem is that the forced distribution of the ratings is assumed to reflect the distribution of the underlying true performances—typically, something close to a normal distribution. Yet even if the distribution of performances in the population being rated is known, the same distribution may not be reproduced in a smaller group, such as those assessed by a single evaluator. If you randomly pick ten people from a population of several thousand, there is no guarantee that exactly two of them will belong to the top 20% of the general population.
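The arithmetic behind this point can be checked with a simple binomial calculation (our illustration, not the book's; it assumes the ten people are picked at random and that "top 20%" is an exact cutoff in the wider population):

```python
from math import comb

def prob_exactly(k: int, n: int = 10, p: float = 0.2) -> float:
    """Probability that exactly k of n randomly chosen people fall in the
    top 20% of the wider population (binomial model)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exactly_two = prob_exactly(2)
print(f"P(exactly 2 of 10 in top 20%) = {p_exactly_two:.3f}")  # ≈ 0.302
```

In other words, a forced "two per ten" quota matches the group's true composition only about 30% of the time; in the other 70% of cases, the forced distribution itself manufactures error.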
Critics of forced ranking have often focused their attacks on the principle of ranking, which they decry as brutal, inhumane, and ultimately counterproductive. Whether or not you accept these arguments, the fatal flaw of forced ranking is not the “ranking,” but the “forced.” Whenever judgments are forced onto an inappropriate scale, either because a relative scale is used to measure an absolute performance or because judges are forced to distinguish the indistinguishable, the choice of the scale mechanically adds noise.
Evidence suggests, however, that behaviorally anchored rating scales are not sufficient to eliminate noise. A further step, frame-of-reference training, has been shown to help ensure consistency between raters. In this step, raters are trained to recognize different dimensions of performance. They practice rating performance using videotaped vignettes and then learn how their ratings compare with “true” ratings provided by experts.
Frame-of-reference training has been known for decades and provides demonstrably less noisy and more accurate ratings. Yet it has gained little ground. It is easy to guess why. Frame-of-reference training, case scales, and other tools that pursue the same goals are complex and time-consuming. To be valuable, they usually need to be customized for the company and even for the unit conducting the evaluations, and they must be frequently updated as job requirements evolve. These tools require a company to add to its already-large investment in its performance management systems.
In essence, if your goal is to determine which candidates will succeed in a job and which will fail, standard interviews (also called unstructured interviews to distinguish them from structured interviews, to which we will turn shortly) are not very informative. To put it more starkly, they are often useless.
Interviews are also a minefield of psychological biases. In recent years, people have become well aware that interviewers tend, often unintentionally, to favor candidates who are culturally similar to them or with whom they have something in common, including gender, race, and educational background. Many companies now recognize the risks posed by biases and try to address them through specific training of recruiting professionals and other employees. Other biases have also been known for decades. For instance, physical appearance plays a large part in the evaluation of candidates, even for …
A more surprising finding is the presence of much occasion noise in interviews. There is strong evidence, for instance, that hiring recommendations are linked to impressions formed in the informal rapport-building phase of an interview, those first two or three minutes where you just chat amicably to put the candidate at ease. First impressions turn out to matter—a lot.
One of these strategies should be familiar by now: aggregation. Its use in this context is not a surprise. Almost all companies aggregate the judgments of multiple interviewers on the same candidate.
To ensure this level of validity, however, Google stringently enforces a rule that not all companies observe: the company makes sure that the interviewers rate the candidate separately, before they communicate with one another. Once more: aggregation works—but only if the judgments are independent.
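A quick simulation (ours; the numbers are illustrative, not from the book) makes the point concrete: averaging cancels only the error that is independent across interviewers. Any error they share, such as an impression formed in a joint discussion before rating, survives the average intact.

```python
import random

random.seed(0)

def rated_scores(true_score, n_raters, shared_noise_sd):
    """Simulate n raters judging one candidate. Each rater adds independent
    noise (sd = 1); shared_noise_sd models the correlated error introduced
    when raters confer before rating."""
    shared = random.gauss(0, shared_noise_sd)
    return [true_score + shared + random.gauss(0, 1.0) for _ in range(n_raters)]

def avg_error(shared_noise_sd, trials=20000, n_raters=4):
    """RMS error of the averaged rating across many simulated candidates."""
    sq_errs = []
    for _ in range(trials):
        scores = rated_scores(50.0, n_raters, shared_noise_sd)
        sq_errs.append((sum(scores) / n_raters - 50.0) ** 2)
    return (sum(sq_errs) / trials) ** 0.5

print("independent raters:", round(avg_error(0.0), 2))  # ≈ 1/sqrt(4) = 0.5
print("correlated raters :", round(avg_error(1.0), 2))  # shared error is not averaged away
```

With four independent raters the noise shrinks by a factor of two (1/√4); once the raters share an error component, no amount of averaging removes it.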
However, Jeff added, the analysts should include in each chapter all the relevant factual information about the assessment. “Don’t hide anything,” he instructed them. “The general tone of the chapter will be consistent with the proposed rating, of course, but if there is information that seems inconsistent or even contradictory with the main rating, don’t sweep anything under the rug. Your job is not to sell your recommendation. It is to represent the truth. If it is complicated, so be it—it often is.” In the same spirit, Jeff encouraged the analysts to be transparent about their level of …
Joan immediately noticed something important: while most of the assessments supported doing the deal, they did not paint a simple, rosy, all-systems-go picture. Some of the ratings were strong; others were not. These differences, she knew, were a predictable result of keeping the assessments independent of one another. When excessive coherence is kept in check, reality is not as coherent as most board presentations make it seem. “Good,” Joan thought. “These discrepancies between assessments will raise questions and trigger discussions.”
Joan convened a meeting of the board to review the report and come to a decision. She explained the approach that the deal team followed, and she invited the board members to apply the same principle. “Jeff and his team have worked hard to keep the assessments independent of each other,” she said, “and our task now is to review them independently, too. This means we will consider each assessment separately, before we start discussing the final decision. We are going to treat each assessment as a distinct agenda item.”
1. At the beginning of the process, structure the decision into mediating assessments. (For recurring judgments, this is done only once.)
2. Ensure that whenever possible, mediating assessments use an outside view. (For recurring judgments: use relative judgments, with a case scale if possible.)
3. In the analytical phase, keep the assessments as independent of one another as possible.
4. In the decision meeting, review each assessment separately.
5. On each assessment, ensure that participants make their judgments individually; then use the estimate-talk-estimate method.
6. To make the final …
We have defined noise as unwanted variability, and if something is unwanted, it should probably be eliminated. But the analysis is more complicated and more interesting than that. Noise may be unwanted, other things being equal. But other things might not be equal, and the costs of eliminating noise might exceed the benefits. And even when an analysis of costs and benefits suggests that noise is costly, eliminating it might produce a range of awful or even unacceptable consequences for both public and private institutions. There are seven major objections to efforts to reduce or eliminate …
Second, some strategies introduced to reduce noise might introduce errors of their own. Occasionally, they might produce systematic bias.
Third, if we want people to feel that they have been treated with respect and dignity, we might have to tolerate some noise. Noise can be a by-product of an imperfect process that people end up embracing because the process gives everyone (employees, customers, applicants, students, those accused of crime) an individualized hearing, an opportunity to influence the exercise of discretion, and a sense that they have had a chance to be seen and heard. Fourth, noise might be essential to accommodate new values and hence to allow moral and political evolution.
Fifth, some strategies designed to reduce noise might encourage opportunistic behavior, allowing people to game the system or evade prohibitions.
Sixth, a noisy process might be a good deterrent. If people know that they could be subject to either a small penalty or a large one, they might steer clear of wrongdoing, at least if they are risk-averse. A system might tolerate noise as a way of producing extra deterrence. Finally, people do not want to be treated as if they are mere things, or cogs in some kind of machine. Some noise-reduction strategies might squelch people’s creativity and prove demoralizing.
Because it is not bound by rules, mercy is noisy.
We have emphasized that in situations like hiring, admissions, and medicine, some noise-reduction strategies might turn out to be crude; they might forbid forms of individualized treatment that, while noisy, would produce fewer errors on balance. But if a noise-reduction strategy is crude, then, as we have urged, the best response is to try to come up with a better strategy—one attuned to a wide range of relevant variables. And if that better strategy eliminates noise and produces fewer errors, it would have obvious advantages over individualized treatment, even if it reduces or eliminates the …
A rule-bound system might eliminate noise, which is good, but it might also freeze existing norms and values, which is not so good. In sum, some people might insist that an advantage of a noisy system is that it will allow people to accommodate new and emerging values. As values change, and if judges are allowed to exercise discretion, they might begin to give, for example, lower sentences to those convicted of drug offenses or higher sentences to those convicted of rape. We have emphasized that if some judges are lenient and others are not, then there will be a degree of unfairness; similarly …
In short, a noisy system might be good for morale not because it is noisy but because it allows people to decide as they see fit. If employees are allowed to respond to customer complaints in their own way, evaluate their subordinates as they think best, or establish premiums as they deem appropriate, then they might enjoy their jobs more. If the company takes steps to eliminate noise, employees might think that their own agency has been compromised. Now they are following rules rather than exercising their own creativity. Their jobs look more mechanical, even robotic. Who wants to work in a …
We say that bias exists when most errors in a set of judgments are in the same direction.
Eliminating bias from a set of judgments will not eliminate all error. The errors that remain when bias is removed are not shared. They are the unwanted divergence of judgments, the unreliability of the measuring instrument we apply to reality. They are noise. Noise is variability in judgments that should be identical. We use the term system noise for the noise observed in organizations that employ interchangeable professionals to make decisions, such as physicians in an emergency room, judges imposing criminal penalties, and underwriters in an insurance company. Much of this book has been …
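These two definitions can be checked numerically with the book's error equation, MSE = bias² + noise². Here is a small sketch using made-up judgments (the underwriters and the true premium of 100 are hypothetical):

```python
import statistics

def bias_and_noise(judgments, true_value):
    """Decompose the error in a set of judgments of the same case.

    bias  : the shared, directional error (mean judgment minus truth)
    noise : the unwanted variability of judgments around their own mean
    MSE   : mean squared error about the truth, = bias**2 + noise**2
    """
    mean = statistics.fmean(judgments)
    bias = mean - true_value
    noise = statistics.pstdev(judgments)
    mse = statistics.fmean((j - true_value) ** 2 for j in judgments)
    return bias, noise, mse

# Five hypothetical underwriters quoting the same policy:
bias, noise, mse = bias_and_noise([108, 112, 95, 103, 117], true_value=100)
print(f"bias={bias:.1f}, noise={noise:.1f}, mse={mse:.1f}")
# bias=7.0, noise=7.6, mse=106.2
```

Because bias and noise enter the error equation symmetrically, removing the shared error (the bias of 7.0) still leaves the full noise term behind, which is the point of the passage above.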
The goal of judgment is accuracy, not individual expression. This statement is our candidate for the first principle of decision hygiene in judgment. It reflects the narrow, specific way we have defined judgment in this book. We have shown that stable pattern noise is a large component of system noise and that it is a direct consequence of individual differences, of judgment personalities that lead different people to form different views of the same problem. This observation leads to a conclusion that will be as unpopular as it is inescapable: judgment is not the place to express your …
Resist premature intuitions. We have described the internal signal of judgment completion that gives decision makers confidence in their judgment. The unwillingness of decision makers to give up this rewarding signal is a key reason for the resistance to the use of guidelines and algorithms and other rules that tie their hands. Decision makers clearly need to be comfortable with their eventual choice and to attain the rewarding sense of intuitive confidence. But they should not grant themselves this reward prematurely. An intuitive choice that is informed by a balanced and careful …