Bias and noise—systematic deviation and random scatter—are different components of error.
Some judgments are biased; they are systematically off target. Other judgments are noisy, as people who are expected to agree end up at very different points around the target. Many organizations, unfortunately, are afflicted by both bias and noise.
A general property of noise is that you can recognize and measure it while knowing nothing about the target or the bias.
To understand error in judgment, we must understand both bias and noise. Sometimes, as we will see, noise is the more important problem. But in public conversations about human error and in organizations all over the world, noise is rarely recognized. Bias is the star of the show. Noise is a bit player, usually offstage. The topic of bias has been discussed in thousands of scientific articles and dozens of popular books, few of which even mention the issue of noise. This book is our attempt to redress the balance. In real-world decisions, the amount of noise is often scandalously high.
Wherever you look at human judgments, you are likely to find noise. To improve the quality of our judgments, we need to overcome noise as well as bias.
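The arithmetic behind this framing can be made concrete. Below is a minimal sketch, with invented numbers, of how the overall error of a set of judgments (measured as mean squared error) splits exactly into a bias component and a noise component, and of how the noise component can be computed without ever consulting the target:

```python
# Illustrative sketch (numbers invented): MSE decomposes into bias^2 + noise^2.
import statistics

target = 100.0                      # the true value the judges are trying to hit
judgments = [94, 98, 103, 91, 99]   # hypothetical judgments of the same case

mean_judgment = statistics.fmean(judgments)
bias = mean_judgment - target        # systematic deviation from the target
noise = statistics.pstdev(judgments) # scatter of judgments around their own mean
mse = statistics.fmean((j - target) ** 2 for j in judgments)

# The decomposition MSE = bias**2 + noise**2 holds exactly
# when noise is the population standard deviation.
assert abs(mse - (bias ** 2 + noise ** 2)) < 1e-9

# Note that computing `noise` never used `target`: noise is measurable
# even when the true value is unknown.
```

The last line is the point of the preceding highlight: the scatter of the judgments is visible from the judgments alone.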
Occasion noise is the variability in judgments of the same case by the same person or group on different occasions. A surprising amount of occasion noise arises in group discussion because of seemingly irrelevant factors, such as who speaks first.
We explore the key advantage of rules, formulas, and algorithms over humans when it comes to making predictions: contrary to popular belief, it is not so much the superior insight of rules but their noiselessness.
We explain the central causes of noise. These include interpersonal differences arising from a variety of factors, including personality and cognitive style; idiosyncratic variations in the weighting of different considerations; and the different uses that people make of the very same scales. We explore why people are oblivious to noise and are frequently unsurprised by events and judgments they could not possibly have predicted.
We introduce several noise-reduction techniques that we collect under the label of decision hygiene.
The theme that emerges from these three chapters can be summarized in one sentence, which will recur throughout this book: wherever there is judgment, there is noise—and more of it than you think.
There is also ample evidence that irrelevant information, in the form of small and seemingly random factors, can produce major differences in outcomes.
Speaking of Noise in Sentencing

“Experiments show large disparities among judges in the sentences they recommend for identical cases. This variability cannot be fair. A defendant’s sentence should not depend on which judge the case happens to be assigned to.”

“Criminal sentences should not depend on the judge’s mood during the hearing, or on the outside temperature.”

“Guidelines are one way to address this issue. But many people don’t like them, because they limit judicial discretion, which might be necessary to ensure fairness and accuracy. After all, each case is unique, isn’t it?”
Many professionals in any large company are authorized to make judgments that bind the company.
For any risk, there is a Goldilocks price that is just right—neither too high nor too low—and there is a good chance that the average judgment of a large group of professionals is not too far from this Goldilocks number. Prices that are higher or lower than this number are costly—this is how the variability of noisy judgments hurts the bottom line.
We use the word lottery to emphasize the role of chance in the selection of one underwriter or adjuster.
Lotteries have their place, and they need not be unjust. Acceptable lotteries are used to allocate “goods,” like courses in some universities, or “bads,” like the draft in the military. They serve a purpose. But the judgment lotteries we talk about allocate nothing. They just produce uncertainty.
A defining feature of system noise is that it is unwanted, and we should stress right here that variability in judgments is not always unwanted.
Matters of taste and competitive settings all pose interesting problems of judgment. But our focus is on judgments in which variability is undesirable. System noise is a problem of systems, which are organizations, not markets. When traders make different assessments of the value of a stock, some of them will make money, and others will not. Disagreements make markets. But if one of those traders is randomly chosen to make that assessment on behalf of her firm, and if we find out that her colleagues in the same firm would produce very different assessments, then the firm faces system noise.
If one insurance policy is overpriced and another is underpriced, pricing may on average look right, but the insurance company has made two costly errors. If two felons who both should be sentenced to five years in prison receive sentences of three years and seven years, justice has not, on average, been done. In noisy systems, errors do not cancel out. They add up.
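The claim that errors add up rather than cancel is a direct consequence of measuring error by its square. A two-line sketch, with invented numbers:

```python
# Two pricing errors of equal size in opposite directions: the average error
# is zero, but the squared errors accumulate. Numbers are illustrative.
errors = [-2.0, +2.0]   # one policy underpriced, one overpriced

mean_error = sum(errors) / len(errors)
mse = sum(e ** 2 for e in errors) / len(errors)

assert mean_error == 0.0   # "on average" the pricing looks right
assert mse == 4.0          # but the cost of the two errors has not cancelled
```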
We came to see that the problem of system noise often goes unrecognized in organizations and that the common inattention to noise is as interesting as its prevalence.
Confidence is nurtured by the subjective experience of judgments that are made with increasing fluency and ease, in part because they resemble judgments made in similar cases in the past. Over time, as this underwriter learned to agree with her past self, her confidence in her judgments increased.
How had the leaders of the company remained unaware of their noise problem? There are several possible answers here, but one that seems to play a large role in many settings is simply the discomfort of disagreement. Most organizations prefer consensus and harmony over dissent and conflict. The procedures in place often seem expressly designed to minimize the frequency of exposure to actual disagreements and, when such disagreements happen, to explain them away.
Consider another mechanism that many companies resort to: postmortems of unfortunate judgments. As a learning mechanism, postmortems are useful. But if a mistake has truly been made—in the sense that a judgment strayed far from professional norms—discussing it will not be challenging. Experts will easily conclude that the judgment was way off the consensus. (They might also write it off as a rare exception.) Bad judgment is much easier to identify than good judgment. The calling out of egregious mistakes and the marginalization of bad colleagues will not help professionals become aware of how noisy their own judgments are.
If singular decisions are just as noisy as recurrent ones, then the strategies that reduce noise in recurrent decisions should also improve the quality of singular decisions.
From the perspective of noise reduction, a singular decision is a recurrent decision that happens only once. Whether you make a decision only once or a hundred times, your goal should be to make it in a way that reduces both bias and noise. And practices that reduce error should be just as effective in your one-of-a-kind decisions as in your repeated ones.
“Remember: a singular decision is a recurrent decision that is made only once.”
Judgment can therefore be described as measurement in which the instrument is a human mind. Implicit in the notion of measurement is the goal of accuracy—to approach truth and minimize error. The goal of judgment is not to impress, not to take a stand, not to persuade. It is important to note that the concept of judgment as we use it here is borrowed from the technical psychological literature, and that it is a much narrower concept than the same word has in everyday language. Judgment is not a synonym for thinking, and making accurate judgments is not a synonym for having good judgment.
Indeed, the word judgment is used mainly where people believe they should agree. Matters of judgment differ from matters of opinion or taste, in which unresolved differences are entirely acceptable.
When we speak of good or bad judgments, we may be speaking either about the output (e.g., the number you produced in the Gambardi case) or about the process—what you did to arrive at that number. One approach to the evaluation of the process of judgment is to observe how that process performs when it is applied to a large number of cases.
We have contrasted two ways of evaluating a judgment: by comparing it to an outcome and by assessing the quality of the process that led to it. Note that when the judgment is verifiable, the two ways of evaluating it may reach different conclusions in a single case.
Scholars of decision-making offer clear advice to resolve this tension: focus on the process, not on the outcome of a single case. We recognize, however, that this is not standard practice in real life. Professionals are usually evaluated on how closely their judgments match verifiable outcomes, and if you ask them what they aim for in their judgments, a close match is what they will answer.
In summary, what people usually claim to strive for in verifiable judgments is a prediction that matches the outcome. What they are effectively trying to achieve, regardless of verifiability, is the internal signal of completion provided by the coherence between the facts of the case and the judgment. And what they should be trying to achieve, normatively speaking, is the judgment process that would produce the best judgment over an ensemble of similar cases.
Your intuition probably favors the mean, and your intuition is correct. The mean contains more information; it is affected by the size of the numbers, while the median is affected only by their order. There is a tight link between this problem of estimation, about which you have a clear intuition, and the problem of overall error measurement that concerns us here. They are, in fact, two sides of the same coin. That is because the best estimate is one that minimizes the overall error of the available measurements. Accordingly, if your intuition about the mean being the best estimate is correct, then squared error, the quantity that the mean minimizes, must be the right measure of overall error.
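The link between estimator and error measure can be checked numerically. In this sketch (invented readings, with one outlier), a coarse grid search confirms that the mean minimizes total squared error while the median minimizes total absolute error:

```python
# Illustrative check: which estimate minimizes which measure of error?
import statistics

measurements = [78, 81, 85, 90, 130]   # hypothetical readings, one outlier

def total_sq_error(est):
    return sum((x - est) ** 2 for x in measurements)

def total_abs_error(est):
    return sum(abs(x - est) for x in measurements)

mean = statistics.fmean(measurements)      # 92.8
median = statistics.median(measurements)   # 85

candidates = [x / 10 for x in range(700, 1400)]   # grid from 70.0 to 139.9
best_sq = min(candidates, key=total_sq_error)
best_abs = min(candidates, key=total_abs_error)

assert abs(best_sq - mean) <= 0.05     # the mean minimizes squared error
assert abs(best_abs - median) <= 0.05  # the median minimizes absolute error
```

Note how the outlier pulls the mean (92.8) well above the median (85): the mean responds to the size of every number, the median only to their order.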
A widely accepted maxim of good decision making is that you should not mix your values and your facts. Good decision making must be based on objective and accurate predictive judgments that are completely unaffected by hopes and fears, or by preferences and values.
In all these examples, the final decisions require evaluative judgments. The decision makers must consider multiple options and apply their values to make the optimal choice. But the decisions depend on underlying predictions, which should be value-neutral. Their goal is accuracy—hitting as close as possible to the bull’s-eye—and MSE is the appropriate measure of error.
Variability in level errors will be found in any judgment task. Examples are evaluations of performance where some supervisors are more generous than others, predictions of market share where some forecasters are more optimistic than others, or recommendations for back surgery where some orthopedists are more aggressive than others.
We use the term pattern noise for the variability we just identified, because that variability reflects a complex pattern in the attitudes of judges to particular cases. One judge, for instance, may be harsher than average in general but relatively more lenient toward white-collar criminals. Another may be inclined to punish lightly but more severely when the offender is a recidivist. A third may be close to the average severity but sympathetic when the offender is merely an accomplice and tough when the victim is an older person. (We use the term pattern noise in the interest of readability.)
System noise is undesirable variability in the judgments of the same case by multiple individuals. We have identified its two major components, which can be separated when the same individuals evaluate multiple cases: Level noise is variability in the average level of judgments by different judges. Pattern noise is variability in judges’ responses to particular cases.
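This decomposition can be sketched numerically. Given a grid of judges by cases (all numbers invented), system noise is the average, across cases, of the variance among judges, and it splits exactly into level noise plus pattern noise:

```python
# Illustrative decomposition of system noise into level + pattern noise.
import statistics

# rows = judges, columns = cases (e.g., sentences in years)
ratings = [
    [5.0, 3.0, 7.0],   # judge A: harsh overall
    [3.0, 2.0, 3.0],   # judge B: lenient overall
    [4.0, 1.0, 6.0],   # judge C: near average, but with a distinct pattern
]
n_judges, n_cases = len(ratings), len(ratings[0])

judge_means = [statistics.fmean(row) for row in ratings]
case_means = [statistics.fmean(row[c] for row in ratings) for c in range(n_cases)]
grand_mean = statistics.fmean(judge_means)

# Level noise: variance of the judges' average severity levels.
level_var = statistics.pvariance(judge_means)

# Pattern noise: residual variability once each judge's level and each
# case's average difficulty are removed.
pattern_var = statistics.fmean(
    (ratings[j][c] - judge_means[j] - case_means[c] + grand_mean) ** 2
    for j in range(n_judges) for c in range(n_cases)
)

# System noise: average within-case variance among the judges.
system_noise = statistics.fmean(
    statistics.pvariance([row[c] for row in ratings]) for c in range(n_cases)
)
assert abs(system_noise - (level_var + pattern_var)) < 1e-9
```

In this toy grid, both components are nonzero: the judges differ in average severity, and they also rank the cases differently.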
Measuring occasion noise is not easy—for much the same reason that its existence, once established, often surprises us. When people form a carefully considered professional opinion, they associate it with the reasons that justify their point of view. If pressed to explain their judgment, they will usually defend it with arguments that they find convincing. And if they are presented with the same problem a second time and recognize it, they will reproduce the earlier answer both to minimize effort and maintain consistency.
For this reason, direct measurements of occasion noise are hard to obtain whenever cases are easily memorable.
Averaging two guesses by the same person does not improve judgments as much as does seeking out an independent second opinion. As Vul and Pashler put it, “You can gain about 1/10th as much from asking yourself the same question twice as you can from getting a second opinion from someone else.” This is not a large improvement. But you can make the effect much larger by waiting to make a second guess.
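One way to see why the inner crowd helps less than a real second opinion is the standard variance formula for an average of two equally noisy, correlated judgments: the noise shrinks by a factor of sqrt((1 + rho) / 2), where rho is the correlation between the two judgments. The numbers below are illustrative, not taken from the Vul and Pashler study:

```python
# Sketch: averaging two judgments with correlation rho, each with noise sigma.
sigma = 10.0   # standard deviation (noise) of a single judgment

def noise_of_average(rho):
    # var((J1 + J2) / 2) = sigma**2 * (1 + rho) / 2 when var(J1) = var(J2)
    return ((sigma ** 2) * (1 + rho) / 2) ** 0.5

independent = noise_of_average(0.0)   # a colleague's independent judgment
inner_crowd = noise_of_average(0.8)   # your own second guess, highly correlated

assert round(independent, 2) == 7.07  # ~29% noise reduction
assert round(inner_crowd, 2) == 9.49  # much smaller gain
```

Waiting before making a second guess lowers the correlation between the two guesses, which is why the delayed inner crowd recovers more of the benefit.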
There is at least one source of occasion noise that we have all noticed: mood.
We have described these studies of mood in some detail because we need to emphasize an important truth: you are not the same person at all times. As your mood varies (something you are, of course, aware of), some features of your cognitive machinery vary with it (something you are not fully aware of). If you are shown a complex judgment problem, your mood in the moment may influence your approach to the problem and the conclusions you reach, even when you believe that your mood has no such influence and even when you can confidently justify the answer you found. In short, you are noisy.
Among the extraneous factors that should not influence professional judgments, but do, are two prime suspects: stress and fatigue.
Another source of random variability in judgment is the order in which cases are examined. When a person is considering a case, the decisions that immediately preceded it serve as an implicit frame of reference.
Professionals who make a series of decisions in sequence, including judges, loan officers, and baseball umpires, lean toward restoring a form of balance: after a streak, or a series of decisions that go in the same direction, they are more likely to decide in the opposite direction than would be strictly justified.
These findings suggest that memory performance is driven in large part by, in Kahana and coauthors’ words, “the efficiency of endogenous neural processes that govern memory function.” In other words, the moment-to-moment variability in the efficacy of the brain is not just driven by external influences, like the weather or a distracting intervention. It is a characteristic of the way our brain itself functions. It is very likely that intrinsic variability in the functioning of the brain also affects the quality of our judgments in ways that we cannot possibly hope to control. This variability is yet another source of occasion noise.
They were testing for a particular driver of noise: social influence. The key finding was that group rankings were wildly disparate: across different groups, there was a great deal of noise.
Aggregating judgments can be an excellent way of reducing noise, and therefore error. But what happens if people are listening to one another? You might well think that their doing so is likely to help. After all, people can learn from one another and thus figure out what is right. Under favorable circumstances, in which people share what they know, deliberating groups can indeed do well. But independence is a prerequisite for the wisdom of crowds. If people are not making their own judgments and are relying instead on what other people think, crowds might not be so wise after all.
Some of the studies we are describing involve informational cascades. Such cascades are pervasive. They help explain why similar groups in business, government, and elsewhere can go in multiple directions and why small changes can produce such different outcomes and hence noise. We are able to see history only as it was actually run, but for many groups and group decisions, there are clouds of possibilities, only one of which is realized.