Kindle Notes & Highlights
Our topic is human error. Bias and noise—systematic deviation and random scatter—are different components of error.
A general property of noise is that you can recognize and measure it while knowing nothing about the target or bias.
We don’t need to know who is right to measure how much the judgments of the same case vary. All we have to do to measure noise is look at the back of the target.
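The point that noise can be measured without knowing the target can be illustrated with a minimal sketch. It assumes noise is taken as the standard deviation of judgments of the same case (the definition recalled later in these highlights); the numbers and function names are hypothetical.

```python
from statistics import mean, pstdev

def noise(judgments):
    """Noise: the standard deviation of judgments of the same case.
    Note that no true value is needed to compute it."""
    return pstdev(judgments)

def bias(judgments, true_value):
    """Bias: the average error. This one does require the true value."""
    return mean(judgments) - true_value

# Hypothetical judgments of one case by four judges.
judgments = [10.0, 12.0, 14.0, 16.0]

# Shifting every judgment by the same amount changes the bias
# but leaves the scatter, and therefore the noise, unchanged.
shifted = [j + 100.0 for j in judgments]

print(noise(judgments), noise(shifted))
```

Looking at "the back of the target" corresponds to calling `noise` alone: the scatter is visible even when `true_value` is unknown.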
To understand error in judgment, we must understand both bias and noise. Sometimes, as we will see, noise is the more important problem. But in public conversations about human error and in organizations all over the world, noise is rarely recognized. Bias is the star of the show. Noise is a bit player, usually offstage. The topic of bias has been discussed in thousands of scientific articles and dozens of popular books, few of which even mention the issue of noise. This book is our attempt to redress the balance.
This book comes in six parts.
In part 1, we explore the difference between noise and bias, and we show that both public and private organizations can be noisy, sometimes shockingly so.
In part 2, we investigate the nature of human judgment and explore how to measure accuracy and error.
Part 3 takes a deeper look at one type of judgment that has been researched extensively: predictive judgment.
Part 4 turns to human psychology. We explain the central causes of noise.
Part 5 explores the practical question of how you can improve your judgments and prevent error.
What is the right level of noise? Part 6 turns to this question.
The theme that emerges from these three chapters can be summarized in one sentence, which will be a key theme of this book: wherever there is judgment, there is noise—and more of it than you think.
As revealing as they are, these studies, which involve tightly controlled experiments, almost certainly understate the magnitude of noise in the real world of criminal justice. Real-life judges are exposed to far more information than what the study participants received in the carefully specified vignettes of these experiments. Some of this additional information is relevant, of course, but there is also ample evidence that irrelevant information, in the form of small and seemingly random factors, can produce major differences in outcomes.
The story of Judge Frankel’s fight for sentencing guidelines offers a glimpse of several of the key points we will cover in this book. First, judgment is difficult because the world is a complicated, uncertain place.
Second, the extent of these disagreements is much greater than we expect.
Third, noise can be reduced.
Fourth, efforts at noise reduction often raise objections and run into serious difficulties. These issues must be addressed, too, or the fight against noise will fail.
Speaking of Noise in Sentencing
“Experiments show large disparities among judges in the sentences they recommend for identical cases. This variability cannot be fair. A defendant’s sentence should not depend on which judge the case happens to be assigned to.”
“Criminal sentences should not depend on the judge’s mood during the hearing, or on the outside temperature.”
“Guidelines are one way to address this issue. But many people don’t like them, because they limit judicial discretion, which might be necessary to ensure fairness and accuracy. After all, each case is unique, isn’t it?”
Lotteries have their place, and they need not be unjust. Acceptable lotteries are used to allocate “goods,” like courses in some universities, or “bads,” like the draft in the military. They serve a purpose. But the judgment lotteries we talk about allocate nothing. They just produce uncertainty. Imagine an insurance company whose underwriters are noiseless and set the optimal premium, but a chance device then intervenes to modify the quote that the client actually sees. Evidently, there would be no justification for such a lottery. Neither is there any justification for a system in which the
Noise Audits Reveal System Noise
Our noise audit found much greater differences. By our measure, the median difference in underwriting was 55%, about five times as large as was expected by most people, including the company’s executives. This result means, for instance, that when one underwriter sets a premium at $9,500, the other does not set it at $10,500—but instead quotes $16,700. For claims adjusters, the median ratio was 43%. We stress that these results are medians: in half the pairs of cases, the difference between the two judgments was even larger.
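The arithmetic behind the $9,500 versus $16,700 example can be checked with a short sketch. It assumes the difference between two judgments is expressed as the absolute difference divided by their average, which reproduces the 55% figure in the passage; the function names are hypothetical.

```python
from itertools import combinations
from statistics import median

def relative_difference(a, b):
    """Absolute difference between two judgments, as a fraction of their average."""
    return abs(a - b) / ((a + b) / 2)

def median_pairwise_difference(judgments):
    """Median relative difference over all pairs of judgments of the same case."""
    pairs = combinations(judgments, 2)
    return median([relative_difference(a, b) for a, b in pairs])

# The pair from the passage: 7,200 / 13,100, roughly 0.55.
print(round(relative_difference(9_500, 16_700), 2))

# The pair the passage contrasts it with: 1,000 / 10,000 = 0.10.
print(round(relative_difference(9_500, 10_500), 2))
```

Under this measure, $9,500 versus $10,500 is a 10% difference, so a 55% median is about five times larger, consistent with the passage.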
A defining feature of system noise is that it is unwanted, and we should stress right here that variability in judgments is not always unwanted.
Matters of taste and competitive settings all pose interesting problems of judgment. But our focus is on judgments in which variability is undesirable. System noise is a problem of systems, which are organizations, not markets.
In noisy systems, errors do not cancel out. They add up.
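A small sketch of why errors add up rather than cancel, using hypothetical pricing errors and assuming that an error is costly whichever direction it goes:

```python
from statistics import mean

# Hypothetical pricing errors in dollars: two quotes too low, two too high.
errors = [-2000.0, 2000.0, -500.0, 500.0]

# Averaging the signed errors suggests nothing is wrong...
mean_signed_error = mean(errors)

# ...but if each error is costly regardless of its sign, the costs accumulate.
mean_absolute_error = mean([abs(e) for e in errors])
mean_squared_error = mean([e ** 2 for e in errors])

print(mean_signed_error, mean_absolute_error, mean_squared_error)
```

The signed average is zero even though every individual quote is wrong, which is why an average that looks right can hide a noisy system.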
Noise was like a leak in the basement. It was tolerated not because it was thought acceptable but because it had remained unnoticed.
Most of us, most of the time, live with the unquestioned belief that the world looks as it does because that’s the way it is. There is one small step from this belief to another: “Other people view the world much the way I do.” These beliefs, which have been called naive realism, are essential to the sense of a reality we share with other people. We rarely question these beliefs. We hold a single interpretation of the world around us at any one time, and we normally invest little effort in generating plausible alternatives to it. One interpretation is enough, and we experience it as true. We
One underwriter we interviewed described her experience in becoming a veteran in her department: “When I was new, I would discuss seventy-five percent of cases with my supervisor.… After a few years, I didn’t need to—I am now regarded as an expert.… Over time, I became more and more confident in my judgment.” Like many of us, this person had developed confidence in her judgment mainly by exercising it. The psychology of this process is well understood. Confidence is nurtured by the subjective experience of judgments that are made with increasing fluency and ease, in part because they resemble
Most organizations prefer consensus and harmony over dissent and conflict.
This school is not the only organization that considers conflict avoidance at least as important as making the right decision.
Our conclusion is simple: wherever there is judgment, there is noise, and more of it than you think.
Speaking of System Noise in the Insurance Company
“We depend on the quality of professional judgments, by underwriters, claims adjusters, and others. We assign each case to one expert, but we operate under the wrong assumption that another expert would produce a similar judgment.”
“System noise is five times larger than we thought—or than we can tolerate. Without a noise audit, we would never have realized that. The noise audit shattered the illusion of agreement.”
“System noise is a serious problem: it costs us hundreds of millions.”
“Wherever there is judgment, there is noise—and more of it
Decisions that are made only once, like the president’s Ebola response, are singular because they are not made recurrently by the same individual or team, they lack a prepackaged response, and they are marked by genuinely unique features. In dealing with Ebola, President Obama and his team had no real precedents on which to draw. Important political decisions are often good examples of singular decisions, as are the most fateful choices of military commanders.
Singular decisions have traditionally been treated as quite separate from the recurrent judgments that interchangeable employees routinely make in large organizations. While social scientists have dealt with recurrent decisions, high-stakes singular decisions have been the province of historians and management gurus. The approaches to the two types of decisions have been quite different. Analyses of recurrent decisions have often taken a statistical bent, with social scientists assessing many similar decisions to discern patterns, identify regularities, and measure accuracy. In contrast,
There is no direct way to observe the presence of noise in singular decisions.
This theoretical discussion matters. If singular decisions are just as noisy as recurrent ones, then the strategies that reduce noise in recurrent decisions should also improve the quality of singular decisions. This is a more counterintuitive prescription than it seems. When you have a one-of-a-kind decision to make, your instinct is probably to treat it as, well, one of a kind. Some even claim that the rules of probabilistic thinking are entirely irrelevant to singular decisions made under uncertainty and that such decisions call for a radically different approach. Our observations here
Speaking of Singular Decisions
“The way you approach this unusual opportunity exposes you to noise.”
“Remember: a singular decision is a recurrent decision that is made only once.”
“The personal experiences that made you who you are are not truly relevant to this decision.”
Judgment can therefore be described as measurement in which the instrument is a human mind. Implicit in the notion of measurement is the goal of accuracy—to approach truth and minimize error. The goal of judgment is not to impress, not to take a stand, not to persuade. It is important to note that the concept of judgment as we use it here is borrowed from the technical psychological literature, and that it is a much narrower concept than the same word has in everyday language. Judgment is not a synonym for thinking, and making accurate judgments is not a synonym for having good judgment.
As we define it, a judgment is a conclusion that can be summarized in a word or phrase.
Selective attention and selective recall are a source of variability across people.
The answer is that there is a second way to evaluate judgments. This approach applies both to verifiable and nonverifiable ones. It consists in evaluating the process of judgment. When we speak of good or bad judgments, we may be speaking either about the output (e.g., the number you produced in the Gambardi case) or about the process—what you did to arrive at that number.
In summary, what people usually claim to strive for in verifiable judgments is a prediction that matches the outcome. What they are effectively trying to achieve, regardless of verifiability, is the internal signal of completion provided by the coherence between the facts of the case and the judgment. And what they should be trying to achieve, normatively speaking, is the judgment process that would produce the best judgment over an ensemble of similar cases.
Like predictive judgments, evaluative judgments entail an expectation of bounded disagreement. No self-respecting federal judge is likely to say, “This is the punishment I like best, and I don’t care a bit if my colleagues think otherwise.” And decision makers who choose from several strategic options expect colleagues and observers who have the same information and share the same goals to agree with them, or at least not to disagree too much. Evaluative judgments partly depend on the values and preferences of those making them, but they are not mere matters of taste or opinion. For that
Even when unfairness is only a minor concern, system noise poses another problem. People who are affected by evaluative judgments expect the values these judgments reflect to be those of the system, not of the individual judges. Something must have gone badly wrong if one customer, complaining of a defective laptop, gets fully reimbursed, and another gets a mere apology; or if one employee who has been with a firm for five years asks for a promotion and gets exactly that, while another employee, whose performance is otherwise identical, is politely turned down. System noise is inconsistency,
Speaking of Professional Judgment
“This is a matter of judgment. You can’t expect people to agree perfectly.”
“Yes, this is a matter of judgment, but some judgments are so far out that they are wrong.”
“Your choice between the candidates was just an expression of taste, not a serious judgment.”
“A decision requires both
An important question, therefore, is how, and how much, bias and noise contribute to error. This chapter aims to answer that question. Its basic message is straightforward: in professional judgments of all kinds, whenever accuracy is the goal, bias and noise play the same role in the calculation of overall error.
As is true for every normal distribution, about two-thirds of the forecasts are contained within one standard deviation on either side of the mean—
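The "about two-thirds" figure can be verified directly from the normal distribution. For a normal variable, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)); the helper name below is hypothetical.

```python
import math

def fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Within one standard deviation: roughly 0.68, i.e. about two-thirds.
print(round(fraction_within(1), 4))
```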
we need a “scoring rule” for errors, a way to weight and combine individual errors into a single measure of overall error. Fortunately, such a tool exists. It is the method of least squares, invented in 1795 by Carl Friedrich Gauss, a famous mathematical prodigy born in 1777, who began a career of major discoveries in his teens. Gauss proposed a rule for scoring the contribution of individual errors to overall error. His measure of overall error—called mean squared error (MSE)—is the average of the squares of the individual errors of measurement. Gauss’s detailed arguments for his approach to
(Recall that noise is the standard deviation of measurements, which is identical to the standard deviation of noisy errors.)
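The parenthetical can be checked with hypothetical numbers: each error is the measurement minus a constant true value, and subtracting a constant does not change the spread.

```python
from statistics import pstdev

# Hypothetical true value and four measurements of it.
true_value = 50.0
measurements = [48.0, 53.0, 55.0, 52.0]

# Each error is the measurement minus the (constant) true value.
errors = [m - true_value for m in measurements]

# Subtracting a constant shifts the values without changing their scatter,
# so both standard deviations are the same.
sd_measurements = pstdev(measurements)
sd_errors = pstdev(errors)

print(sd_measurements, sd_errors)
```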
Whenever you observe noise, you should work to reduce it!
The error equation is the intellectual foundation of this book.
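The highlight does not quote the equation itself. In the book it takes the form: overall error (MSE) = bias² + noise², with bias the average error and noise the standard deviation of the judgments. The sketch below checks that identity on hypothetical numbers; the function names are illustrative, not the authors'.

```python
from statistics import mean, pstdev

def mean_squared_error(judgments, true_value):
    """Overall error: the average of the squared individual errors."""
    return mean([(j - true_value) ** 2 for j in judgments])

def bias_and_noise(judgments, true_value):
    """Bias is the average error; noise is the standard deviation of the judgments."""
    bias = mean(judgments) - true_value
    noise = pstdev(judgments)
    return bias, noise

# Hypothetical case: the true value is 50 and four judges estimate it.
true_value = 50.0
judgments = [48.0, 53.0, 55.0, 52.0]

mse = mean_squared_error(judgments, true_value)
bias, noise = bias_and_noise(judgments, true_value)

# Both sides of the identity: MSE on the left, bias^2 + noise^2 on the right.
print(mse, bias ** 2 + noise ** 2)
```

Because the two terms enter the sum symmetrically, reducing noise by a given amount lowers overall error just as much as reducing bias by the same amount, which is the sense in which the two "play the same role" in the calculation.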