Noise
Read between January 19 and February 10, 2023
Bias and noise—systematic deviation and random scatter—are different components of error. The targets illustrate the difference.
Many organizations, unfortunately, are afflicted by both bias and noise.
A general property of noise is that you can recognize and measure it while knowing nothing about the target or bias.
For example, when the same software developers were asked on two separate days to estimate the completion time for the same task, the hours they projected differed by 71%, on average.
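A minimal sketch of how that kind of within-person variability can be measured, using invented numbers rather than the study's data: collect two estimates per person for the same task and look at how far apart they are. No ground truth about the real completion time is needed, which is the point made above about measuring noise without knowing the target.

```python
# Illustrative only: invented estimates, not the study's data.
import numpy as np

# Hypothetical completion-time estimates (hours) from the same five developers
# for the same task, collected on two different days.
day1 = np.array([40.0, 12.0, 80.0, 25.0, 60.0])
day2 = np.array([70.0, 20.0, 45.0, 30.0, 95.0])

# Within-person (occasion) variability: how far apart the two estimates are,
# relative to the average of each pair. No true completion time is required.
pair_mean = (day1 + day2) / 2
pct_diff = np.abs(day1 - day2) / pair_mean * 100
print(f"average within-person difference: {pct_diff.mean():.0f}%")
```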
Personnel decisions are noisy. Interviewers of job candidates make widely different assessments of the same people. Performance ratings of the same employees are also highly variable and depend more on the person doing the assessment than on the performance being assessed.
wherever there is judgment, there is noise—and more of it than you think.
System noise is inconsistency, and inconsistency damages the credibility of the system.
Vul and Pashler wanted to find out if the same effect extends to occasion noise: can you get closer to the truth by combining two guesses from the same person, just as you do when you combine the guesses of different people? As they discovered, the answer is yes. Vul and Pashler gave this finding an evocative name: the crowd within.
As Vul and Pashler put it, “You can gain about 1/10th as much from asking yourself the same question twice as you can from getting a second opinion from someone else.”
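A rough simulation of why the crowd within helps, but helps less than a real second opinion: two guesses from the same person share that person's stable error, so averaging them cancels only the occasion-to-occasion scatter, while guesses from two different people have independent errors. All parameters below are assumptions chosen for illustration, not the figures behind the 1/10th estimate.

```python
# Rough simulation of the "crowd within": errors within one person are partly
# shared across occasions, so self-averaging cancels less than averaging
# across people. The parameters here are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
truth = 0.0
person_bias_sd = 1.0   # stable, person-specific error (shared across occasions)
occasion_sd = 0.6      # occasion-to-occasion scatter within a person

def guess(person_bias):
    return truth + person_bias + rng.normal(0, occasion_sd, n)

bias_a = rng.normal(0, person_bias_sd, n)
bias_b = rng.normal(0, person_bias_sd, n)

single = guess(bias_a)
within = (guess(bias_a) + guess(bias_a)) / 2     # same person, asked twice
between = (guess(bias_a) + guess(bias_b)) / 2    # two different people

for name, g in [("one guess", single), ("crowd within", within), ("two people", between)]:
    print(f"{name:12s} mean squared error: {np.mean((g - truth) ** 2):.3f}")
```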
Or to put it differently, you are not always the same person, and you are less consistent over time than you think. But somewhat reassuringly, you are more similar to yourself yesterday than you are to another person today.
Speaking of Occasion Noise
“Judgment is like a free throw: however hard we try to repeat it precisely, it is never exactly identical.”
“Your judgment depends on what mood you are in, what cases you have just discussed, and even what the weather is. You are not the same person at all times.”
“Although you may not be the same person you were last week, you are less different from the ‘you’ of last week than you are from someone else today. Occasion noise is not the largest source of system noise.”
In politics, as in music, a great deal depends on social influences and, in particular, on whether people see that other people are attracted or repelled.
If Democrats in an online group saw that a particular point of view was obtaining initial popularity among Democrats, they would endorse that point of view, ultimately leading most Democrats, in the relevant group, to favor it. But if Democrats in a different online group saw that the very same point of view was obtaining initial popularity among Republicans, they would reject that point of view, ultimately leading most Democrats, in the relevant group, to reject it.
But independence is a prerequisite for the wisdom of crowds. If people are not making their own judgments and are relying instead on what other people think, crowds might not be so wise after all.
The irony is that while multiple independent opinions, properly aggregated, can be strikingly accurate, even a little social influence can produce a kind of herding that undermines the wisdom of crowds.
after people talk with one another, they typically end up at a more extreme point in line with their original inclinations.
Meehl discovered that clinicians and other professionals are distressingly weak in what they often see as their unique strength: the ability to integrate information.
The findings support a blunt conclusion: simple models beat humans.
You may believe that you are subtler, more insightful, and more nuanced than the linear caricature of your thinking. But in fact, you are mostly noisier.
Speaking of Judgments and Models
“People believe they capture complexity and add subtlety when they make judgments. But the complexity and the subtlety are mostly wasted—usually they do not add to the accuracy of simple models.”
“More than sixty years after the publication of Paul Meehl’s book, the idea that mechanical prediction is superior to people is still shocking.”
“There is so much noise in judgment that a noise-free model of a judge achieves more accurate predictions than the actual judge does.”
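A hedged sketch of the "noise-free model of a judge" claim on synthetic data: the outcome depends linearly on a few cues, the judge weights the cues roughly correctly but adds random error, and a linear model fitted to the judge's own ratings predicts the outcome better than the judge does, simply because the fit strips out the noise. All numbers are assumptions.

```python
# Sketch of the "model of the judge" result on synthetic data: a linear model
# fitted to a judge's own ratings can out-predict the judge, because fitting
# averages away the judge's random error.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
cues = rng.normal(size=(n, 3))                    # information available to the judge
outcome = cues @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.5, n)

# The judge roughly tracks the cues but adds judgment noise.
judge = cues @ np.array([0.6, 0.25, 0.15]) + rng.normal(0, 0.8, n)

# Linear model of the judge: regress the judge's ratings on the cues.
coef, *_ = np.linalg.lstsq(cues, judge, rcond=None)
model_of_judge = cues @ coef

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(f"judge vs outcome:          r = {corr(judge, outcome):.2f}")
print(f"model of judge vs outcome: r = {corr(model_of_judge, outcome):.2f}")
```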
The model built by machine learning was also far more successful than linear models that used the same information. The reason is intriguing: “The machine-learning algorithm finds significant signal in combinations of variables that might otherwise be missed.”
Second, the data is sometimes rich enough for sophisticated AI techniques to detect valid patterns and go well beyond the predictive power of a simple model. When AI succeeds in this way, the advantage of these models over human judgment is not just the absence of noise but also the ability to exploit much more information.
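A toy illustration of that point, with synthetic data and scikit-learn assumed available: when the signal lives in a combination of variables (here, their product), a linear model finds nothing while a flexible learner such as boosted trees picks much of it up.

```python
# Toy illustration (synthetic data): a flexible learner can exploit a
# combination of variables that a purely linear model misses.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 20_000
X = rng.normal(size=(n, 2))
# The signal lives in the *combination* of the two variables.
y = X[:, 0] * X[:, 1] + rng.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = X[:15_000], X[15_000:], y[:15_000], y[15_000:]

linear = LinearRegression().fit(X_train, y_train)
boosted = GradientBoostingRegressor().fit(X_train, y_train)

print(f"linear model  R^2: {linear.score(X_test, y_test):.2f}")   # near zero
print(f"boosted trees R^2: {boosted.score(X_test, y_test):.2f}")  # clearly positive
```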
Given these advantages and the massive amount of evidence supporting them, it is worth asking why algorithms are not used much more extensively for the types of professional judgments we discuss in this book. For all the spirited talk about algorithms and machine learning, and despite important exceptions in particular fields, their use remains limited. Many experts ignore the clinical-versus-mechanical debate, preferring to trust their judgment. They have faith in their intuitions and doubt that machines could do better. They regard the idea of algorithmic decision making as dehumanizing and …
Speaking of Rules and Algorithms
“When there is a lot of data, machine-learning algorithms will do better than humans and better than simple models. But even the simplest rules and algorithms have big advantages over human judges: they are free of noise, and they do not attempt to apply complex, usually invalid insights about the predictors.”
“Since we lack data about the outcome we must predict, why don’t we use an equal-weight model? It will do almost as well as a proper model, and will surely do better than case-by-case human judgment.”
“You disagree with the model’s forecast. I get it. But …
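For reference, a minimal sketch of what an equal-weight model looks like: standardize each predictor, orient it so that higher means better, and average with equal weights. Nothing here requires outcome data, which is why it can be used when the outcome to be predicted has not yet been observed. The candidate data and predictor names below are made up.

```python
# Minimal sketch of an equal-weight model: standardize each predictor, orient
# it so that higher means "better," and average. No outcome data is needed.
import numpy as np

def equal_weight_score(X, higher_is_better):
    """X: (n_cases, n_predictors); higher_is_better: bool per predictor."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)       # put predictors on one scale
    signs = np.where(higher_is_better, 1.0, -1.0)  # flip predictors where lower is better
    return (z * signs).mean(axis=1)                # equal weights

# Hypothetical candidate data: test score, years of experience, absence days.
candidates = np.array([[82, 4, 3],
                       [74, 9, 1],
                       [90, 2, 8]])
scores = equal_weight_score(candidates, higher_is_better=np.array([True, True, False]))
print(np.argsort(-scores))  # candidates ranked from best to worst
```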
Pundits blessed with clear theories about how the world works were the most confident and the least accurate.
Our conclusion, then, is that pundits should not be blamed for the failures of their distant predictions. They do, however, deserve some criticism for attempting an impossible task and for believing they can succeed in it.
When a finding is described as “significant,” we should not conclude that the effect it describes is a strong one. It simply means that the finding is unlikely to be the product of chance alone. With a sufficiently large sample, a correlation can be at once very “significant” and too small to be worth discussing.
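A quick numerical illustration of that distinction, with synthetic data and scipy assumed available: at a sample size of one million, a correlation of about 0.01 comes out wildly "significant" yet explains essentially nothing.

```python
# With a large enough sample, a negligible correlation is still "significant."
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.normal(size=n)
y = 0.01 * x + rng.normal(size=n)   # true correlation of roughly 0.01

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.2e}")  # r is tiny, p is far below 0.05
```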
Confirmation bias—the same tendency that leads us, when we have a prejudgment, to disregard conflicting evidence altogether—made us assign less importance than we should to subsequent data.
Another term to describe this phenomenon is the halo effect, because the candidate was evaluated in the positive “halo” of the first impression.
But in a revealing study, consumers were found to be more likely to be affected by calorie labels if they were placed to the left of the food item rather than the right. When calories are on the left, consumers receive that information first and evidently think “a lot of calories!” or “not so many calories!” before they see the item. Their initial positive or negative reaction greatly affects their choices. By contrast, when people see the food item first, they apparently think “delicious!” or “not so great!” before they see the calorie label.
In general, we jump to conclusions, then stick to them. We think we base our opinions on evidence, but the evidence we consider and our interpretation of it are likely to be distorted, at least to some extent, to fit our initial snap judgment.
A well-documented psychological bias called the fundamental attribution error is a strong tendency to assign blame or credit to agents for actions and outcomes that are better explained by luck or by objective circumstances. Another bias, hindsight, distorts judgments so that outcomes that could not have been anticipated appear easily foreseeable in retrospect.
A substantial body of research in psychology and behavioral economics has documented a long list of psychological biases: the planning fallacy, overconfidence, loss aversion, the endowment effect, the status quo bias, excessive discounting of the future (“present bias”), and many others—including, of course, biases for or against various categories of people.
Business school professor Phil Rosenzweig has convincingly argued that empty explanations in terms of biases are common in discussions of business outcomes. Their popularity attests to the prevalent need for causal stories that make sense of experience.
The upshot is that ex post or ex ante debiasing—which, respectively, correct or prevent specific psychological biases—are useful in some situations. These approaches work where the general direction of error is known and manifests itself as a clear statistical bias. Types of decisions that are expected to be strongly biased are likely to benefit from debiasing interventions. For instance, the planning fallacy is a sufficiently robust finding to warrant debiasing interventions against overconfident planning.
With a large sample of highly qualified examiners, the study confirmed that fingerprint experts are sometimes susceptible to occasion noise. About one decision in ten was altered.
However low the error rate of fingerprint identification may be, it is not zero, and as PCAST noted, juries should be made aware of that.
many volunteers who do extremely well on intelligence tests do not qualify as superforecasters. Apart from general intelligence, we could reasonably expect that superforecasters are unusually good with numbers. And they are. But their real advantage is not their talent at math; it is their ease in thinking analytically and probabilistically.
Rather than form a holistic judgment about a big geopolitical question (whether a nation will leave the European Union, whether a war will break out in a particular place, whether a public official will be assassinated), they break it up into its component parts. They ask, “What would it take for the answer to be yes? What would it take for the answer to be no?” Instead of offering a gut feeling or some kind of global hunch, they ask and try to answer an assortment of subsidiary questions.
Superforecasters also excel at taking the outside view, and they care a lot about base rates.
Tetlock finds that “the strongest predictor of rising into the ranks of superforecasters is perpetual beta, the degree to which one is committed to belief updating and self-improvement.” As he puts it, “What makes them so good is less what they are than what they do—the hard work of research, the careful thought and self-criticism, the gathering and synthesizing of other perspectives, the granular judgments and relentless updating.” They like a particular cycle of thinking: “try, fail, analyze, adjust, try again.”
When pathologists analyzed skin lesions for the presence of melanoma—the most dangerous form of skin cancer—there was only “moderate” agreement. The eight pathologists reviewing each case were unanimous or showed only one disagreement just 62% of the time. Another study at an oncology center found that the diagnostic accuracy of melanomas was only 64%, meaning that doctors misdiagnosed melanomas in one of every three lesions. A third study found that dermatologists at New York University failed to diagnose melanoma from skin biopsies 36% of the time. The authors of the study conclude that “the …
One study found that a staggering 90% of managers, employees, and HR heads believe that their performance management processes fail to deliver the results they expected.
As one review article summarized, “No matter what has been tried over decades to improve [performance management] processes, they continue to generate inaccurate information and do virtually nothing to drive performance.”
If you are now in a position to hire employees, your selection methods probably include some version of this ritual. As one organizational psychologist noted, “It is rare, even unthinkable, for someone to be hired without some type of interview.” And almost all professionals rely to some degree on their intuitive judgments when making hiring decisions in these interviews.
if your goal is to determine which candidates will succeed in a job and which will fail, standard interviews (also called unstructured interviews to distinguish them from structured interviews, to which we will turn shortly) are not very informative. To put it more starkly, they are often useless.
To reach this conclusion, innumerable studies estimated the correlation between the rating an evaluator gives a candidate after an interview and the candidate’s eventual success on the job. If the correlation between the interview rating and success is high, then interviews—or any other recruiting techniques for which correlation is computed in the same manner—can be assumed to be a good predictor of how candidates will perform.
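One hedged way to make such a correlation intuitive (not taken from the passage above, and resting on a bivariate-normal assumption): convert it into the probability that, of two randomly chosen candidates, the one with the higher interview rating also turns out to perform better on the job.

```python
# Under a bivariate-normal assumption, a validity correlation r translates into
# the chance that the higher-rated of two candidates also performs better.
import math

def concordance_probability(r: float) -> float:
    """P(higher-rated candidate also performs better), bivariate-normal assumption."""
    return 0.5 + math.asin(r) / math.pi

for r in (0.0, 0.2, 0.4, 0.6):
    print(f"r = {r:.1f} -> concordant {concordance_probability(r):.0%}")
```

At r = 0 the comparison is a coin flip (50%), and even a respectable validity leaves plenty of room for the lower-rated candidate to do better, which is one way to read "not very informative."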
Google uses other data as inputs on some of the dimensions it cares about. To test job-related knowledge, it relies in part on work sample tests, such as asking a candidate for a programming job to write some code. Research has shown that work sample tests are among the best predictors of on-the-job performance. Google also uses “backdoor references,” supplied not by someone the candidate has nominated but by Google employees with whom the candidate has crossed paths.
Google’s final hiring decisions are anchored on the average score assigned by the four interviewers. They are also informed by the underlying evidence. In other words, Google allows judgment and intuition in its decision-making process only after all the evidence has been collected and analyzed. Thus, the tendency of each interviewer (and hiring committee member) to form quick, intuitive impressions and rush to judgment is kept in check.
the US Supreme Court held that a mandatory death sentence was unconstitutional not because it was too brutal but because it was a rule. The whole point of the mandatory death sentence was to ensure against noise—to say that under specified circumstances, murderers would have to be put to death. Invoking the need for individualized treatment, the court said that “the belief no longer prevails that every offense in a like legal category calls for an identical punishment without regard to the past life and habits of a particular offender.” According to the Supreme Court, a serious constitutional …