Kindle Notes & Highlights
Speaking of Judgments and Models
“People believe they capture complexity and add subtlety when they make judgments. But the complexity and the subtlety are mostly wasted—usually they do not add to the accuracy of simple models.”
“More than sixty years after the publication of Paul Meehl’s book, the idea that mechanical prediction is superior to people is still shocking.”
“There is so much noise in judgment that a noise-free model of a judge achieves more accurate predictions than the actual judge does.”
Although nowadays these are the applications we have in mind when we hear the word algorithm, the term has a broader meaning. In one dictionary’s definition, an algorithm is a “process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.” By this definition, simple models and other forms of mechanical judgment we described in the previous chapter are algorithms, too.
In fact, many types of mechanical approaches, from almost laughably simple rules to the most sophisticated and impenetrable machine algorithms, can outperform human judgment. And one key reason for this outperformance—albeit not the only one—is that all mechanical approaches are noise-free.
The challenge is that when the formula is applied out of sample—that is, when it is used to predict outcomes in a different data set—the weights will no longer be optimal. Flukes in the original sample are no longer present, precisely because they were flukes; in the new sample, managers with high technical skills are not all superstars. And the new sample has different flukes, which the formula cannot predict. The correct measure of a model’s predictive accuracy is its performance in a new sample, called its cross-validated correlation. In effect, a regression model is too successful in the
…
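The gap between a regression model's in-sample fit and its cross-validated correlation is easy to demonstrate. The following is only a sketch with simulated data (using NumPy); the sample sizes, number of predictors, and weights are invented for the illustration and are not taken from any study discussed in the book.

    # Sketch: why the in-sample fit of a regression overstates its predictive
    # accuracy. Simulated data; all numbers are illustrative, not from the book.
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 10                                     # cases per half, predictors
    X = rng.normal(size=(2 * n, k))
    y = X @ np.full(k, 0.15) + rng.normal(size=2 * n)  # weak signal plus noise

    train, test = slice(0, n), slice(n, 2 * n)

    # Fit least-squares weights on the first half only.
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

    r_in_sample = np.corrcoef(X[train] @ beta, y[train])[0, 1]
    r_cross_validated = np.corrcoef(X[test] @ beta, y[test])[0, 1]
    print(f"in-sample r = {r_in_sample:.2f}, cross-validated r = {r_cross_validated:.2f}")
    # The in-sample correlation is reliably higher, because the fitted weights
    # partly chase flukes of the training half that do not recur in the new half.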
Another style of simplification is through frugal models, or simple rules. Frugal models are models of reality that look like ridiculously simplified, back-of-the-envelope calculations. But in some settings, they can produce surprisingly good predictions.
For the second part of our journey, let us now travel in the opposite direction on the spectrum of sophistication. What if we could use many more predictors, gather much more data about each of them, spot relationship patterns that no human could detect, and model these patterns to achieve better prediction? This, in essence, is the promise of AI.
For example, large data sets make it possible to deal mechanically with broken-leg exceptions. This somewhat cryptic phrase goes back to an example that Meehl imagined: Consider a model that was designed to predict the probability that people will go to the movies tonight. Regardless of your confidence in the model, if you happen to know that a particular person just broke a leg, you probably know better than the model what their evening will look like.
When using simple models, the broken-leg principle holds an important lesson for decision makers: it tells them when to override the model and when not to.
The model built by machine learning was also far more successful than linear models that used the same information. The reason is intriguing: “The machine-learning algorithm finds significant signal in combinations of variables that might otherwise be missed.” The algorithm’s ability to find patterns easily missed by other methods is especially pronounced for the defendants whom the algorithm classifies as highest risk. In other words, some patterns in the data, though rare, strongly predict high risk. This finding—that the algorithm picks up rare but decisive patterns—brings us back to the
…
To be clear, these examples do not prove that algorithms are always fair, unbiased, or nondiscriminatory. A familiar example is an algorithm that is supposed to predict the success of job candidates, but is actually trained on a sample of past promotion decisions. Of course, such an algorithm will replicate all the human biases in past promotion decisions.
One key insight has emerged from recent research: people are not systematically suspicious of algorithms. When given a choice between taking advice from a human and an algorithm, for instance, they often prefer the algorithm.
As humans, we are keenly aware that we make mistakes, but that is a privilege we are not prepared to share. We expect machines to be perfect. If this expectation is violated, we discard them.
Speaking of Rules and Algorithms
“When there is a lot of data, machine-learning algorithms will do better than humans and better than simple models. But even the simplest rules and algorithms have big advantages over human judges: they are free of noise, and they do not attempt to apply complex, usually invalid insights about the predictors.”
“Since we lack data about the outcome we must predict, why don’t we use an equal-weight model? It will do almost as well as a proper model, and will surely do better than case-by-case human judgment.”
“You disagree with the model’s forecast. I get it. But
…
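The equal-weight suggestion in the quote above can be checked with a similarly hedged sketch: fit a "proper" regression on one half of a simulated data set, build an equal-weight sum of standardized predictors, and score both on the other half. Every number here is again invented for the illustration.

    # Sketch: an equal-weight model vs. a "proper" regression model, both
    # scored out of sample. Simulated data; the setup is made up for the example.
    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 120, 6
    X = rng.normal(size=(2 * n, k))
    true_w = rng.uniform(0.1, 0.3, size=k)             # all predictors mildly valid
    y = X @ true_w + rng.normal(size=2 * n)

    train, test = slice(0, n), slice(n, 2 * n)

    # Proper model: weights fitted by least squares on the training half.
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    r_proper = np.corrcoef(X[test] @ beta, y[test])[0, 1]

    # Equal-weight model: standardize each predictor, then simply add them up.
    Z = (X - X[train].mean(axis=0)) / X[train].std(axis=0)
    r_equal = np.corrcoef(Z[test].sum(axis=1), y[test])[0, 1]

    print(f"proper model r = {r_proper:.2f}, equal-weight model r = {r_equal:.2f}")
    # With modest samples and predictors of broadly similar validity, the
    # equal-weight model usually lands close to the fitted regression.

Run it with different seeds and the two out-of-sample correlations are typically close, which is the practical content of the quote.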
In short, decision makers like to listen to their gut, and most seem happy with what they hear. Which raises a question: what, exactly, do these people, who are blessed with the combination of authority and great self-confidence, hear from their gut? One review of intuition in managerial decision making defines it as “a judgment for a given course of action that comes to mind with an aura or conviction of rightness or plausibility, but without clearly articulated reasons or justifications—essentially ‘knowing’ but without knowing why.” We propose that this sense of knowing without knowing why
…
Confidence is no guarantee of accuracy, however, and many confident predictions turn out to be wrong. While both bias and noise contribute to prediction errors, the largest source of such errors is not the limit on how good predictive judgments are. It is the limit on how good they could be. This limit, which we call objective ignorance, is the focus of this chapter.
Since you are now familiar with the percent concordant statistic, you can easily see the problem this evaluation raises. A PC of 80% roughly corresponds to a correlation of .80. This level of predictive power is rarely achieved in the real world. In the field of personnel selection, a recent review found that the performance of human judges does not come close to this number. On average, they achieve a predictive correlation of .28 (PC = 59%).
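For readers who want to check the arithmetic: for jointly normal variables, the percent concordant is tied to the correlation by the standard relation PC = 50% + arcsin(r)/π. That formula is a textbook result rather than something quoted here, but it reproduces every pairing of r and PC that appears in these highlights.

    # Percent concordant (PC) implied by a correlation r for a bivariate
    # normal pair: PC = 0.5 + arcsin(r) / pi (a standard statistical result).
    import math

    def percent_concordant(r: float) -> float:
        return 0.5 + math.asin(r) / math.pi

    for r in (0.80, 0.28, 0.22, 0.20):
        print(f"r = {r:.2f}  ->  PC = {percent_concordant(r):.0%}")
    # Output: 80%, 59%, 57%, 56%, matching the figures quoted in the text.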
This intractable uncertainty includes everything that cannot be known at this time about the outcome that you are trying to predict.
In general, however, you can safely expect that people who engage in predictive tasks will underestimate their objective ignorance. Overconfidence is one of the best-documented cognitive biases.
As a result, objective ignorance accumulates steadily the further you look into the future.
Poor Judges and Barely Better Models
By insisting on the impossibility of perfect prediction, we might seem to be stating the obvious. Admittedly, asserting that the future is unpredictable is hardly a conceptual breakthrough. However, the obviousness of this fact is matched only by the regularity with which it is ignored, as the consistent findings about predictive overconfidence demonstrate.
This observation has an important implication for the improvement of judgment. Despite all the evidence in favor of mechanical and algorithmic prediction methods, and despite the rational calculus that clearly shows the value of incremental improvements in predictive accuracy, many decision makers will reject decision-making approaches that deprive them of the ability to exercise their intuition. As long as algorithms are not nearly perfect—and, in many domains, objective ignorance dictates that they will never be—human judgment will not be replaced. That is why it must be improved.
Speaking of Objective Ignorance
“Wherever there is prediction, there is ignorance, and probably more of it than we think. Have we checked whether the experts we trust are more accurate than dart-throwing chimpanzees?”
“When you trust your gut because of an internal signal, not because of anything you really know, you are in denial of your objective ignorance.”
“Models do better than people, but not by much. Mostly, we find mediocre human judgments and slightly better models. Still, better is good, and models are better.”
“We may never be comfortable using a model to make these decisions—we
…
We now turn to a broader question: how do we achieve comfort in a world in which many problems are easy but many others are dominated by objective ignorance? After all, where objective ignorance is severe, we should, after a while, become aware of the futility of crystal balls in human affairs. But that is not our usual experience of the world. Instead, as the previous chapter suggested, we maintain an unchastened willingness to make bold predictions about the future from little useful information. In this chapter, we address the prevalent and misguided sense that events that could not have
…
study. This type of challenge is novel in the social sciences but common in computer science, where teams are often invited to compete in tasks such as machine translation of a standard set of texts or detection of an animal in a large set of photographs. The achievement of the winning team in these competitions defines the state of the art at a point in time, which is always exceeded in the next competition. In a social science prediction task, where rapid improvement is not expected, it is reasonable to use the most accurate prediction achieved in the competition as a measure of the
…
Most of the selected competitors described themselves as data scientists and used machine learning.
How good were the winning models? The sophisticated machine-learning algorithms trained on a large data set did, of course, outperform the predictions of simple linear models (and would, by extension, out-predict human judges). But the improvement the AI models delivered over a very simple model was slight, and their predictive accuracy remained disappointingly low. When predicting evictions, the best model achieved a correlation of .22 (PC = 57%). Similar results were found for other single-event outcomes, such as whether the primary caregiver had been laid off or had been in job training and
…
A review of 708 studies in the behavioral and cognitive sciences found that only 3% of reported correlations were .50 or more.
When a finding is described as “significant,” we should not conclude that the effect it describes is a strong one. It simply means that the finding is unlikely to be the product of chance alone. With a sufficiently large sample, a correlation can be at once very “significant” and too small to be worth discussing.
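A quick numerical sketch of that point, using NumPy and SciPy on simulated data (the sample size and the effect are chosen only to make the contrast vivid):

    # Sketch: a correlation can be highly "significant" and still negligible.
    # Simulated data; sample size and effect are invented for the illustration.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 1_000_000
    x = rng.normal(size=n)
    y = 0.01 * x + rng.normal(size=n)    # true correlation of about .01

    r, p = stats.pearsonr(x, y)
    print(f"r = {r:.3f}, p = {p:.1e}")
    # Typical result: r near .01 with p far below .001, "significant" by any
    # conventional threshold, yet explaining roughly 0.01% of the variance.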
The logic behind this pessimistic conclusion requires some elaboration. When the authors of the Fragile Families challenge equate understanding with prediction (or the absence of one with the absence of the other), they use the term understanding in a specific sense. There are other meanings of the word: if you say you understand a mathematical concept or you understand what love is, you are probably not suggesting an ability to make any specific predictions.
However, in the discourse of social science, and in most everyday conversations, a claim to understand something is a claim to understand what causes that thing. The sociologists who collected and studied the thousands of variables in the Fragile Families study were looking for the causes of the outcomes they observed. Physicians who understand what ails a patient are claiming that the pathology they have diagnosed is the cause of the symptoms they have observed. To understand is to describe a causal chain. The ability to make a prediction is a measure of whether such a causal chain has indeed
…
We must, however, remember that while correlation does not imply causation, causation does imply correlation.
When we give in to this feeling of inevitability, we lose sight of how easily things could have been different—how, at each fork in the road, fate could have taken a different path. Jessica could have kept her job. She could have quickly found another one. A relative could have come to her aid. You, the social worker, could have been a more effective advocate. The building manager could have been more understanding and allowed the family a few weeks of respite, making it possible for Jessica to find a job and catch up with the rent.
When you explain an unexpected but unsurprising outcome in this way, the destination that is eventually reached always makes sense. This is what we mean by understanding a story, and this is what makes reality appear predictable—in hindsight. Because the event explains itself as it occurs, we are under the illusion that it could have been anticipated. More broadly, our sense of understanding the world depends on our extraordinary ability to construct narratives that explain the events we observe.
When the search for an obvious cause fails, our first resort is to produce an explanation by filling a blank in our model of the world.
At this point, all we need to emphasize is that the causal mode comes much more naturally to us. Even explanations that should properly be treated as statistical are easily turned into causal narratives.
Where causality is plausible, our mind easily turns a correlation, however low, into a causal and explanatory force.
Speaking of the Limits of Understanding
“Correlations of about .20 (PC = 56%) are quite common in human affairs.”
“Correlation does not imply causation, but causation does imply correlation.”
“Most normal events are neither expected nor surprising, and they require no explanation.”
“In the valley of the normal, events are neither expected nor surprising—they just explain themselves.”
“We think we understand what is going on here, but could we have predicted it?”
What is the origin of noise—and of bias? What mental mechanisms give rise to the variability of our judgments and to the shared errors that affect them? In short, what do we know about the psychology of noise? These are the questions to which we now turn.
The heuristics and biases program focused on what people have in common, not on how they differ. It showed that the processes that cause judgment errors are widely shared. Partly because of this history, people who are familiar with the notion of psychological bias often assume that it always produces statistical bias, a term we use in this book to mean measurements or judgments that mostly deviate from the truth in the same direction. Indeed, psychological biases create statistical bias when they are broadly shared. However, psychological biases create system noise when judges are biased in
…
Judgment biases are often identified by reference to a true value. There is bias in predictive judgments if errors are mostly in one direction rather than the other. For instance, when people forecast how long it will take them to complete a project, the mean of their estimates is usually much lower than the time they will actually need. This familiar psychological bias is known as the planning fallacy.
We recommend reserving the word bias for specific and identifiable errors and the mechanisms that produce them.
Adding detail to a description can only make it less probable, although it can make it more representative, and …
Both questions are examples of taking what we have called the outside view: when you take this view, you think of the student, or of Gambardi, as a member of a class of similar cases. You think statistically about the class, instead of thinking causally about the focal case.
(Full disclosure: We wrote the Gambardi case to illustrate noisy judgment; it took us weeks before we realized that it was also a prime example of the bias we describe here, which is called base-rate neglect. Thinking of base rates is no more automatic for the authors of this book than for anyone else.)
Substitution of one question for another is not restricted to similarity and probability. Another example is the replacement of a judgment of frequency by an impression of the ease with which instances come to mind. For example, the perception of the risk of airplane crashes or hurricanes rises briefly after well-publicized instances of such events. In theory, a judgment of risk should be based on a long-term average. In reality, recent incidents are given more weight because they come more easily to mind. Substituting a judgment of how easily examples come to mind for an assessment of
…
In general, we jump to conclusions, then stick to them. We think we base our opinions on evidence, but the evidence we consider and our interpretation of it are likely to be distorted, at least to some extent, to fit our initial snap judgment. As a result, we maintain the coherence of the overall story that has emerged in our mind. This process is fine, of course, if the conclusions are correct. When the initial evaluation is erroneous, however, the tendency to stick to it in the face of contradictory evidence is likely to amplify errors. And this effect is difficult to control, because
…
Speaking of Heuristics, Biases, and Noise
“We know we have psychological biases, but we should resist the urge to blame every error on unspecified ‘biases.’”
“When we substitute an easier question for the one we should be answering, errors are bound to occur. For instance, we will ignore the base rate when we judge probability by similarity.”
“Prejudgments and other conclusion biases lead people to distort evidence in favor of their initial position.”
“We form impressions quickly and hold on to them even when contradictory information comes in. This tendency is called excessive coherence.”
…
What you just performed is an elementary example of matching. We have described judgment as an operation that assigns a value on a scale to a subjective impression (or to an aspect of an impression). Matching is an essential part of that operation. When you answer the question “On a scale of 1 to 10, how good is your mood?” or “Please give one to five stars to your shopping experience this morning,” you are matching: your task is to find a value on the judgment scale that matches your mood or experience.