Kindle Notes & Highlights
We tend to put more trust in people who trust themselves than we do in those who show their doubts.
Respect-experts excel at constructing coherent stories. Their experience enables them to recognize patterns, to reason by analogy with previous cases, and to form and confirm hypotheses quickly.
Psychologists and neuroscientists distinguish between crystallized intelligence, the ability to solve problems by relying on a store of knowledge about the world (including arithmetical operations), and fluid intelligence, the ability to solve novel problems.
Lower CRT (Cognitive Reflection Test) scores are associated with many real-world judgments and beliefs, including belief in ghosts, astrology, and extrasensory perception. The scores predict whether people will fall for blatantly inaccurate “fake news.” They are even associated with how much people will use their smartphones.
People with a high need for cognition tend to be less susceptible to known cognitive biases.
One is the Adult Decision Making Competence scale, which measures how prone people are to typical errors in judgment, such as overconfidence or inconsistency in risk perceptions.
Another is the Halpern Critical Thinking Assessment, which focuses on critical thinking skills, including both a disposition toward rational thinking and a set of learnable skills.
The only measure of cognitive style or personality that they found to predict forecasting performance was another scale, developed by psychology professor Jonathan Baron to measure “actively open-minded thinking.”
It is the humility of those who are constantly aware that their judgment is a work in progress and who yearn to be corrected.
“You are an expert. But are your judgments verifiable, or are you a respect-expert?”
“We have to choose between two opinions, and we know nothing about these individuals’ expertise and track record. Let’s follow the advice of the more intelligent one.”
“Intelligence is only part of the story, however. How people think is also important. Perhaps we should pick the most thoughtful, open-minded...
Of course, people are rarely aware of their own biases when they are being misled by them. This lack of awareness is itself a known bias, the bias blind spot.
Even when they are aware of the risk of bias, forensic scientists are not immune to the bias blind spot: the tendency to acknowledge the presence of bias in others, but not in oneself.
They illustrate a decision hygiene strategy that has applicability in many domains: sequencing information to limit the formation of premature intuitions.
In any judgment, some information is relevant, and some is not. More information is not always better, especially if it has the potential to bias judgments by leading the judge to form a premature intuition.
The titles of two Hitchcock movies sum it up: a good decision maker should aim to keep a “shadow of a doubt,” not to be “the man who knew too much.”
“We have more information about this case, but let’s not tell the experts everything we know before they make their judgment, so as not to bias them. In fact, let’s tell them only what they absolutely need to know.”
Forecasters tend to be overconfident: if asked to formulate their forecasts as confidence intervals rather than as point estimates, they tend to pick narrower intervals than they should.
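A minimal sketch (not from the book) of how that overconfidence can be measured: a well-calibrated forecaster’s stated 90% confidence intervals should contain roughly 90% of realized outcomes, and overconfident forecasters land well below that. The interval and outcome values here are hypothetical.

```python
def coverage(intervals, outcomes):
    """Fraction of outcomes that fall inside the stated intervals."""
    hits = sum(low <= x <= high for (low, high), x in zip(intervals, outcomes))
    return hits / len(outcomes)

stated_90pct = [(40, 60), (10, 30), (70, 90), (55, 65)]  # hypothetical intervals
realized = [52, 35, 73, 80]                              # hypothetical outcomes
print(coverage(stated_90pct, realized))  # 0.5 here: far below the stated 90%
```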
Two noise-reduction strategies have broad applicability. One is an application of the principle we mentioned in chapter 18: selecting better judges produces better judgments. The other is one of the most universally applicable decision hygiene strategies: aggregating multiple independent estimates.
Averaging is mathematically guaranteed to reduce noise: specifically, it divides the noise by the square root of the number of judgments averaged. This means that if you average one hundred judgments, you will reduce noise by 90%, and if you average four hundred judgments, you will reduce it by 95%—essentially eliminating it.
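The square-root rule is easy to verify numerically. Below is a minimal simulation sketch; the Gaussian noise model and its parameters are assumptions chosen only for illustration.

```python
import random
import statistics

def noise_of_average(n, trials=5_000, sd=10.0):
    """Standard deviation of the mean of n independent judgments,
    each carrying Gaussian noise with standard deviation `sd`."""
    means = [statistics.fmean(random.gauss(0, sd) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

for n in (1, 100, 400):
    print(n, round(noise_of_average(n), 2))
# Roughly 10.0, 1.0, and 0.5: noise divided by sqrt(n),
# i.e. reductions of 90% and 95%, as the passage states.
```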
One method to produce aggregate forecasts is to use prediction markets, in which individuals bet on likely outcomes and are thus incentivized to make the right forecasts.
The Delphi method, in its classic form, involves multiple rounds during which the participants submit estimates (or votes) to a moderator and remain anonymous to one another. In each new round, the participants provide reasons for their estimates and respond to the reasons given by others, still anonymously.
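The numeric core of the method can be sketched in a few lines. This deliberately strips out the exchange of reasons and the moderator: each anonymous round simply pulls every estimate partway toward the group median, and the final answer is the last median. The starting estimates and the update weight are assumptions.

```python
from statistics import median

def delphi(estimates, rounds=3, pull=0.5):
    """Toy Delphi loop: participants revise toward the anonymous
    group summary after each round."""
    for _ in range(rounds):
        m = median(estimates)
        estimates = [e + pull * (m - e) for e in estimates]
    return median(estimates)

print(delphi([120, 95, 300, 110, 105]))  # the 300 outlier is pulled toward the group
```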
These interventions exemplify three of the strategies we have described to improve judgments:
1. Training: Several forecasters completed a tutorial designed to improve their abilities by teaching them probabilistic reasoning. In the tutorial, the forecasters learned about various biases (including base-rate neglect, overconfidence, and confirmation bias); the importance of averaging multiple predictions from diverse sources; and considering reference classes.
2. Teaming (a form of aggregation): Some forecasters were asked to work in teams in which they could see and debate one another’s …
Organizations that want to harness the power of diversity must welcome the disagreements that will arise when team members reach their judgments independently.
AI now performs at least as well as radiologists do in detecting cancer from mammograms; further advances in AI will probably demonstrate its superiority.
The five measures spell out Apgar’s name: appearance (skin color), pulse (heart rate), grimace (reflexes), activity (muscle tone), and respiration (breathing rate and effort).
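The Apgar score is a textbook example of a simple guideline: each of the five signs is rated 0, 1, or 2 against fixed criteria, and the ratings are summed to give a score from 0 to 10. A minimal sketch of that mechanical aggregation follows; the dict-based interface is an illustrative assumption, not a clinical tool.

```python
SIGNS = ("appearance", "pulse", "grimace", "activity", "respiration")

def apgar_score(ratings):
    """Sum of the five sign ratings, each constrained to 0, 1, or 2."""
    if set(ratings) != set(SIGNS):
        raise ValueError(f"need exactly these signs: {SIGNS}")
    if any(r not in (0, 1, 2) for r in ratings.values()):
        raise ValueError("each sign is rated 0, 1, or 2")
    return sum(ratings.values())

print(apgar_score({"appearance": 2, "pulse": 2, "grimace": 1,
                   "activity": 2, "respiration": 1}))  # 8
```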
Today’s knowledge workers balance multiple, sometimes contradictory objectives. Focusing on only one of them might produce erroneous evaluations and have harmful incentive effects.
This sobering conclusion comes mostly from studies based on 360-degree performance reviews, in which multiple raters provide input on the same person being rated, usually on multiple dimensions of performance.
As one review summarizes it, “the relationship between job performance and ratings of job performance is likely to be weak or at best uncertain.”
While averaging ratings from several raters should help to reduce system noise, it is worth noting that 360-degree feedback systems were not invented as a remedy for that problem.
Many feedback questionnaires became absurdly complex.
A strong positive or negative rating on one of the first questions will tend to pull answers to subsequent questions in the same direction.
Forced ranking was advocated by Jack Welch when he was CEO of General Electric, as a way to stop inflation in ratings and to ensure “candor” in performance reviews. Many companies adopted it, only to abandon it later, citing undesirable side effects on morale and teamwork.
The upshot is that a system that depends on relative evaluations is appropriate only if an organization cares about relative performance.
But forcing a relative ranking on what purports to measure an absolute level of performance, as many companies do, is illogical.
The second problem is that the forced distribution of the ratings is assumed to reflect the distribution of the underlying true performances—typically, something close to a normal distribution.
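A minimal sketch of that mismatch (not from the book): a forced distribution assigns fixed shares to each rating bucket regardless of the actual shape of true performance. The team’s “true” scores and the 20/70/10 split are illustrative assumptions.

```python
forced_split = {"top": 0.2, "middle": 0.7, "bottom": 0.1}  # assumed quota
true_scores = [88, 87, 86, 85, 85, 84, 83, 82, 81, 45]     # one weak performer

ranked = sorted(true_scores, reverse=True)
n = len(ranked)
top = ranked[: int(n * forced_split["top"])]
bottom = ranked[n - int(n * forced_split["bottom"]):]
# 87 and 86 are nearly identical, yet land on opposite sides of the
# "top" cutoff; the 10% "bottom" quota fits only because of one outlier.
print("top bucket:", top, "bottom bucket:", bottom)
```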
Performance reviews continue to be one of the most dreaded rituals of organizations, hated almost as much by those who have to perform them as by those who receive them.
“We spend a lot of time on our performance ratings, and yet the results are one-quarter performance and three-quarters system noise.”
“We tried 360-degree feedback and forced ranking to address this problem, but we may have made things worse.”
“If there is so much level noise, it is because different raters have completely different ideas of what ‘good’ or ‘great’ means. They will only agree if we give them concrete cases as anchors on the rating scale.”
If all you know about two candidates is that one appeared better than the other in the interview, the chances that this candidate will indeed perform better are about 56 to 61%.
Interviews are also a minefield of psychological biases.
But the first seconds of an interview reflect exactly the sort of superficial qualities you associate with first impressions: early perceptions are based mostly on a candidate’s extraversion and verbal skills.
Google also adopted a decision hygiene strategy we haven’t yet described in detail: structuring complex judgments. The term structure can mean many things. As we use the term here, a structured complex judgment is defined by three principles: decomposition, independence, and delayed holistic judgment.
The first principle, decomposition, breaks down the decision into components, or mediating assessments. This step serves the same purpose as the identification of the subjudgments in a guideline: it focuses the judges on the important cues. Decomposition acts as a road map to specify what data is needed. And it filters out irrelevant information.
In Google’s case, there are four mediating assessments in the decomposition: general cognitive ability, leadership, cultural fit (called “googleyness”), and role-related knowledge.
The second principle of structured judgment, independence, requires that information on each assessment be collected independently.
To do so, interviewers are required to ask predefined questions about the candidate’s behaviors in past situations. They must also record the answers and score them against a predetermined rating scale, using a unified rubric. The rubric gives examples of what average, good, or great answers look like for each question.
The third principle of structured judgment, delayed holistic judgment, can be summarized in a simple prescription: do not exclude intuition, but delay it.
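Put together, the three principles suggest a workflow like the sketch below. The assessment names echo Google’s four, but the code is an assumed illustration, not Google’s actual system; the simple mean at the end merely stands in for the delayed holistic judgment.

```python
from statistics import fmean

ASSESSMENTS = ("cognitive_ability", "leadership", "culture_fit", "role_knowledge")

def collect_scores(interviewers):
    """Independence: each assessment is scored separately, per interviewer,
    against a shared rubric, before anyone sees anyone else's scores."""
    return {a: [scores[a] for scores in interviewers] for a in ASSESSMENTS}

def decide(scores_by_assessment):
    """Delayed holistic judgment: the overall view enters only now, after
    all mediating assessments are on the table (a mean stands in for it)."""
    profile = {a: fmean(s) for a, s in scores_by_assessment.items()}
    return profile, fmean(profile.values())

interviewers = [  # hypothetical rubric scores on a 1-4 scale
    {"cognitive_ability": 3, "leadership": 2, "culture_fit": 4, "role_knowledge": 3},
    {"cognitive_ability": 4, "leadership": 3, "culture_fit": 3, "role_knowledge": 3},
]
print(decide(collect_scores(interviewers)))
```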
“we are all reasonable people and we disagree, so this must be a subject on which reasonable people can disagree.”
Comparative judgments become much easier in the context of a recurring decision. If you have evaluated the management teams of dozens, even hundreds of companies, you can use this shared experience as a reference class. A practical way to do this is to create a case scale defined by anchor cases.