Kindle Notes & Highlights
Proxy metrics are more prone to criticism because they are indirect measures, and all three of these examples have been criticized significantly.
But use of these drugs actually leads to a significant increase in sudden death in patients with asymptomatic ventricular arrhythmias after a heart attack. For these patients, the reduced post-treatment rate of ventricular arrhythmias is not indicative of improved survival and is therefore not a good proxy metric.
collecting real scientific evidence beats anecdotal evidence hands down because you can draw believable conclusions.
Yes, you have to watch out for spurious correlations and subtle biases (more on that in the next section), but in the end you have results that can really advance your thinking.
The smokers in the study would therefore be those who selected to continue smoking, which can introduce a bias called selection bias.
Selection bias can also occur when a sample is selected that is not representative of the broader population of interest, as with online reviews.
However, is the school better because there are better teachers or because the students are better prepared due to their parents’ financial means and interest in education?
is nonresponse bias, which occurs when a subset of people don’t participate in an experiment after they are selected for it, e.g., they fail to respond to the survey.
Employees missing the survey due to a scheduled vacation would be random and not likely to introduce bias, but employees not filling it out due to apathy would be nonrandom and would likely bias the results.
That’s because the latter group is made up of disengaged employees, and by not participating, their disengagement is not being captured.
called survivorship bias. Unhappy employees may have chosen to leave the company, but you cannot capture their opinions when you survey only current employees.
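The distortion that nonresponse introduces can be sketched with a tiny simulation. All the numbers here are hypothetical: 1,000 employees, satisfaction scored 1-10, with disengaged employees assumed far less likely to answer the survey.

```python
import random

random.seed(42)

# Hypothetical population: 1,000 employees with satisfaction scores 1-10.
population = [random.randint(1, 10) for _ in range(1000)]

# Assume disengaged employees (score <= 3) respond only 20% of the time,
# while everyone else responds 80% of the time.
responses = [s for s in population
             if random.random() < (0.2 if s <= 3 else 0.8)]

true_mean = sum(population) / len(population)
survey_mean = sum(responses) / len(responses)

print(f"true mean:   {true_mean:.2f}")
print(f"survey mean: {survey_mean:.2f}")  # biased upward: unhappy voices are missing
```

The survey average comes out higher than the true average, because the unhappiest employees are systematically underrepresented among respondents.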
Almost every methodology has drawbacks, and bias of one form or another is often unavoidable.
World War II, naval researchers conducted a study of damaged aircraft that returned from missions, so that they could make suggestions as to how to bolster aircraft defenses for future missions. Looking at where these planes had been hit, they concluded that areas where they had taken the most damage should receive extra armor. However, statistician Abraham Wald noted that the study sampled only planes that had survived missions, and not the many planes that had been shot down. He therefore theorized the opposite conclusion, which turned out to be correct: that the areas with holes represented places where a plane could be hit and still make it home, and that the armor should instead go where the returning planes were unscathed.
if you look at tech CEOs like Bill Gates and Mark Zuckerberg, you might conclude that dropping out of school to pursue your dreams is a fine idea. However, you’d be thinking only of the people that “survived.”
Who is missing from the sample population?
What could be making this sample population nonrandom relative to the underlying population?
This much larger potential customer base may behave very differently from your existing customer base (as is the case with early adopters versus the early majority, which we described in Chapter 4).
response bias. While nonresponse bias is introduced when certain types of people do not respond, for those who do respond, various cognitive biases can cause them to deviate from accurate or truthful responses.
How questions are worded, e.g., leading or loaded questions
The order of questions, where earlier questions can influence later ones
Poor or inaccurate memory of respondents
Difficulty representing feelings in a number, such as one-to-ten ratings
Respondents reporting things that reflect well on themselves
overstating results from a sample that is too small.
law of small numbers,
law of large numbers, which states that the larger the sample, the closer your average result is expected to be to the true average.
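The law of large numbers is easy to watch in action. A quick sketch (not from the book) averaging rolls of a fair die, whose true average is 3.5:

```python
import random

random.seed(0)

# Average of fair die rolls should approach the true mean of 3.5
# as the sample grows (law of large numbers).
def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, round(sample_mean(n), 3))
```

Small samples wander widely around 3.5; the million-roll average lands within a few thousandths of it.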
For now, we want to focus on what can go wrong if your sample is too small.
gambler’s fallacy, named after roulette players who believe that a streak of reds or blacks from a roulette wheel is more likely to end than to continue with the next spin.
judges were less likely to approve an asylum case if they had approved the last two.
test. Random data often contains streaks and clusters. Are you surprised to learn that there is a 50 percent chance of getting a run of four heads in a row during any twenty-flip sequence? Streaks like this are often erroneously interpreted as evidence of nonrandom behavior, a failure of intuition called the clustering illusion.
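The 50 percent figure is simple to check with a simulation (a sketch, treating each twenty-flip sequence as a string and looking for "HHHH"):

```python
import random

random.seed(1)

# Estimate the probability of at least one run of four heads
# somewhere in a sequence of 20 fair coin flips.
def has_streak(n_flips=20, streak=4):
    flips = ''.join(random.choice('HT') for _ in range(n_flips))
    return 'H' * streak in flips

trials = 100_000
hits = sum(has_streak() for _ in range(trials))
print(f"P(4 heads in a row in 20 flips) ≈ {hits / trials:.3f}")  # close to 0.5
```

The exact probability works out to roughly 48 percent, which the book rounds to 50 percent; either way, such streaks are routine in genuinely random data.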
These pictures come from psychologist Steven Pinker’s book The Better Angels of Our Nature. The left picture—the one with the obvious clusters—is actually the one that is truly random. The right picture—the one that intuitively seems more random—is not; it is a depiction of the positions of glowworms on the ceiling of a cave in Waitomo, New Zealand. The glowworms intentionally space themselves apart from one another in the competition for food.
The improbable should not be confused with the impossible. If enough chances are taken, even rare events are expected to happen. Some people do win the lottery and some people do get struck by lightning. A one-in-a-million event happens quite frequently on a planet with seven billion people.
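The arithmetic behind that last claim is worth making explicit. A back-of-the-envelope sketch, assuming a one-in-a-million daily chance per person:

```python
# If each person has a one-in-a-million chance of some event on a given day,
# the expected number of occurrences per day across seven billion people:
p = 1 / 1_000_000
population = 7_000_000_000
expected_per_day = p * population
print(expected_per_day)  # 7000.0
```

Seven thousand "one-in-a-million" events per day: rare for any individual, yet commonplace for the planet.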
shouldn’t always expect short-term results to match long-term expectations.
you shouldn’t base long-term expectations on a small set of short-term results.
But in most cases, the true cause is purely mathematical, explained through a model called regression to the mean.
For instance, a runner is not expected to follow a record-breaking race with another record-breaking time; a slightly less impressive performance would be expected.
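Regression to the mean falls out of any model where observed performance mixes stable skill with random luck. A sketch (hypothetical numbers): select the top performers in one race and look at how the same people do in the next.

```python
import random

random.seed(7)

# Model: observed performance = stable skill + random luck.
athletes = [random.gauss(0, 1) for _ in range(10_000)]       # skill
race1 = [s + random.gauss(0, 1) for s in athletes]           # skill + luck
race2 = [s + random.gauss(0, 1) for s in athletes]           # skill + fresh luck

# Take the top 100 performers from race 1 and see how they do in race 2.
top = sorted(range(len(race1)), key=lambda i: race1[i], reverse=True)[:100]
r1 = sum(race1[i] for i in top) / len(top)
r2 = sum(race2[i] for i in top) / len(top)
print(f"race 1 average of top performers: {r1:.2f}")
print(f"race 2 average of same athletes:  {r2:.2f}")  # lower: regression to the mean
```

The top group's second-race average is still above the overall mean (they really are skilled) but well below their first-race average, because the lucky half of their first result doesn't repeat.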
The takeaway is that you should never assume that a result based on a small set of observations is typical.
Like anecdotal evidence, a small sample tells you very little beyond that what happened was within the range of possible outcomes.
first impressions can be accurate, you should treat them with skepticism.
The normal distribution is a special type of probability distribution, a mathematical function that describes how the probabilities for all possible outcomes of a random phenomenon are distributed.
called the central limit theorem. This theorem states that when numbers are drawn from the same distribution and then are averaged, this resulting average approximately follows a normal distribution. This is the case even if the numbers originally came from a completely different distribution.
The central limit theorem tells us that this statistical average (sample mean) is approximately normally distributed (assuming enough people participate in the survey).
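The theorem can be demonstrated directly. A sketch: draw from a decidedly non-normal distribution (exponential, true mean 1.0), average batches, and watch the batch averages cluster symmetrically around the true mean.

```python
import random
import statistics

random.seed(3)

# Average batches of 50 exponential draws (a skewed, non-normal distribution).
# The central limit theorem says these averages are approximately normal.
def batch_mean(n):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

means = [batch_mean(50) for _ in range(5000)]
print(f"mean of sample means: {statistics.mean(means):.3f}")   # ≈ 1.0
print(f"std of sample means:  {statistics.stdev(means):.3f}")  # ≈ 1/sqrt(50) ≈ 0.141
```

A histogram of `means` would show the familiar bell shape, even though the underlying draws are heavily skewed.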
The part that hasn’t been explained yet is that the standard deviation of this distribution, also called the standard error, is not the same as the sample standard deviation calculation from earlier.
This means that if you want to reduce the margin of error by a factor of two, you need to increase the sample size by a factor of four.
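That factor-of-four relationship follows from the standard error formula, sigma / sqrt(n). A small numeric sketch (sigma chosen arbitrarily):

```python
import math

# Standard error of the mean: sigma / sqrt(n).
sigma = 10.0
for n in (100, 400, 1600):
    se = sigma / math.sqrt(n)
    print(f"n = {n:5d}  standard error = {se:.2f}")
# Each quadrupling of the sample size halves the standard error
# (and with it, the margin of error).
```

Going from 100 to 400 respondents halves the error; going to 1,600 halves it again.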
One thing that really bothers us is when statistics are reported in the media without error bars or confidence intervals.
Without an error estimate, you have no idea how confident to be in that number—is the true value likely really close to it, or could it be really far away from it? The confidence interval tells you that!
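For a reported poll percentage, the 95 percent confidence interval is quick to compute yourself. A sketch using a hypothetical poll (540 of 1,000 respondents in favor) and the standard normal approximation for a proportion:

```python
import math

# Hypothetical poll: 540 of 1,000 respondents favor a proposal.
# 95% confidence interval via the normal approximation for a proportion.
n, favor = 1000, 540
p = favor / n
se = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
margin = 1.96 * se                # 1.96 standard errors ≈ 95% coverage
print(f"{p:.3f} ± {margin:.3f}")  # 0.540 ± 0.031
```

So "54 percent support" really means "probably somewhere between about 51 and 57 percent," which is exactly the context a bare headline number omits.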
pro-con list, where you list all the positive things that could happen if the decision was made (the pros), weighing them against the negative things that could happen (the cons).
First, the list presumes there are only two options, when as you just saw there are usually many more. Second, it presents all pros and cons as if they had equal weight. Third, a pro-con list treats each item independently, whereas these factors are often interrelated.
A fourth problem is that since the pros are often more obvious than the cons, this disparity can lead to a grass-is-greener mentality, causing you mentally to accentuate the positives (e.g., greener grass) and overlook the negatives.
This anecdote is meant to illustrate that it is inherently difficult to create a complete pro-con list when your experience is limited.
The hammer of decision-making models is the pro-con list; useful in some instances, but not the optimal tool for every decision.
This powerful mental model helps you more systematically and quantitatively analyze the benefits (pros) and costs (cons) across an array of options.
However, it is important to note that this model works well only if you are thorough, because you will use that final number to make decisions.
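The structure of such an analysis can be sketched in a few lines. Every option, factor, and score below is hypothetical; the point is that each con gets a negative score, each pro a positive one weighted by importance, and the totals become comparable across options.

```python
# Minimal sketch of a weighted pro-con analysis (all items hypothetical).
# Each factor carries a signed score reflecting its importance:
# pros positive, cons negative.
options = {
    "take the job": [("higher salary", +8), ("longer commute", -5), ("growth", +6)],
    "stay put":     [("stability", +4), ("stagnation", -6), ("short commute", +3)],
}

for name, factors in options.items():
    total = sum(score for _, score in factors)
    print(f"{name}: {total:+d}")
```

Because the final totals drive the decision, a forgotten con or a carelessly chosen weight propagates straight into the answer, which is why thoroughness matters so much here.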