Kindle Notes & Highlights
Read between June 22, 2019 and January 8, 2020
“Anecdotal thinking comes naturally; science requires training.”
What is often overlooked when this fallacy arises is a confounding factor, a third, possibly non-obvious factor that influences both the assumed cause and the observed effect, confounding the ability to draw a correct conclusion.
One method to consider, often referred to as the gold standard in experimental design, is the randomized controlled experiment, where participants are randomly assigned to two groups, and then results from the experimental group (who receive a treatment) are compared with the results from the control group (who do not). This setup isn’t limited to medicine; it can be used in fields such as advertising and product development. (We will walk through a detailed example in a later section.)
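To make the setup concrete, here is a minimal sketch of a randomized controlled experiment in code (my illustration, not the book's; the outcome distribution and effect size are invented):

```python
import random
import statistics

random.seed(0)

# Hypothetical participants: everyone starts from the same outcome distribution.
participants = [random.gauss(50, 10) for _ in range(1000)]

# Random assignment is what makes the two groups comparable on average.
random.shuffle(participants)
treatment, control = participants[:500], participants[500:]

# Suppose the treatment adds +5 to each outcome (an assumed effect size).
treatment = [outcome + 5 for outcome in treatment]

# With random assignment, the difference in means estimates the treatment effect.
print(statistics.mean(treatment) - statistics.mean(control))  # close to 5
```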
The nocebo effect.
One of the hardest things about designing a solid experiment is defining its endpoint, the metric that is used to evaluate the hypothesis. Ideally, the endpoint is an objective metric, something that can be easily measured and consistently interpreted. Some examples of objective metrics include whether someone bought a product, is still alive, or clicked a button on a website.
The smokers in the study would therefore be those who self-selected to continue smoking, which can introduce a bias called selection bias.
Essentially, you must be really careful when drawing conclusions based on nonrandom experiments. The Dilbert cartoon above pokes fun at the selection bias inherent in a lot of the studies reported in the news.
Another type of selection bias, common to surveys, is nonresponse bias, which occurs when a subset of people don’t participate in an experiment after they are selected for it, e.g., they fail to respond to the survey. If the reason for not responding is related to the topic of the survey, the results will end up biased.
When you critically evaluate a study (or conduct one yourself), you need to ask yourself: Who is missing from the sample population? What could be making this sample population nonrandom relative to the underlying population?
When you interpret data, you should watch out for a basic mistake that causes all sorts of trouble: overstating results from a sample that is too small.
The name of this mistake, the law of small numbers, is derived from a valid statistical concept called the law of large numbers, which states that the larger the sample, the closer your average result is expected to be to the true average.
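A quick simulation makes the law of large numbers tangible (my sketch, not the book's):

```python
import random
import statistics

random.seed(1)

# Sample averages of fair coin flips (1 = heads) drift toward the true
# mean of 0.5 as the sample grows -- the law of large numbers at work.
for n in (10, 100, 1_000, 10_000, 100_000):
    flips = [random.randint(0, 1) for _ in range(n)]
    print(n, statistics.mean(flips))
```

With only ten flips, averages like 0.3 or 0.7 are common; with one hundred thousand, the average rarely strays far from 0.5.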
The takeaway is that you should never assume that a result based on a small set of observations is typical.
The normal distribution is a special type of probability distribution, a mathematical function that describes how the probabilities for all possible outcomes of a random phenomenon are distributed.
We called this section “The Bell Curve,” however, because the normal distribution is especially useful due to one of the handiest results in all of statistics, called the central limit theorem. This theorem states that when numbers are drawn from the same distribution and then are averaged, this resulting average approximately follows a normal distribution. This is the case even if the numbers originally came from a completely different distribution.
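You can watch the central limit theorem emerge in a short simulation (my illustration; the exponential distribution is an arbitrary non-normal choice):

```python
import random
import statistics

random.seed(2)

# Each value below is the average of 100 draws from a decidedly non-normal
# (exponential) distribution. Histogram these averages and a bell curve appears.
averages = [
    statistics.mean(random.expovariate(1.0) for _ in range(100))
    for _ in range(10_000)
]

print(statistics.mean(averages))   # close to the distribution's true mean of 1.0
print(statistics.stdev(averages))  # close to 1.0 / sqrt(100) = 0.1
```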
This type of data looks nothing like a normal distribution, as each data point can take only one of two possible values. Binary data like this is often analyzed using a different probability distribution, called the Bernoulli distribution, which represents the result of a single yes/no-type experiment or question, such as from a survey or poll. This distribution is useful in a wide variety of situations, such as analyzing advertising campaigns (whether someone purchased or not), clinical trials (responded to treatment or not), and A/B testing (clicked or not).
When you take a poll, you get only an estimate of the overall approval rating (like the 24 percent approval rating estimate mentioned earlier). When you do that, you are taking a sample from the entire population (e.g., asking one thousand people) and averaging the results to calculate the estimate. This sample mean has a distribution itself, called the sampling distribution, which describes the chances of getting each possible approval rating from the sample. You can think of this distribution as the result of plotting the different approval ratings (sample means) obtained from many, many polls.
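Here is what "many, many polls" looks like as a simulation (a sketch of mine; the 24 percent rate is carried over from the example above):

```python
import random
import statistics

random.seed(3)

TRUE_RATE = 0.24   # assume the whole population's approval rating is 24%
SAMPLE_SIZE = 1000

# Simulate many polls and record each one's sample approval rating;
# the spread of these sample means is the sampling distribution.
poll_results = [
    statistics.mean(random.random() < TRUE_RATE for _ in range(SAMPLE_SIZE))
    for _ in range(5_000)
]

print(statistics.mean(poll_results))   # clusters around 0.24
print(statistics.stdev(poll_results))  # ~ sqrt(0.24 * 0.76 / 1000) ≈ 0.0135
```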
Since the margin of error is an expression of how confident the pollsters are in their estimate, it makes sense that it is tied directly to the size of the sample group: the larger the sample, the smaller the margin of error.
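For a polled proportion, a standard back-of-the-envelope formula (my addition, not quoted from the book) captures that relationship:

$$\text{MOE}_{95\%} \approx 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \leq \frac{0.98}{\sqrt{n}}$$

For a poll of n = 1,000, the worst case ($\hat{p} = 0.5$) gives roughly ±3.1 percentage points, which matches the margins typically reported alongside polls of that size.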
This is an example of a model called conditional probability, the probability of one thing happening under the condition that something else also happened. Conditional probability allows us to better estimate probabilities by using this additional information.
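In symbols, the standard definition (my notation, not the book's):

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

For example, if 30 percent of days are both cloudy and rainy and 60 percent of days are cloudy, then the probability of rain given that you see clouds is 0.30 / 0.60 = 50 percent.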
Now that you know about Bayes’ theorem, you should also know that there are two schools of thought in statistics, based on different ways to think about probability: Frequentist and Bayesian. Most studies you hear about in the news are based on frequentist statistics, which relies on and requires many observations of an event before it can make reliable statistical determinations. Frequentists view probability as fundamentally tied to the frequency of events.
Bayesians, by contrast, allow probabilistic judgments about any situation, regardless of whether any observations have yet occurred. To do this, Bayesians begin by bringing related evidence to statistical determinations. For example, picking a penny up off the street, you’d probably initially estimate a fifty-fifty chance that it would come up heads if you flipped it, even if you’d never observed a flip of that particular coin before. In Bayesian statistics, you can bring such knowledge of base rates to a problem. In frequentist statistics, you cannot.
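A minimal sketch of that penny example as Bayesian updating, using a Beta prior over the coin's heads probability (the prior strength and flip counts are assumptions of mine):

```python
# A Beta(a, b) prior acts like having already seen a heads and b tails.
prior_heads, prior_tails = 50, 50   # strong prior belief in a fair coin

observed_heads, observed_tails = 7, 3   # ten hypothetical flips

# The Beta posterior mean simply pools prior counts with observed counts.
posterior_mean = (prior_heads + observed_heads) / (
    prior_heads + prior_tails + observed_heads + observed_tails
)
print(posterior_mean)  # 0.518... -- ten flips barely move a strong prior
```

A frequentist analysis of the same ten flips would estimate 0.7, since it cannot bring in the base-rate knowledge that pennies are almost always fair.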
As another example, consider a mammogram, a medical test used in the diagnosis of breast cancer. You might think a test like this has two possible results: positive or negative. But really a mammogram has four possible outcomes, depicted in the following table. The two possible outcomes you immediately think of are when the test is right, the true positive and the true negative; the other two outcomes occur when the test is wrong, the false positive and the false negative.
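The table itself did not survive the excerpt; a minimal reconstruction of the four outcomes:

```
                    Has cancer         Does not have cancer
Test positive       true positive      false positive
Test negative       false negative     true negative
```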
In statistics, a false positive is also known as a type I error and a false negative is also called a type II error. When designing an experiment, scientists get to decide on the probability of each type of error they are willing to tolerate. The most common false positive rate chosen is 5 percent. (This rate is also denoted by the Greek letter α, alpha, and is equal to 100 percent minus the confidence level. This is why you typically see people say a confidence level of 95 percent.) That means that, on average, if your hypothesis is false, one in twenty experiments (5 percent) will get a false positive result.
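False positives matter most when the condition being tested for is rare. A quick Bayes' theorem calculation shows why (all three rates below are assumed, illustrative numbers, not figures from the book):

```python
prevalence = 0.01           # 1% of those tested actually have the condition
sensitivity = 0.90          # P(positive test | condition)
false_positive_rate = 0.05  # P(positive test | no condition)

# Total probability of a positive test, over both groups.
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))

# Bayes' theorem: P(condition | positive test)
p_condition_given_positive = sensitivity * prevalence / p_positive
print(p_condition_given_positive)  # ≈ 0.154 -- fewer than one in six positives is real
```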
The developers plan a study in a sleep lab to test their theory. The test group will use their app and the control group will just go to sleep without it. (A real study might have a slightly more complicated design, but this simple design will let us better explain the statistical models.) The statistical setup behind most experiments (including this one) starts with a hypothesis that there is no difference between the groups, called the null hypothesis. If the developers collect sufficient evidence to reject this hypothesis, then they will conclude that their app really does help people fall asleep faster.
The developers also need to specify an alternative hypothesis, which describes the smallest meaningful change they think could occur between the two groups, e.g., 15 percent more people will fall asleep within ten minutes. This is the real result they want their study to confirm, with an 80 percent chance of detecting it (corresponding to a false negative rate of 20 percent).
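Those two choices (a 5 percent false positive rate and 80 percent detection power) determine how many participants the study needs. A standard sample-size formula for comparing two proportions, sketched with assumed rates (the 50 percent baseline is my invention):

```python
from math import ceil
from statistics import NormalDist

p_control, p_app = 0.50, 0.65  # assumed baseline, plus the 15-point improvement
alpha, power = 0.05, 0.80

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96
z_beta = NormalDist().inv_cdf(power)           # ≈ 0.84

# Classic normal-approximation formula for a two-proportion comparison.
n = ((z_alpha + z_beta) ** 2
     * (p_control * (1 - p_control) + p_app * (1 - p_app))
     / (p_app - p_control) ** 2)
print(ceil(n))  # about 167 people per group
```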
The dotted line represents the threshold for statistical significance. All values larger than this threshold (to the right) would result in rejection of the null hypothesis because differences this large are very unlikely to have occurred if the null hypothesis were true. In fact, they would occur with less than a 5 percent chance—the false positive rate initially set by the developers.
The final measure commonly used to declare whether a result is statistically significant is called the p-value, which is formally defined as the probability of obtaining a result equal to or more extreme than what was observed, assuming the null hypothesis is true. Essentially, if the p-value is smaller than the selected false positive rate (5 percent), then you would say that the result is statistically significant. P-values are commonly used in study reports to communicate such significance.
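A hand-rolled p-value for the sleep study, continuing the assumed numbers from above (the counts are invented for illustration):

```python
from math import sqrt
from statistics import NormalDist

n_control, asleep_control = 167, 84   # ~50% fell asleep within ten minutes
n_app, asleep_app = 167, 110          # ~66% did with the app

p1, p2 = asleep_control / n_control, asleep_app / n_app
pooled = (asleep_control + asleep_app) / (n_control + n_app)
se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_app))

z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test
print(p_value)  # ≈ 0.004, below 0.05, so statistically significant
```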
This anecdote is meant to illustrate that it is inherently difficult to create a complete pro-con list when your experience is limited. Other mental models in this chapter will help you approach situations like these with more objectivity and skepticism, so you can uncover the complete picture faster and make sense of what to do about it.
Scoring in this way helps you overcome some of the pro-con list deficiencies. Now each item isn’t treated equally anymore. You can also group multiple items together into one score if they are interrelated. And you can now more easily compare multiple options: simply add up all the pros and cons for each option (e.g., job offers) and see which one comes out on top.
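The arithmetic is simple enough to jot down as code (the offers and scores below are invented):

```python
# Each pro scores +1 to +10, each con -1 to -10; the option with the
# highest total wins the comparison.
offers = {
    "Job A": {"higher salary": 8, "long commute": -6, "great team": 7},
    "Job B": {"remote work": 9, "lower salary": -4, "less interesting work": -5},
}

for name, items in offers.items():
    print(name, sum(items.values()))  # Job A: 9, Job B: 0
```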
This method is a simple type of cost-benefit analysis, a natural extension of the pro-con list that works well as a drop-in replacement in many situations. This powerful mental model helps you more systematically and quantitatively analyze the benefits (pros) and costs (cons) across an array of options.
The reason it is important to lay out the costs and benefits over time in this manner (in addition to increased clarity) is that benefits you get today are worth more than those same benefits later. There are three reasons for this that are important to appreciate, so please excuse the tangent; back to the cost-benefit analysis in a minute.
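The standard way to fold that time preference into a cost-benefit analysis is net present value: discount each year's net benefit back to today's dollars. A minimal sketch (the cash flows and the 6 percent discount rate are assumed):

```python
# Year 0 is an up-front cost; years 1-4 each return a benefit.
cash_flows = [-10_000, 3_000, 3_000, 3_000, 3_000]
rate = 0.06  # annual discount rate

npv = sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))
print(round(npv, 2))  # ≈ 395 -- positive, so the benefits outweigh the costs
```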
However, if you purchased the second bond, your $75,000 would be freed up after six years, leaving you four more years to invest that money in another way. If you were able to invest that money in a new investment at a high enough rate, this second bond is potentially more attractive in the end. When making a comparison, you therefore must consider what could happen over the same time frame.
This situation (multiple bids with timing/quality concerns) is very common, but because of the uncertainty introduced in the outcome, it’s a bit too complex to analyze easily with just cost-benefit analysis. Luckily, there is another straightforward mental model you can use to make sense of all these potential outcomes: the decision tree. It’s a diagram that looks like a tree (drawn on its side) and helps you analyze decisions with uncertain outcomes. The branches (often denoted by squares) are decision points, and the leaves represent different possible outcomes (often denoted by open circles).
You can now use your probability estimates to get an expected value for each contractor, by multiplying each potential outcome’s probability by its cost, and then summing them all up. This resulting summed value is what you would expect to pay on average for each contractor, given all the potential outcomes.
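In code, the whole decision tree collapses to a few lines (the probabilities and costs are invented for illustration):

```python
# (probability, cost) pairs for each contractor's possible outcomes;
# the probabilities for each contractor must sum to 1.
contractors = {
    "Contractor A": [(0.8, 120_000), (0.2, 150_000)],
    "Contractor B": [(0.5, 100_000), (0.5, 160_000)],
}

for name, outcomes in contractors.items():
    expected_cost = sum(p * cost for p, cost in outcomes)
    print(name, expected_cost)
# A: 126,000 vs. B: 130,000 -- the higher bid is cheaper on average here.
```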
By using these increased values in your decision tree, you can effectively “price in” the extra costs. Because these new values include more than the exact cost you’d have to pay out, they are called utility values, which reflect your total relative preferences across the various scenarios. We already saw this idea in the last section when we discussed putting a price to the preference of not having a landlord. This is the mental model that encapsulates the concept.
Just as in cost-benefit analysis and scoring pro-con lists, we recommend using utility values whenever possible because they paint a fuller picture of your underlying preferences, and therefore should result in more satisfactory decisions. In fact, more broadly, there is a philosophy called utilitarianism that expresses the view that the most ethical decision is the one that creates the most utility for all involved.
One thing to watch out for in this type of analysis is the possibility of black swan events, which are extreme, consequential events (that end in things like financial ruin), but which have significantly higher probabilities than you might initially expect. The name is derived from the false belief, held for many centuries in Europe and other places, that black swans did not exist, when in fact they were (and still are) common birds in Australia.
To better determine the outcome probabilities in highly complex systems like banking or climate, you may first have to take a step back and try to make sense of the whole system before you can even try to create a decision tree or cost-benefit analysis for a particular subset or situation. Systems thinking describes this act, when you attempt to think about the entire system at once. By thinking about the overall system, you are more likely to understand and account for subtle interactions between components that could otherwise lead to unintended consequences from your decisions.
A related mental model that also arises in dynamic systems and simulations is hysteresis, which describes how a system’s current state can be dependent on its history.
A Monte Carlo simulation is actually many simulations run independently, with random initial conditions or other uses of random numbers within the simulation itself. By running a simulation of a system many times, you can begin to understand how probable different outcomes really are. Think of it as a dynamic sensitivity analysis.
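A toy Monte Carlo simulation of a project budget (my sketch; every distribution below is an assumption):

```python
import random
import statistics

random.seed(4)

def one_run():
    """Simulate one possible project, drawing each uncertain input at random."""
    labor = random.gauss(50_000, 8_000)
    materials = random.uniform(20_000, 35_000)
    overrun = random.random() < 0.3  # assume a 30% chance of a 15k overrun
    return labor + materials + (15_000 if overrun else 0)

totals = [one_run() for _ in range(100_000)]
print(statistics.mean(totals))                 # the expected total cost
print(statistics.quantiles(totals, n=20)[-1])  # 95th percentile: a bad-case budget
```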
These types of what-if questions can also be applied to the past, in what is called counterfactual thinking, which means thinking about the past by imagining that the past was different, counter to the facts of what actually occurred. You’ve probably seen this model in books and movies about scenarios such as what would have happened if Germany had won World War II (e.g., Philip K. Dick’s The Man in the High Castle). Examples from your own life can help you improve your decision making when you think through the possible consequences of your past decisions. What if I had taken that job? What if…
It is therefore tempting to involve multiple people in brainstorming sessions from the get-go. However, studies show this is not the right approach because of groupthink, a bias that emerges because groups tend to think in harmony. Within group settings, members often strive for consensus, avoiding conflict, controversial issues, or even alternative solutions once it seems a solution is already favored by the group.
It is this last recommendation that is particularly relevant for scenario analysis, as it forms the basis for divergent thinking, where you actively try to get thinking to diverge in order to discover multiple possible solutions, as opposed to convergent thinking, where you actively try to get thinking to converge on one solution.
Diversity of opinion: Crowdsourcing works well when it draws on different people’s private information based on their individual knowledge and experiences.
Independence: People need to be able to express their opinions without influence from others, avoiding groupthink.
Aggregation: The entity doing the crowdsourcing needs to be able to combine the diverse opinions in such a way as to arrive at a collective decision.
In this chapter as a whole, we’ve seen an array of decision models that surpass the simple pro-con list that we started with. When you’ve arrived at a decision using one or more of these mental models, a good final step is to produce a business case, a document that outlines the reasoning behind your decision.
For an organization, avoiding an arms race means differentiating yourself from the competition instead of pursuing a one-upmanship strategy on features or deals, which can eat away at your profit margins. By focusing on your unique value proposition, you can devote more resources to improving and communicating it rather than to keeping up with your competition.
The dual betrayal with its dual five-year sentences is known as the Nash equilibrium of this game, named after mathematician John Nash, one of the pioneers of game theory and the subject of the biopic A Beautiful Mind. The Nash equilibrium is a set of player choices for which a change of strategy by any one player would worsen their outcome. In this case, the Nash equilibrium is the strategy of dual betrayals, because if either player instead chose to remain silent, that player would get a longer sentence. To both get a shorter sentence, they’d have to act cooperatively, coordinating their choices in advance.
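You can verify the Nash equilibrium by brute force (a sketch of mine; the one- and ten-year sentences are assumed to round out the classic payoff matrix):

```python
from itertools import product

SILENT, BETRAY = "silent", "betray"

# (player 1 choice, player 2 choice) -> (player 1 years, player 2 years).
# Lower is better; dual betrayal carries the dual five-year sentences.
years = {
    (SILENT, SILENT): (1, 1),
    (SILENT, BETRAY): (10, 0),
    (BETRAY, SILENT): (0, 10),
    (BETRAY, BETRAY): (5, 5),
}

def is_nash(c1, c2):
    """True if neither player can get a shorter sentence by changing strategy alone."""
    for alt in (SILENT, BETRAY):
        if years[(alt, c2)][0] < years[(c1, c2)][0]:
            return False
        if years[(c1, alt)][1] < years[(c1, c2)][1]:
            return False
    return True

for c1, c2 in product((SILENT, BETRAY), repeat=2):
    print(c1, c2, is_nash(c1, c2))  # only (betray, betray) prints True
```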
The mental model this study illustrates is called reciprocity, whereby you tend to feel an obligation to return (or reciprocate) a favor, whether that favor was invited or not. In many cultures, it is generally expected that people in social relationships will exchange favors like this, such as taking turns driving a carpool or bringing a bottle of wine to a dinner party. Quid pro quo (Latin for “something for something”) and I’ll scratch your back if you’ll scratch mine are familiar phrases that relate to this model.
The second model that Cialdini describes is commitment—if you agree (or commit) to something, however small, you are more likely to continue to agree later. That’s because not being consistent causes psychological discomfort, called cognitive dissonance (see Chapter 1).
Salespeople will also try to find common ground through a model Cialdini calls liking. Quite simply, you are more prone to take advice from people you like, and you tend to like people who share characteristics with you. That’s why they ask you questions such as “Are you a baseball fan?” or “Where did you grow up?” and, after your response, they might tell you, “I’m a Yankees fan too!” or “Oh, my cousin lives there. . . .”
A fourth influence model is known as social proof, drawing on social cues as proof that you are making a good decision. You are more likely to do things that you see other people doing, because of your instinct to want to be part of the group (see in-group favoritism in Chapter 4). Think of fashion and food trends or “trending” stories and memes online.

