Mario Schlosser’s Kindle Notes & Highlights for The Book of Why: The New Science of Cause and Effect

The Book of Why: The New Science of Cause and Effect

by Judea Pearl

Read between July 2 and September 9, 2018

At the first level, association, we are looking for regularities in observations. This is what an owl does when observing how a rat moves and figuring out where the rodent is likely to be a moment later, and it is what a computer Go program does when it studies a database of millions of Go games so that it can figure out which moves are associated with a higher percentage of wins. We say that one event is associated with another if observing one changes the likelihood of observing the other. The first rung of the ladder calls for predictions based on passive observations. It is characterized …

As these examples illustrate, the defining query of the second rung of the Ladder of Causation is “What if we do…?” What will happen if we change the environment? We can write this kind of query as P(floss | do(toothpaste)), which asks about the probability that we will sell floss at a certain price, given that we set the price of toothpaste at another price. Another popular question at the second level of causation is “How?,” which is a cousin of “What if we do…?” For instance, the manager may tell us that we have too much toothpaste in our warehouse. “How can we sell it?” he asks. That is, …
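
The gap between "seeing" and "doing" can be made concrete with a small simulation. This is a minimal Python sketch built on a hypothetical hidden common cause Z that influences both X and Y. None of these variables or probabilities come from the book. Conditioning on X = 1 in observational data also picks up Z's influence. Forcing X = 1 for everyone, which is what do(X = 1) means, cuts the Z → X arrow. Under the assumed numbers the two quantities work out to 0.44 and 0.35.

```python
import random

def simulate(n, do_x=None, seed=0):
    """Sample (x, y) pairs from a hypothetical model Z -> X, Z -> Y, X -> Y.
    If do_x is given, X is set by intervention and no longer listens to Z."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden common cause
        if do_x is None:
            x = rng.random() < (0.8 if z else 0.2)  # observational regime: X listens to Z
        else:
            x = do_x                                # interventional regime: Z -> X is cut
        y = rng.random() < 0.1 + 0.3 * z + 0.1 * x
        rows.append((x, y))
    return rows

def p_y_given_x(rows, x_val):
    """Fraction of rows with y = True among rows where x == x_val."""
    ys = [y for x, y in rows if x == x_val]
    return sum(ys) / len(ys)

seeing = p_y_given_x(simulate(200_000), True)                    # estimates P(Y=1 | X=1)
doing = p_y_given_x(simulate(200_000, do_x=True, seed=1), True)  # estimates P(Y=1 | do(X=1))
print(f"P(Y|X=1) = {seeing:.3f}   P(Y|do(X=1)) = {doing:.3f}")
```

The two numbers differ only because Z is a common cause. If you delete Z's effect on X in the observational branch, they converge.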

While reasoning about interventions is an important step on the causal ladder, it still does not answer all questions of interest. We might wonder, My headache is gone now, but why? Was it the aspirin I took? The food I ate? The good news I heard? These queries take us to the top rung of the Ladder of Causation, the level of counterfactuals, because to answer them we must go back in time, change history, and ask, “What would have happened if I had not taken the aspirin?” No experiment in the world can deny treatment to an already treated person and compare the two outcomes, so we must import a …

The laws of physics, for example, can be interpreted as counterfactual assertions, such as “Had the weight on this spring doubled, its length would have doubled as well” (Hooke’s law). This statement is, of course, backed by a wealth of experimental (rung-two) evidence, derived from hundreds of springs, in dozens of laboratories, on thousands of different occasions. However, once anointed as a “law,” physicists interpret it as a functional relationship that governs this very spring, at this very moment, under hypothetical values of the weight. All of these different worlds, where the weight is …

12%

The main point is this: while probabilities encode our beliefs about a static world, causality tells us whether and how probabilities change when the world changes, be it by intervention or by act of imagination.

13%

Galton had started gathering a variety of “anthropometric” statistics: height, forearm length, head length, head width, and so on. He noticed that when he plotted height against forearm length, for instance, the same phenomenon of regression to the mean took place. Tall men usually had longer-than-average forearms—but not as far above average as their height. Clearly height is not a cause of forearm length, or vice versa. If anything, both are caused by genetic inheritance. Galton started using a new word for this kind of relationship: height and forearm length were “co-related.” Eventually, …

13%

Later he realized an even more startling fact: in generational comparisons, the temporal order could be reversed. That is, the fathers of sons also revert to the mean. The father of a son who is taller than average is likely to be taller than average but shorter than his son (see Figure 2.2). Once Galton realized this, he had to give up any idea of a causal explanation for regression, because there is no way that the sons’ heights could cause the fathers’ heights.
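
Regression to the mean runs in both directions whenever two quantities are imperfectly correlated and have the same spread. No causal story is needed. The Python sketch below draws father and son heights from a bivariate normal model with hypothetical parameters (mean 69 in, SD 2.5 in, correlation 0.5), not Galton's data. It then conditions on "tall" in each generation in turn.

```python
import random
import statistics

# Hypothetical parameters (inches), not Galton's data: equal spread in both generations.
MEAN, SD, R = 69.0, 2.5, 0.5

def sample_pairs(n, seed=0):
    """Draw (father, son) heights from a bivariate normal with correlation R."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        father = rng.gauss(MEAN, SD)
        # The son keeps a fraction R of the father's deviation; the rest is independent
        # noise, scaled so that sons have the same SD as fathers.
        son = MEAN + R * (father - MEAN) + rng.gauss(0, SD * (1 - R**2) ** 0.5)
        pairs.append((father, son))
    return pairs

pairs = sample_pairs(100_000)
cut = MEAN + SD  # "tall" = more than one SD above average

tall_fathers = [(f, s) for f, s in pairs if f > cut]
tall_sons = [(f, s) for f, s in pairs if s > cut]

# Forward in time: tall fathers have sons who are taller than average, but less extreme.
fwd_father = statistics.fmean(f for f, _ in tall_fathers)
fwd_son = statistics.fmean(s for _, s in tall_fathers)
# Backward in time: tall sons have fathers who are taller than average, but less extreme.
bwd_son = statistics.fmean(s for _, s in tall_sons)
bwd_father = statistics.fmean(f for f, _ in tall_sons)

print(f"tall fathers avg {fwd_father:.1f}, their sons avg {fwd_son:.1f}")
print(f"tall sons avg {bwd_son:.1f}, their fathers avg {bwd_father:.1f}")
```

The model is symmetric, so the backward regression is exactly as strong as the forward one. That symmetry is why the pattern cannot be evidence that sons' heights cause fathers' heights.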

13%

24%

Belief propagation formally works in exactly the same way whether the arrows are noncausal or causal. Nevertheless, you may have the intuitive feeling that we have done something more meaningful in the latter case than in the former. That is because our brains are endowed with special machinery for comprehending cause-effect relationships (such as cancer and mammograms). Not so for mere associations (such as tea and scones).

24%

A → B → C. This junction is the simplest example of a “chain,” or of mediation. In science, one often thinks of B as the mechanism, or “mediator,” that transmits the effect of A to C. A familiar example is Fire → Smoke → Alarm. Although we call them “fire alarms,” they are really smoke alarms.
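
A simulation shows what a chain does. The Python sketch below uses hypothetical probabilities for Fire → Smoke → Alarm, in which the alarm responds only to smoke. Fire and alarm are strongly associated overall. Once Smoke is held fixed, knowing about the fire adds nothing.

```python
import random

def sample_chain(n, seed=0):
    """Draw (fire, smoke, alarm) triples from a hypothetical Fire -> Smoke -> Alarm model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        fire = rng.random() < 0.1
        smoke = rng.random() < (0.9 if fire else 0.05)    # smoke listens to fire
        alarm = rng.random() < (0.95 if smoke else 0.01)  # the alarm senses only smoke
        rows.append((fire, smoke, alarm))
    return rows

def p_alarm(rows, fire, smoke=None):
    """Estimate P(alarm | fire) or, if smoke is given, P(alarm | fire, smoke)."""
    hits = [a for f, s, a in rows if f == fire and (smoke is None or s == smoke)]
    return sum(hits) / len(hits)

rows = sample_chain(300_000)
# Without controlling for Smoke, Fire is highly informative about Alarm ...
with_fire, without_fire = p_alarm(rows, True), p_alarm(rows, False)
# ... but once Smoke is held fixed, Fire tells us nothing more about Alarm.
with_fire_s, without_fire_s = p_alarm(rows, True, smoke=True), p_alarm(rows, False, smoke=True)
print(f"{with_fire:.3f} vs {without_fire:.3f}; given smoke: {with_fire_s:.3f} vs {without_fire_s:.3f}")
```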

25%

A ← B → C. This kind of junction is called a “fork,” and B is often called a common cause or confounder of A and C. A confounder will make A and C statistically correlated even though there is no direct causal link between them. A good example (due to David Freedman) is Shoe Size ← Age of Child → Reading Ability. Children with larger shoes tend to read at a higher level. But the relationship is not one of cause and effect. Giving a child larger shoes won’t make him read better! Instead, both variables are explained by a third, which is the child’s age. Older children have larger shoes, and they also …
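
Controlling for the middle of a fork can be sketched in Python with made-up growth rates. Shoe size and reading level both depend on age and on nothing else. The two are strongly correlated across all children, but the correlation roughly vanishes within any single age group. The coefficients and noise levels are hypothetical.

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
children = []
for _ in range(50_000):
    age = rng.randint(5, 10)                 # the common cause
    shoe = 0.8 * age + rng.gauss(0, 0.5)     # hypothetical: shoe size grows with age
    reading = 1.5 * age + rng.gauss(0, 1.0)  # hypothetical: reading level grows with age
    children.append((age, shoe, reading))

overall = corr([s for _, s, _ in children], [r for _, _, r in children])

# "Controlling for age": measure the association separately inside each age group.
within = {}
for a in range(5, 11):
    group = [(s, r) for age, s, r in children if age == a]
    within[a] = corr([s for s, _ in group], [r for _, r in group])

print(f"overall r = {overall:.2f}")
print({a: round(r, 2) for a, r in within.items()})
```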

25%

A → B ← C. This is the most fascinating junction, called a “collider.” Felix Elwert and Chris Winship have illustrated this junction using three features of Hollywood actors: Talent → Celebrity ← Beauty. Here we are asserting that both talent and beauty contribute to an actor’s success, but beauty and talent are completely unrelated to one another in the general population. We will now see that this collider pattern works in exactly the opposite way from chains or forks when we condition on the variable in the middle. If A and C are independent to begin with, conditioning on B will make them …
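
The collider effect shows up as soon as you select on the middle node. In the Python sketch below, talent and beauty are independent standard normals. "Celebrity" is a hypothetical threshold rule on their sum. Among everyone the correlation is near zero. Among celebrities it turns clearly negative.

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
# Talent and beauty are drawn independently: no arrow connects them.
actors = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]
# Hypothetical selection rule for the collider: fame requires enough talent plus beauty.
celebrities = [(t, b) for t, b in actors if t + b > 2]

r_all = corr([t for t, _ in actors], [b for _, b in actors])
r_celeb = corr([t for t, _ in celebrities], [b for _, b in celebrities])
print(f"all actors: r = {r_all:+.2f}; celebrities only: r = {r_celeb:+.2f}")
```

Among celebrities, a low-talent actor must have been very beautiful to clear the bar, and the reverse is also true. That trade-off is the induced negative correlation.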

25%

We really mean that any prior causes of A can be adequately summarized in the prior probability P(A) that A is true. For example, in the Disease → Test example, family history might be a cause of Disease. But as long as we are sure that this family history will not affect the variable Test (once we know the status of Disease), we need not represent it as a node in the graph. However, if there is a cause of Disease that also directly affects Test, then that cause must be represented explicitly in the diagram. In the case where the node A has a parent, A has to “listen” to its parent before …
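
For the two-node network Disease → Test, the joint distribution factors as P(D) · P(Test | D). Causes such as family history are folded into the prior P(D). The Python sketch below uses hypothetical numbers: a 1% prior, 90% sensitivity and a 5% false-positive rate. It builds the joint from that factorization and inverts it with Bayes' rule.

```python
# Hypothetical numbers for illustration only.
P_DISEASE = 0.01                         # prior P(D); summarizes causes such as family history
P_POS_GIVEN = {True: 0.90, False: 0.05}  # P(Test = positive | D = d)

def joint(d, pos):
    """P(D = d, Test = pos) = P(D = d) * P(Test = pos | D = d).
    Test listens only to its parent, Disease."""
    p_d = P_DISEASE if d else 1 - P_DISEASE
    p_t = P_POS_GIVEN[d] if pos else 1 - P_POS_GIVEN[d]
    return p_d * p_t

def posterior_disease(pos=True):
    """P(D = true | Test = pos), by Bayes' rule over the joint table."""
    return joint(True, pos) / (joint(True, pos) + joint(False, pos))

print(round(posterior_disease(True), 4))  # 0.1538
```

Even with a fairly accurate test, a positive result under these assumed numbers raises the probability of disease only to about 15%, because the prior is so low.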

34%

In the last chapter, we looked at three rules that tell us how to stop the flow of information through any individual junction. I will repeat them for emphasis: (a) In a chain junction, A → B → C, controlling for B prevents information about A from getting to C or vice versa. (b) Likewise, in a fork or confounding junction, A ← B → C, controlling for B prevents information about A from getting to C or vice versa. (c) Finally, in a collider, A → B ← C, exactly the opposite rules hold. The variables A and C start out independent, so that information about A tells you nothing about C. But if you control for …
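
The three rules fit into a small Python function. This is a simplified sketch: it covers one junction and asks only whether B itself is controlled for. In full d-separation, controlling for a descendant of a collider also opens it, and this sketch does not model that case.

```python
def path_blocked(junction: str, conditioned_on_b: bool) -> bool:
    """Is information flow between A and C through B blocked?

    junction: "chain" (A -> B -> C), "fork" (A <- B -> C), or "collider" (A -> B <- C).
    Simplification: descendants of B are not considered.
    """
    if junction in ("chain", "fork"):
        return conditioned_on_b        # rules (a) and (b): controlling for B blocks the path
    if junction == "collider":
        return not conditioned_on_b    # rule (c): blocked until we control for B
    raise ValueError(f"unknown junction type: {junction!r}")

for j in ("chain", "fork", "collider"):
    print(j, "| B free:", path_blocked(j, False), "| B controlled:", path_blocked(j, True))
```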

35%

In fact, the full model in Forbes’ paper has a few more variables and looks like the diagram in Figure 4.7. Note that Game 5 is embedded in this model in the sense that the variables A, B, C, X, and Y have exactly the same relationships. So we can transfer our conclusions over and conclude that we have to control for A and B or for C; but C is an unobservable and therefore uncontrollable variable. In addition we have four new confounding variables: D = parental asthma, E = chronic bronchitis, F = sex, and G = socioeconomic status. The reader might enjoy figuring out that we must control for E, …

39%

Several studies had already shown that the babies of smoking mothers weighed less at birth on average than the babies of nonsmokers, and it was natural to suppose that this would translate to poorer survival. Indeed, a nationwide study of low-birth-weight infants (defined as those who weigh less than 5.5 pounds at birth) had shown that their death rate was more than twenty times higher than that of normal-birth-weight infants. Thus, epidemiologists posited a chain of causes and effects: Smoking → Low Birth Weight → Mortality. What Yerushalmy found in the data was unexpected even to him. It was …

41%

In 1946, Joseph Berkson, a biostatistician at the Mayo Clinic, pointed out a peculiarity of observational studies conducted in a hospital setting: even if two diseases have no relation to each other in the general population, they can appear to be associated among patients in a hospital. To understand Berkson’s observation, let’s start with a causal diagram (Figure 6.3). It’s also helpful to think of a very extreme possibility: neither Disease 1 nor Disease 2 is ordinarily severe enough to cause hospitalization, but the combination is. In this case, we would expect Disease 1 to be highly …

42%

Try this experiment: Flip two coins simultaneously one hundred times and write down the results only when at least one of them comes up heads. Looking at your table, which will probably contain roughly seventy-five entries, you will see that the outcomes of the two simultaneous coin flips are not independent. Every time Coin 1 landed tails, Coin 2 landed heads. How is this possible? Did the coins somehow communicate with each other at light speed? Of course not. In reality you conditioned on a collider by censoring all the tails-tails outcomes.
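
The coin experiment is easy to run in Python. The sketch records a row only when at least one coin shows heads, exactly as described. That censoring alone guarantees that every row with Coin 1 on tails has Coin 2 on heads.

```python
import random

rng = random.Random(0)
table = []
for _ in range(100):
    c1, c2 = rng.choice("HT"), rng.choice("HT")  # two fair, independent coins
    if "H" in (c1, c2):                          # record only if at least one shows heads
        table.append((c1, c2))

print(len(table))  # roughly 75, since 1/4 of flips are tails-tails
print(all(c2 == "H" for c1, c2 in table if c1 == "T"))  # True: T-T rows were censored
```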

42%

The study was observational, not randomized, with sixty men and sixty women. This means that the patients themselves decided whether to take or not to take the drug. Table 6.4 shows how many of each gender received Drug D and how many were subsequently diagnosed with heart attack. Let me emphasize where the paradox is. As you can see, 5 percent (one in twenty) of the women in the control group later had a heart attack, compared to 7.5 percent of the women who took the drug. So the drug is associated with a higher risk of heart attack for women. Among the men, 30 percent in the control group …

44%

The diagram in Figure 6.4 encodes the crucial information that gender is unaffected by the drug and, in addition, gender affects the risk of heart attack (men being at greater risk) and whether the patient chooses to take Drug D. In the study, women clearly had a preference for taking Drug D and men preferred not to. Thus Gender is a confounder of Drug and Heart Attack. For an unbiased estimate of the effect of Drug on Heart Attack, we must adjust for the confounder. We can do that by looking at the data for men and women separately, then taking the average: …

FIGURE 6.4. Causal diagram for the …
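
"Looking at men and women separately, then taking the average" is the adjustment formula P(HA | do(D)) = Σ_g P(HA | D, g) P(g). The weights here are 1/2 each because the study had sixty men and sixty women. The Python sketch below uses the rates quoted above for women and for men in the control group. The men-with-drug rate is cut off in the highlight, so `MEN_DRUG_RATE` is a hypothetical placeholder.

```python
# Rates from the highlights above, except MEN_DRUG_RATE, which is a hypothetical placeholder
# because that figure is truncated in the highlight.
MEN_DRUG_RATE = 0.40  # hypothetical

rates = {
    # (gender, took_drug): P(heart attack | gender, drug)
    ("female", False): 0.05,
    ("female", True): 0.075,
    ("male", False): 0.30,
    ("male", True): MEN_DRUG_RATE,
}
p_gender = {"female": 0.5, "male": 0.5}  # sixty women and sixty men

def adjusted_risk(took_drug):
    """Adjustment for Gender: sum over g of P(HA | drug, g) * P(g)."""
    return sum(rates[(g, took_drug)] * p for g, p in p_gender.items())

print(f"drug: {adjusted_risk(True):.4f}  control: {adjusted_risk(False):.4f}")
```

With the placeholder value, the adjusted risk is higher with the drug than without it. This agrees with the within-gender comparisons, whatever the aggregate table suggests.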

45%

So far most of our examples of Simpson’s reversal and paradox have involved binary variables: a patient either got Drug D or didn’t and either had a heart attack or didn’t. However, the reversal can also occur with continuous variables and is perhaps easier to understand in that case because we can draw a picture. Consider a study that measures weekly exercise and cholesterol levels in various age groups. When we plot hours of exercise on the x-axis and cholesterol on the y-axis, as in Figure 6.6(a), we see in each age group a downward trend, indicating perhaps that exercise reduces …
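
The continuous reversal can be reproduced with synthetic data. In the Python sketch below, all numbers are hypothetical. Older groups both exercise more and have a higher baseline cholesterol, while within each group exercise lowers cholesterol. Fitting one line to the pooled data gives a positive slope, and fitting each age group separately gives a negative one.

```python
import random
import statistics

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(0)
data = []  # (age_group, weekly_exercise_hours, cholesterol); all values hypothetical
for k, age in enumerate(range(20, 70, 10)):
    for _ in range(2_000):
        ex = 2 * k + rng.uniform(0, 3)                  # older groups exercise more
        chol = 150 + 30 * k - 3 * ex + rng.gauss(0, 5)  # within a group, exercise lowers cholesterol
        data.append((age, ex, chol))

pooled = slope([d[1] for d in data], [d[2] for d in data])
by_age = {
    a: slope([d[1] for d in data if d[0] == a], [d[2] for d in data if d[0] == a])
    for a in range(20, 70, 10)
}
print(f"pooled slope {pooled:+.1f}; per-age slopes", {a: round(s, 1) for a, s in by_age.items()})
```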

58%

In this approach we pretend that the data came from some unknown random source and use standard statistical methods to find the line (or, in this case, plane) that best fits the data. The output of such an approach might be an equation that looks like this:

S = $65,000 + 2,500 × EX + 5,000 × ED        (8.1)

Equation 8.1 tells us that (on average) the base salary of an employee with no experience and only a high school diploma is $65,000. For each year of experience, the salary increases by $2,500, and for each additional educational degree (up to two), the salary increases by $5,000. …
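
Equation 8.1 is a plain linear predictor. It can be evaluated directly, as in the Python sketch below. The function name and the range check on ED (zero to two additional degrees, per the text) are illustrative choices.

```python
def predicted_salary(ex_years: float, ed_degrees: int) -> float:
    """Equation 8.1: S = 65,000 + 2,500 * EX + 5,000 * ED, with ED counting degrees (0 to 2)."""
    if not 0 <= ed_degrees <= 2:
        raise ValueError("ED must be 0 (high school only), 1, or 2 additional degrees")
    return 65_000 + 2_500 * ex_years + 5_000 * ed_degrees

print(predicted_salary(0, 0))  # 65000
print(predicted_salary(6, 1))  # 85000
```

As a fitted regression, this predicts the average salary of employees who happen to have a given EX and ED. It says nothing on its own about what would happen to one person's salary if ED were changed.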

58%

Now let’s see how a structural causal model would treat the same data. First, before we even look at the data, we draw a causal diagram (Figure 8.3). The diagram encodes the causal story behind the data, according to which Experience listens to Education and Salary listens to both. In fact, we can already tell something very important just by looking at the diagram. If our model were wrong and EX were a cause of ED, rather than vice versa, then Experience would be a confounder, and matching employees with similar experience would be completely appropriate. With ED as the cause of EX, …
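
A structural causal model answers a counterfactual question in three steps: abduction, action, prediction. The Python sketch below follows the diagram ED → EX → S and ED → S. The salary equation reuses Equation 8.1 plus an individual noise term U_S. The experience equation, EX = 10 − 4·ED + U_EX, is a hypothetical assumption, since the model's details are truncated in the highlight.

```python
# Hypothetical structural equations following the diagram ED -> EX -> S and ED -> S.
# The EX equation's coefficients are illustrative assumptions.
def experience(ed, u_ex):
    return 10 - 4 * ed + u_ex  # more schooling leaves fewer years for work

def salary(ex, ed, u_s):
    return 65_000 + 2_500 * ex + 5_000 * ed + u_s  # Equation 8.1 plus individual noise

def counterfactual_salary(ed_obs, ex_obs, s_obs, ed_new):
    # 1. Abduction: recover this employee's idiosyncratic factors from the observed data.
    u_ex = ex_obs - experience(ed_obs, 0)
    u_s = s_obs - salary(ex_obs, ed_obs, 0)
    # 2. Action: set ED to the counterfactual value, i.e. do(ED = ed_new).
    # 3. Prediction: propagate through EX and then S, keeping u_ex and u_s fixed.
    ex_new = experience(ed_new, u_ex)
    return salary(ex_new, ed_new, u_s)

# Hypothetical employee: high school only, 6 years of experience, earning $81,000.
print(counterfactual_salary(ed_obs=0, ex_obs=6, s_obs=81_000, ed_new=1))  # 76000
```

Under these assumed equations, the extra degree adds $5,000 directly. It also costs four years of experience, worth $10,000, so the counterfactual salary is lower. A regression that holds EX fixed would miss that mediated path.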

67%

Unlike Kruskal, we can draw a diagram and see exactly what the problem is. Figure 9.5 shows the causal diagram representing Kruskal’s counterexample. Does it look slightly familiar? It should! It is exactly the same diagram that Barbara Burks drew in 1926, but with different variables. One is tempted to say, “Great minds think alike,” but perhaps it would be more appropriate to say that great problems attract great minds.

FIGURE 9.5. Causal diagram for Berkeley admissions paradox—Kruskal’s version.

Kruskal argued that the analysis in this situation should control for both the department and …

74%

This is not a Simpson’s paradox situation. It doesn’t matter whether we aggregate the data or stratify it; in every severity category, as well as in the aggregate, survival was slightly greater for soldiers who did not get tourniquets. (The difference in survival rates was, however, too small to be statistically significant.) What went wrong? One possibility, of course, is that tourniquets aren’t better. Our belief in them could be a case of confirmation bias. When a soldier gets a tourniquet and survives, his doctors and his buddies will say, “That tourniquet saved his life.” But if the …

75%

Suppose we want to know the effect of an online advertisement (X) on the likelihood that a consumer will purchase the product (Y)—say, a surfboard. We have data from studies in five different places: Los Angeles, Boston, San Francisco, Toronto, and Honolulu. Now we want to estimate how effective the advertisement will be in Arkansas. Unfortunately, each population and each study differs slightly. For example, the Los Angeles population is younger than our target population, and the San Francisco population differs in click-through rate. Figure 10.1 shows the unique characteristics of each …
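
When the only relevant difference between a study population and the target is its age mix, one standard remedy is to re-weight the age-specific effects. The formula is P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z). It is valid under the assumption that the age-specific effects are the same in both places. The Python sketch below applies it with hypothetical purchase rates and age distributions. It stands in for a younger Los Angeles study and an older target population such as Arkansas. None of these numbers come from the book.

```python
# All numbers are hypothetical illustrations.
effect_by_age = {"18-29": 0.06, "30-49": 0.04, "50+": 0.02}  # P(buy | do(ad), age), as measured in LA
la_ages = {"18-29": 0.5, "30-49": 0.3, "50+": 0.2}           # younger study population
target_ages = {"18-29": 0.2, "30-49": 0.4, "50+": 0.4}       # older target population

def transported_effect(effect_by_z, target_dist):
    """Re-weight stratum-specific effects by the target population's distribution of Z.
    Assumes P(y | do(x), z) is the same in the study and target populations."""
    if abs(sum(target_dist.values()) - 1) > 1e-9:
        raise ValueError("target distribution must sum to 1")
    return sum(effect_by_z[z] * p for z, p in target_dist.items())

la = transported_effect(effect_by_age, la_ages)          # about 0.046 in LA itself
target = transported_effect(effect_by_age, target_ages)  # about 0.036 in the target
print(f"LA: {la:.3f}  target: {target:.3f}")
```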
