The Book of Why: The New Science of Cause and Effect (Penguin Science)
8%
Why can’t we answer our floss question just by observation? Why not just go into our vast database of previous purchases and see what happened previously when toothpaste cost twice as much? The reason is that on the previous occasions, the price may have been higher for different reasons. For example, the product may have been in short supply, and every other store also had to raise its price. But now you are considering a deliberate intervention that will set a new price regardless of market conditions. The result might be quite different from when the customer couldn’t find a better deal …
12%
Out of 1 million children, 990,000 get vaccinated, 9,900 have the reaction, and 99 die from it. Meanwhile, 10,000 don’t get vaccinated, 200 get smallpox, and 40 die from the disease. In summary, more children die from vaccination (99) than from the disease (40). I can empathize with the parents who might march to the health department with signs saying, “Vaccines kill!” And the data seem to be on their side; the vaccinations indeed cause more deaths than smallpox itself. But is logic on their side? Should we ban vaccination or take into account the deaths prevented? Figure 1.7 shows the causal …
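The arithmetic behind that question can be made explicit. Here is a back-of-envelope sketch (not from the book) comparing the actual policy with a hypothetical ban on vaccination, under the assumption that the rates observed among the unvaccinated would then apply to all 1 million children:

```python
# Rates implied by the numbers above:
#   adverse reaction among the vaccinated: 9,900 / 990,000 = 1%
#   death among those with a reaction:        99 / 9,900   = 1%
#   smallpox among the unvaccinated:         200 / 10,000  = 2%
#   death among smallpox cases:               40 / 200     = 20%
children = 1_000_000

# Actual policy: 99% vaccinated.
deaths_vaccine  = 990_000 * 0.01 * 0.01   # 99 deaths from the reaction
deaths_smallpox = 10_000 * 0.02 * 0.20    # 40 deaths from the disease
print(deaths_vaccine + deaths_smallpox)   # 139 deaths in total

# Hypothetical ban on vaccination, assuming the unvaccinated rates
# would then apply to all 1,000,000 children.
print(children * 0.02 * 0.20)             # 4,000 deaths
```

On that assumption, vaccination prevents roughly 4,000 − 139 ≈ 3,861 deaths, which is the trade-off hiding behind the raw comparison of 99 versus 40.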
14%
a convenient fact for the producers of The Price Is Right, who can estimate accurately how much money the contestants will win at Plinko over the long run. This is the same law that makes insurance companies so profitable, despite the uncertainties in human affairs.
15%
A slope of 1 would be perfect correlation, which means every extra inch for the father is passed deterministically to the son, who would also be an inch taller. The slope can never be greater than 1; if it were, the sons of tall fathers would on average be taller than their fathers, and the sons of short fathers shorter than theirs—and this would force the distribution of heights to become wider over time. After a few generations we would start having 9-foot people and 2-foot people, which is not observed in nature. So, provided the distribution of heights stays the same from one generation to the next, the slope …
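A quick simulation makes the stationarity argument concrete. This is an illustrative sketch with assumed numbers, not from the book: when the sons’ heights have the same spread as the fathers’, the fitted slope equals the father–son correlation and so cannot exceed 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
mean, sd, r = 69.0, 2.5, 0.5   # assumed mean/spread (inches) and father-son correlation

fathers = rng.normal(mean, sd, n)
# Sons: correlation r with fathers, same mean and same spread (stationary distribution).
sons = mean + r * (fathers - mean) + np.sqrt(1 - r**2) * rng.normal(0, sd, n)

print(np.polyfit(fathers, sons, 1)[0])   # slope ≈ 0.5 = r, never above 1
print(fathers.std(), sons.std())         # the spread is unchanged between generations
```

With equal spreads in the two generations, the regression slope is exactly the correlation coefficient r, and |r| ≤ 1.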
16%
This stands in blatant contradiction to Kahneman’s equations:

Success = talent + luck
Great success = a little more talent + a lot of luck.

According to …
17%
After graduating from Cambridge in 1879, Pearson spent a year abroad in Germany and fell so much in love with its culture that he promptly changed his name from Carl to Karl. He was a socialist long before it became popular, and he wrote to Karl Marx in 1881, offering to translate Das Kapital into English. Pearson, arguably one of England’s first feminists, started the Men’s and Women’s Club in London for discussions of “the woman question.” He was concerned about women’s subordinate position in society and advocated for them to be paid for their work. He …
17%
For a fun example postdating Pearson’s time, there is a strong correlation between a nation’s per capita chocolate consumption and its number of Nobel Prize winners.
17%
A more likely explanation is that more people in wealthy, Western countries eat chocolate, and the Nobel Prize winners have also been chosen preferentially from those countries.
17%
One typical case is now called confounding, and the chocolate-Nobel story is an example. (Wealth and location are confounders, or common causes of both chocolate consumption and Nobel frequency.)
17%
Another type of “nonsense correlation” often emerges in time series data. For example, Yule found an incredibly high correlation (0.95) between England’s mortality rate in a given year and the percentage of marriages conducted that year in the Church of England. Was God punishing marriage-happy Anglicans? No! Two separate historical trends were simply occurring at the same time: the country’s mortality rate was decreasing and membership in the Church of England was declining. Since both were going down at the same time, there was a positive correlation between them, but no causal connection.
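Yule’s trap is easy to reproduce with synthetic data (an illustrative sketch; the numbers are made up, not Yule’s): two series that share nothing but a downward trend come out almost perfectly correlated, and the correlation vanishes once the trend is removed.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1870, 1910)

# Two causally unrelated series, each with its own downward trend plus noise.
mortality = 22 - 0.15 * (years - 1870) + rng.normal(0, 0.3, len(years))
church_marriages = 75 - 0.5 * (years - 1870) + rng.normal(0, 1.0, len(years))

print(np.corrcoef(mortality, church_marriages)[0, 1])                    # ≈ 0.98
print(np.corrcoef(np.diff(mortality), np.diff(church_marriages))[0, 1])  # ≈ 0: trend removed
```

Differencing removes the shared trend, which is why the second correlation collapses to zero: the trends, not any causal link, were doing all the work.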
23%
Bayesian statistics give us an objective way of combining the observed evidence with our prior knowledge (or subjective belief) to obtain a revised belief and hence a revised prediction of the outcome of the coin’s next toss. Still, what frequentists could not abide was that Bayesians were allowing opinion, in the form of subjective probabilities, to intrude into the pristine kingdom of statistics. Mainstream statisticians were won over only grudgingly, when Bayesian analysis proved a superior tool for a variety of applications, such as weather prediction and tracking enemy submarines. In …
26%
As we saw, Bayes’s rule is formally an elementary consequence of his definition of conditional probability. But epistemologically, it is far from elementary. It acts, in fact, as a normative rule for updating beliefs in response to evidence. In other words, we should view Bayes’s rule not just as a convenient definition of the new concept of “conditional probability” but as an empirical claim to faithfully represent the English expression “given that I know.” It asserts, among other things, that the belief a person attributes to S after discovering T is never lower than the degree of belief …
26%
In this example the forward probability is the probability of a positive test, given that you have the disease: P(test | disease). This is what a doctor would call the “sensitivity”
26%
According to the Breast Cancer Surveillance Consortium (BCSC), the sensitivity of mammograms for forty-year-old women is 73 percent. The denominator, P(T), is a bit trickier. A positive test, T, can come both from patients who have the disease and from patients who don’t. Thus, P(T) should be a weighted average of P(T | D) (the probability of a positive test among those who have the disease) and P(T | ~D) (the probability of a positive test among those who don’t). The second is known as the false positive rate. According to the BCSC, the false positive rate for forty-year-old women is about 12 percent.
26%
Why a weighted average? Because there are many more healthy women (~D) than women with cancer (D). In fact, only 1 in 700 women has cancer, and the other 699 do not, so the probability of a positive test for a randomly chosen woman should be much more strongly influenced by the 699 women who don’t have cancer than by the one woman who does. Mathematically, we compute the weighted average as follows: P(T) = (1/700) × (73 percent) + (699/700) × (12 percent) ≈ 12.1 percent. The weights come about because only 1 in 700 women has a 73 percent chance of a positive test, and the other 699 have a 12 percent chance.
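Completing the calculation with Bayes’s rule: the posterior P(D | T) = P(T | D)P(D)/P(T) combines the 1-in-700 prior with the two rates quoted above. A minimal sketch of the arithmetic:

```python
# Numbers quoted above (BCSC, forty-year-old women).
p_disease = 1 / 700      # prior P(D)
sensitivity = 0.73       # P(T | D)
false_positive = 0.12    # P(T | ~D)

# Weighted average for P(T), exactly as in the highlight.
p_test = p_disease * sensitivity + (1 - p_disease) * false_positive
print(p_test)            # ≈ 0.121, i.e. about 12.1 percent

# Bayes's rule: P(D | T) = P(T | D) * P(D) / P(T)
posterior = sensitivity * p_disease / p_test
print(posterior)         # ≈ 0.0086: under 1 percent, despite the positive test
```

Even after a positive test, the probability of cancer stays below 1 percent, because the false positives among the 699 healthy women swamp the one true positive.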
28%
A three-node network with two links is what I will call a “junction.” These are the building blocks of all Bayesian networks (and causal networks as well). There are three basic types of junctions, with the help of which we can characterize any pattern of arrows in the network.
28%
A → B → C. This junction is the simplest example of a “chain,” or of mediation. In science, one often thinks of B as the mechanism, or “mediator,” that transmits the effect of A to C. A familiar example is Fire → Smoke → Alarm. Although we call them “fire alarms,” they are really smoke alarms. The fire by itself does not set off an alarm, so there is no direct arrow from Fire to Alarm. Nor does the fire set off the alarm through any other variable, such as heat. It works only by releasing smoke molecules in the air. If we disable that link in the chain, for instance by sucking all the smoke …
28%
A ← B → C. This kind of junction is called a “fork,” and B is often called a common cause or confounder of A and C. A confounder will make A and C statistically correlated even though there is no direct causal link between them. A good example (due to David Freedman) is Shoe Size ← Age of Child → Reading Ability. Children with larger shoes tend to read at a higher level. But the relationship is not one of cause and effect. Giving a child larger shoes won’t make him read better! Instead, both variables are explained by a third, which is the child’s age. Older children have larger shoes, and …
29%
A → B ← C. This is the most fascinating junction, called a “collider.” Felix Elwert and Chris Winship have illustrated this junction using three features of Hollywood actors: Talent → Celebrity ← Beauty. Here we are asserting that both talent and beauty contribute to an actor’s success, but beauty and talent are completely unrelated to one another in the general population. We will now see that this collider pattern works in exactly the opposite way from chains or forks when we condition on the variable in the middle. If A and C are independent to begin with, conditioning on B will make them dependent …
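All three junction types from the last three highlights can be checked with a short simulation. This is an illustrative sketch (toy linear Gaussian models, not from the book); partial correlation, computed by regressing out B, stands in for “conditioning on B”:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
noise = lambda: rng.normal(0, 1, n)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

def partial_corr(x, y, b):
    """Correlation of x and y after regressing out b (a stand-in for conditioning on b)."""
    resid = lambda v: v - np.polyval(np.polyfit(b, v, 1), b)
    return corr(resid(x), resid(y))

# Chain A -> B -> C: correlated, but independent once B is held fixed.
A = noise(); B = A + noise(); C = B + noise()
print("chain   ", round(corr(A, C), 2), round(partial_corr(A, C, B), 2))  # ≈ 0.58, ≈ 0

# Fork A <- B -> C: correlated through the common cause, independent given B.
B = noise(); A = B + noise(); C = B + noise()
print("fork    ", round(corr(A, C), 2), round(partial_corr(A, C, B), 2))  # ≈ 0.5, ≈ 0

# Collider A -> B <- C: independent, but DEPENDENT once we condition on B.
A = noise(); C = noise(); B = A + C + noise()
print("collider", round(corr(A, C), 2), round(partial_corr(A, C, B), 2))  # ≈ 0, ≈ -0.5
```

The chain and fork start correlated and become independent given B; the collider starts independent and becomes (negatively) correlated given B, exactly the reversal the text describes.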
29%
Suppose you’ve just landed in Zanzibar after making a tight connection in Aachen, and you’re waiting for your suitcase to appear on the carousel. Other passengers have started to get their bags, but you keep waiting … and waiting … and waiting. What are the chances that your suitcase did not actually make the connection from Aachen to Zanzibar? The answer depends, of course, on how long you have been waiting. If the bags have just started to show up on the carousel, perhaps you should be patient and wait a little bit longer. If you’ve been waiting a long time, then things are looking bad. We …
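The carousel logic is a Bayes update over time. A toy sketch with assumed numbers (a 50 percent prior that the bag made the connection, and, if it did, a uniform chance of appearing at any point during a ten-minute unloading window):

```python
# P(bag missed the connection | it has not appeared after t minutes), by Bayes's rule.
prior_made_it = 0.5     # assumed prior that the bag made the tight connection
unload_minutes = 10     # assumed: if it made it, it appears uniformly within 10 minutes

for t in [0, 2, 5, 9, 10]:
    # If the bag made it, the chance it has NOT yet appeared shrinks as t grows.
    p_not_seen_if_made_it = max(0.0, 1 - t / unload_minutes)
    p_not_seen = prior_made_it * p_not_seen_if_made_it + (1 - prior_made_it)
    p_missed = (1 - prior_made_it) / p_not_seen
    print(t, round(p_missed, 2))   # 0.5 at t=0, climbing to 1.0 by t=10
```

The longer you wait without seeing the bag, the more of the “it made it” probability gets used up, so the posterior that it missed the connection climbs toward certainty.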
31%
To begin at the beginning, when you talk into a phone, it converts your beautiful voice into a string of ones and zeros (called bits) and transmits these using a radio signal. Unfortunately, no radio signal is received with perfect fidelity. As the signal makes its way to the cell tower and then to your friend’s phone, some random bits will flip from zero to one or vice versa. To correct these errors, we can add redundant information. An ultrasimple scheme for error correction is simply to repeat each information bit three times: encode a one as “111” and a zero as “000.” The valid strings …
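The triple-repetition scheme is simple enough to write out in full. A minimal sketch of the encoder and the majority-vote decoder:

```python
def encode(bits):
    """Repeat each information bit three times: '1' -> '111', '0' -> '000'."""
    return "".join(b * 3 for b in bits)

def decode(received):
    """Majority vote within each block of three corrects any single flipped bit."""
    blocks = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join("1" if block.count("1") >= 2 else "0" for block in blocks)

msg = "1011"
sent = encode(msg)            # '111000111111'
corrupted = "110000101111"    # one bit flipped in two of the four blocks
print(decode(corrupted))      # '1011': both errors corrected
```

Majority vote recovers the message as long as no block of three suffers two or more flips, which is why repetition trades bandwidth for reliability.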
33%
Set up two groups of people, identical in all relevant ways. Give one group a new treatment (a diet, a drug, etc.), while the other group (called the control group) either gets the old treatment or no special treatment at all. If, after a suitable amount of time, you see a measurable difference between the two supposedly identical groups of people, then the new treatment must be the cause of the difference.
33%
Nowadays we call this a controlled experiment. The principle is simple. To understand the causal effect of the diet, we would like to compare what happens to Daniel on one diet with what would have happened if he had stayed on the other. But we can’t go back in time and rewrite history, so instead we do the next best thing: we compare a group of people who get the treatment with a group of similar people who don’t. It’s obvious, but nevertheless crucial, that the groups be comparable and representative of some population. If these conditions are met, then the results should be transferable to …
33%
Daniel also understood that it was important to compare groups. In this respect he was already more sophisticated than many people today, who choose a fad diet (for example) just because a friend went on that diet and lost weight. If you choose a diet based only on one friend’s experience, you are essentially saying that you believe you are similar to your friend in all relevant details: age, heredity, home environment, previous diet, and so forth. That is a lot to assume.
33%
Another key point of Daniel’s experiment is that it was prospective: the groups were chosen in advance. By contrast, suppose that you see twenty people in an infomercial who all say they lost weight on a diet. That seems like a pretty large sample size, so some viewers might consider it convincing evidence. But that would amount to basing their decision on the experience of people who already had a good response. For all you know, for every person who lost weight, ten others just like him or her tried the diet and had no success. But of course, they weren’t chosen to appear on the infomercial.
33%
The term “confounding” originally meant “mixing” in English, and we can understand from the diagram why this name was chosen. The true causal effect X → Y is “mixed” with the spurious correlation between X and Y induced by the fork X ← Z → Y. For example, if we are testing a drug and give it to patients who are younger on average than the people in the control group, then age becomes a confounder—a lurking third variable. If we don’t have any data on the ages, we will not be able to disentangle the true effect from the spurious effect.
34%
If we do have measurements of the third variable, then it is very easy to deconfound the true and spurious effects. For instance, if the confounding variable Z is age, we compare the treatment and control groups in every age group separately. We can then take an average of the effects, weighting each age group according to its percentage in the target population. This method of compensation is familiar to all statisticians; it is called “adjusting for Z” or “controlling for Z.”
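In code, “adjusting for Z” is just a stratified comparison followed by a reweighted average. A toy sketch with made-up numbers, two age strata, and a deliberately imbalanced trial:

```python
# Hypothetical recovery rates by age stratum: (population share, treated, control).
strata = {
    "young": (0.5, 0.80, 0.70),
    "old":   (0.5, 0.50, 0.40),
}

# "Adjusting for age": compare like with like inside each stratum, then average
# the per-stratum effects, weighted by each stratum's share of the target population.
adjusted = sum(share * (rt - rc) for share, rt, rc in strata.values())
print(adjusted)   # 0.10: a 10-point benefit in both age groups

# The crude comparison misleads when the treated group skews young:
# say 90% of treated patients are young but only 10% of controls are.
crude_treated = 0.9 * 0.80 + 0.1 * 0.50   # 0.77
crude_control = 0.1 * 0.70 + 0.9 * 0.40   # 0.43
print(crude_treated - crude_control)      # 0.34: effect and age imbalance "mixed"
```

The crude comparison mixes the 10-point treatment effect with the age imbalance, which is exactly the “mixing” the previous highlight describes; the adjusted estimate removes it.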
36%
The query he wants to pose to the genie Nature is “What is the yield under a uniform application of Fertilizer 1 (versus Fertilizer 2) to the entire field?” Or, in do-operator notation, what is P(yield | do(fertilizer = 1))?
36%
If the farmer performs the experiment naively, for example applying Fertilizer 1 to the high end of his field and Fertilizer 2 to the low end, he is probably introducing Drainage as a confounder. If he uses Fertilizer 1 one year and Fertilizer 2 the next year, he is probably introducing Weather as a confounder. In either case, he will get a biased comparison. The world that the farmer wants to know about is described by Model 2, where all plots receive the same
36%
Now some plots will be subjected to do(fertilizer = 1) and others to do(fertilizer = 2), but the choice of which treatment goes to which plot is random.
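A hedged sketch of that randomization (hypothetical field layout, not from the book), showing why the random split defeats the drainage confounder that the naive high-end/low-end split creates:

```python
import random

random.seed(42)
n_plots = 24
drainage = [i / n_plots for i in range(n_plots)]   # hypothetical: plot 0 drains worst

# Naive split (one fertilizer on the low half, the other on the high half)
# lines the treatment up with drainage exactly.
naive_1, naive_2 = range(0, 12), range(12, 24)

# Random split: do(fertilizer = 1) on a random half, do(fertilizer = 2) on the rest.
plots = list(range(n_plots))
random.shuffle(plots)
rand_1, rand_2 = plots[:12], plots[12:]

avg = lambda ids: sum(drainage[i] for i in ids) / len(ids)
print(avg(naive_1), avg(naive_2))   # 0.23 vs 0.73: drainage is a confounder
print(avg(rand_1), avg(rand_2))     # roughly equal: drainage is balanced by design
```

Randomization does not need to know what the confounders are; it balances drainage, weather, and every other lurking factor between the groups on average.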
36%
The experiment would, however, fail in its objective of simulating Model 2 if either the experimenter were allowed to use his own judgment to choose a fertilizer or the experimental subjects, in this case the plants, “knew” which card they had drawn. This is why clinical trials with human subjects go to great lengths to conceal this information from both the patients and the experimenters (a procedure known as double blinding).
36%
For instance, in a study of the effect of obesity on heart disease, we cannot randomly assign patients to be obese or not.
36%
Or intervention may be unethical (in a study of the effects of smoking, we can’t ask randomly selected people to smoke for ten years).
37%
Statisticians very often control for proxies when the actual causal variable can’t be measured; for instance, party affiliation might be used as a proxy for political beliefs. Because Z isn’t a perfect measure of M, some of the influence of X on Y might “leak through” if you control for Z. Nevertheless, controlling for Z is still a mistake. While the bias might be less than if you controlled for M, it is still there. For this reason later statisticians, notably David Cox in his textbook Planning of Experiments (1958), warned that you should only control for Z if you have a strong prior …
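The “leak-through” is easy to see in a toy linear model (an illustrative sketch, not from the book): M is a mediator on the path X → M → Y, and Z is a noisy proxy for M.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
X = rng.normal(0, 1, n)
M = X + rng.normal(0, 1, n)   # mediator: X -> M -> Y
Y = M + rng.normal(0, 1, n)   # total effect of X on Y is 1.0
Z = M + rng.normal(0, 1, n)   # noisy proxy for the mediator M

# Regressing Y on X alone recovers the total effect...
print(np.polyfit(X, Y, 1)[0])   # ≈ 1.0

def coef_on_X(controls):
    """Coefficient on X in a regression of Y on X plus the given controls."""
    A = np.column_stack([np.ones(n), X] + controls)
    return np.linalg.lstsq(A, Y, rcond=None)[0][1]

# ...controlling for the proxy Z biases it downward, but some effect leaks through;
# controlling for the mediator M itself removes the effect entirely.
print(coef_on_X([Z]))   # ≈ 0.5
print(coef_on_X([M]))   # ≈ 0.0
```

Controlling for the proxy attenuates the estimated effect without removing it entirely; controlling for the mediator itself wipes it out. Both are mistakes when the goal is the total effect of X.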
38%
Robins and Greenland set out to express their conception of confounding in terms of potential outcomes. They partitioned the population into four types of individuals: doomed, causative, preventive, and immune. The language is suggestive, so let’s think of the treatment X as a flu vaccination and the outcome Y as coming down with flu. The doomed people are those for whom the vaccine doesn’t work; they will get flu whether they get the vaccine or not. The causative group (which may be nonexistent) includes those for whom the vaccine actually causes the disease. The preventive group consists of …
38%
Exchangeability simply means that the percentage of people with each kind of sticker (d percent, c percent, p percent, and i percent, respectively) should be the same in both the treatment and control groups.
38%
Equality among these proportions guarantees that the outcome would be just the same if we switched the treatments and controls.
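A tiny numeric sketch (hypothetical proportions, since the real ones are unknowable) of why exchangeability makes the comparison causal:

```python
# Hypothetical proportions of the four response types.
#                 doomed  causative  preventive  immune
treated   = dict(d=0.05, c=0.01, p=0.10, i=0.84)
untreated = dict(d=0.05, c=0.01, p=0.10, i=0.84)   # exchangeable: same mix

# Treated people get flu if doomed or causative; untreated if doomed or preventive.
risk_treated   = treated["d"] + treated["c"]        # 0.06
risk_untreated = untreated["d"] + untreated["p"]    # 0.15
print(risk_treated - risk_untreated)   # -0.09 = c - p: the true causal effect
```

When the four proportions match across groups, the doomed cancel out of the comparison and the observed risk difference equals c − p, the net causal effect of the vaccine.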
38%
To put it simply, those stickers on the forehead don’t exist. We do not even have a count of the proportions d, c, p, and i. In fact, this is precisely the kind of information that the genie of Nature keeps locked inside her magic lantern and doesn’t show to anybody.
38%
In a chain junction, A → B → C, controlling for B prevents information about A from getting to C or vice versa.
38%
Likewise, in a fork or confounding junction, A ← B → C, controlling for B prevents information about A from getting to C or vice versa.
38%
Finally, in a collider, A → B ← C, exactly the opposite rules hold. The variables A and C start out independent, so that information about A tells you nothing about C. But if you control for B, then information starts ...
39%
Now, what if we have longer pipes with more junctions, like this: A ← B ← C → D ← E → F → G ← H → I → J?
39%
So we have many options to block communication between A and J: control for B, control for C, don’t control for D (because it’s a collider), control for E, and so forth. Any one of these is sufficient. This is why the usual statistical procedure of controlling for everything that we can measure is so misguided. In fact, this particular path is blocked if we don’t control for anything! The colliders at D and G block the path without any outside help. Controlling for D and G would open this path and enable J to listen to A.
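That claim can be verified numerically. An illustrative sketch (a linear Gaussian version of the same ten-variable diagram, not from the book): with no controls, A and J are uncorrelated because the colliders at D and G block the path, and partialling out D and G opens it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
noise = lambda: rng.normal(0, 1, n)

# Linear Gaussian version of  A <- B <- C -> D <- E -> F -> G <- H -> I -> J
C, E, H = noise(), noise(), noise()
B = C + noise(); A = B + noise()
D = C + E + noise()                 # collider
F = E + noise()
G = F + H + noise()                 # collider
I = H + noise(); J = I + noise()

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

def partial_corr(x, y, controls):
    """Correlation of x and y after regressing out the control variables."""
    Z = np.column_stack([np.ones(n)] + controls)
    resid = lambda v: v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
    return corr(resid(x), resid(y))

print(round(corr(A, J), 3))                  # ≈ 0: colliders at D and G block the path
print(round(partial_corr(A, J, [D, G]), 3))  # ≈ 0.03, nonzero: controlling them opens it
```

The opened-path correlation is small but clearly nonzero at this sample size, which is the point: controlling for the two colliders lets J “listen to” A.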
40%
In the late 1950s and early 1960s, statisticians and doctors clashed over one of the highest-profile medical questions of the century: Does smoking cause lung cancer?
40%
Jacob Yerushalmy’s …
40%
was one of the last of the pro-tobacco holdouts in academia.