The Book of Why: The New Science of Cause and Effect (Penguin Science)
41%
As this chapter shows, one of the most important scientific arguments against the smoking-cancer hypothesis was the possible existence of unmeasured factors that cause both craving for nicotine and lung cancer.
41%
Lind had discovered that citrus fruits could prevent scurvy, and in the mid-1800s, John Snow had figured out that water contaminated with fecal matter caused cholera. (Later research identified a more specific causative agent in each case: vitamin C deficiency for scurvy, the cholera bacillus for cholera.)
41%
had in common a fortunate one-to-one relation between cause and effect.
41%
The smoking-cancer debate challenged this monolithic concept of causation. Many people smoke their whole lives and never get lung cancer. Conversely, some people get lung cancer without ever lighting up a cigarette. Some people may get it because of a hereditary disposition, others because of exposure to carcinogens, and some for both reasons.
41%
Of course, statisticians already knew of one excellent way to establish causation in a more general sense: the randomized controlled trial (RCT). But such a study would be neither feasible nor ethical in the case of smoking. How could you assign people chosen at random to smoke for decades, possibly ruining their health, just to see if they would get lung cancer after thirty years? It’s impossible to imagine anyone outside North Korea “volunteering” for such a study.
41%
Without a randomized controlled trial, there was no way to convince skeptics like Yerushalmy and R. A. Fisher, who were committed to the idea that the observed associatio...
41%
For example, there could be a smoking gene that caused people to crave cigarettes and also, at the same time, made them more likely to develop lung cancer (p...
41%
The US surgeon general’s report, in 1964, stated in no uncertain terms, “Cigarette smoking is causally related to lung cancer in men.” This blunt statement forever shut down the argument that smoking was “not proven” to cause cancer. The rate of smoking in the United States among men began to decrease the following year and is now less than half what it was in 1964. No doubt millions of lives have been saved and lifespans lengthened.
41%
In 1902, cigarettes comprised only 2 percent of the US tobacco market; spittoons rather than ashtrays were the most ubiquitous symbol of tobacco consumption.
41%
automation and advertising. Machine-made cigarettes easily outcompeted handcrafted cigars and pipes on the basis of availability and cost. Meanwhile, the tobacco industry invented and perfected many tricks of the trade of advertising (see Figure 5.1). People who watched TV in the 1960s can easily remember any number of catchy cigarette jingles, from “You get a lot to like in a Marlboro” to “You’ve come a long way, baby.” By 1952, cigarettes’ share of the tobacco market had rocketed from 2 to 81 percent, and the market itself had grown dramatically.
41%
This sea change in the habits of a country had unexpected ramifications for public health. Even in the early years of the twentieth century, there had been suspicions that smoking was unhealthy, that it “irritated” the throat and caused coughing. Around mid-century, the evidence started to become a good deal more ominous. Before cigarettes, lung cancer had been so rare that a doctor might encounter it only once in a lifetime of practice. But between 1900 and 1950, the formerly rare disease quadrupled in frequency, and by 1960 it...
41%
With hindsight, it is easy to point the finger of blame at smoking. If we plot the rates of lung cancer and tobacco consumption on a graph (see Figure 5.2), the connection is impossible to miss. But time series data are poor evidence for causality. Many other things had changed between 1900 and 1950 and were equally plausible culprits: the paving of roads, the inhalation of leaded gasoline fumes, and air pollution in general.
42%
The type of study Doll and Hill conducted is now called a case-control study because it compares “cases” (people with a disease) to controls. It is clearly an improvement over time series data, because researchers can control for confounders like age, sex, and exposure to environmental pollutants. Nevertheless, the case-control design has some obvious drawbacks. It is retrospective; that means we study people known to have cancer and look backward to discover why. The probability logic is backward too. The data tell us the probability that a cancer patient is a smoker instead of the ...more
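The "backward" probability logic can be made concrete with Bayes' rule. The sketch below is illustrative only: all three input probabilities are hypothetical numbers chosen for the arithmetic, not figures from Doll and Hill, and it assumes the forward-looking quantity of interest is the probability that a smoker develops cancer.

```python
def invert_conditional(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical inputs (assumptions, not data from the study).
p_smoker_given_cancer = 0.90   # what a case-control study can estimate
p_cancer = 0.001               # baseline rate, needed from another source
p_smoker = 0.50                # share of smokers in the population

# The forward-looking probability: P(cancer | smoker).
p_cancer_given_smoker = invert_conditional(
    p_smoker_given_cancer, p_cancer, p_smoker
)
print(f"P(smoker | cancer) = {p_smoker_given_cancer:.2%}")
print(f"P(cancer | smoker) = {p_cancer_given_smoker:.2%}")
```

The two conditional probabilities can differ by orders of magnitude, and the inversion needs base rates that the case-control sample alone does not supply.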
42%
Doll and Hill realized that if there were hidden biases in the case-control studies, mere replication would not overcome them. Thus, in 1951 they began a prospective study, for which they sent out questionnaires to 60,000 British physicians about their smoking habits and followed them forward in time. (The American Cancer Society launched a similar and larger study around the same time.) Even in just five years, some dramatic differences emerged. Heavy smokers had a death rate from lung cancer twenty-four times that of nonsmokers. In the American Cancer Society study, the results were even ...more
42%
Cornfield took direct aim at Fisher’s constitutional hypothesis, and he did so on Fisher’s own turf: mathematics. Suppose, he argued, that there is a confounding factor, such as a smoking gene, that completely accounts for the cancer risk of smokers. If smokers have nine times the risk of developing lung cancer, the confounding factor needs to be at least nine times more common in smokers to explain the difference in risk. Think of what this means. If 11 percent of nonsmokers have the “smoking gene,” then 99 percent of the smokers would have to have it. And if even 12 percent of nonsmokers ...more
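The arithmetic behind this argument can be checked in a few lines. This is a minimal sketch of the bound as stated in the passage (the factor must be at least nine times more common in smokers); the function names are made up for illustration, and the 5 percent case is an extra hypothetical input.

```python
def required_prevalence_in_smokers(prevalence_in_nonsmokers: float,
                                   risk_ratio: float) -> float:
    """Smallest share of smokers that must carry the confounding factor
    if the factor alone is to explain the observed risk ratio."""
    return prevalence_in_nonsmokers * risk_ratio


def is_feasible(prevalence_in_nonsmokers: float, risk_ratio: float) -> bool:
    """A prevalence above 100 percent is impossible."""
    required = required_prevalence_in_smokers(prevalence_in_nonsmokers, risk_ratio)
    return required <= 1.0


RISK_RATIO = 9  # smokers' relative risk quoted in the passage

for prevalence in (0.05, 0.11, 0.12):
    required = required_prevalence_in_smokers(prevalence, RISK_RATIO)
    verdict = "possible" if is_feasible(prevalence, RISK_RATIO) else "impossible"
    print(f"{prevalence:.0%} of nonsmokers -> at least {required:.0%} of smokers ({verdict})")
```

At 11 percent the requirement is 99 percent of smokers; at 12 percent it exceeds 100 percent, so under these assumptions the confounder could not account for the whole risk ratio.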
43%
In fact, Cornfield’s method planted the seeds of a very powerful technique called “sensitivity analysis,”
43%
Epidemiologists in the 1950s faced the criticism that their evidence was “only statistical.” There was allegedly no “laboratory proof.” But even a look at history shows that this argument was specious. If the standard of “laboratory proof” had been applied to scurvy, then sailors would have continued dying right up until the 1930s, because until the discovery of vitamin C, there was no “laboratory proof” that citrus fruits prevented scurvy.
43%
By the end of the decade, the accumulation of so many different kinds of evidence had convinced almost all experts in the field that smoking indeed caused cancer. Remarkably, even researchers at the tobacco companies were convinced—a fact that stayed deeply hidden until the 1990s, when litigation and whistle-blowers forced tobacco companies to release many thousands of previously secret documents. In 1953, for example, a chemist at R.J. Reynolds, Claude Teague, had written to the company’s upper management that tobacco was “an important etiologic factor in the induction of primary cancer of ...more
43%
In a speech given in March 1954, George Weissman, vice president of Philip Morris and Company, said, “If we had any thought or knowledge that in any way we were selling a product harmful to consumers, we would stop business tomorrow.” Sixty years later, we are still waiting for Philip Morris to keep that promise.
43%
This brings us to the saddest episode in the whole smoking-cancer controversy: the deliberate efforts of the tobacco companies to deceive the public about the health risks. If Nature is like a genie that answers a question truthfully but only exactly as it is asked, imagine how much more difficult it is for scientists to face an adversary that intends to deceive us. The cigarette wars were science’s first confrontation with organized denialism, and no one was prepared. The tobacco companies magnified any shred of scientific controversy they could. They set up their own Tobacco Industry ...more
43%
For all these reasons, the link between smoking and cancer remained controversial in the public mind long after it had ended among epidemiologists. Even doctors, who should have been more attuned to the science, remained unconvinced: a poll conducted by the American Cancer Society in 1960 showed that only a third of American doctors agreed with the statement that smoking was “a major cause of lung cancer,” and 43 percent of doctors were themselves smokers.
44%
Viewed from the perspective of public health, the report of the advisory committee was a landmark. Within two years, Congress had required manufacturers to place health warnings on all cigarette packs. In 1971, cigarette advertisements were banned from radio and television. The percentage of US adults who smoke declined from its all-time maximum of 45 percent in 1965 to 19.3 percent in 2010. The antismoking campaign has been one of the largest and most successful, though painfully slow and incomplete, public health interventions in history.
48%
Try this experiment: Flip two coins simultaneously one hundred times and write down the results only when at least one of them comes up heads. Looking at your table, which will probably contain roughly seventy-five entries, you will see that the outcomes of the two simultaneous coin flips are not independent. Every time Coin 1 landed tails, Coin 2 landed heads. How is this possible? Did the coins somehow communicate with each other at light speed? Of course not. In reality you conditioned on a collider by censoring all the tails-tails outcomes.
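The experiment is easy to simulate. The sketch below assumes fair coins and a fixed random seed (both choices are mine, for reproducibility); it keeps only the flips where at least one coin shows heads and then inspects what Coin 2 did whenever Coin 1 landed tails.

```python
import random


def censored_coin_flips(n_flips: int = 100, seed: int = 0) -> list[tuple[str, str]]:
    """Flip two fair coins n_flips times, recording a row only when
    at least one coin shows heads (tails-tails rows are censored)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_flips):
        coin1 = rng.choice("HT")
        coin2 = rng.choice("HT")
        if coin1 == "H" or coin2 == "H":
            kept.append((coin1, coin2))
    return kept


rows = censored_coin_flips()

# Among the recorded rows, look at Coin 2 whenever Coin 1 was tails.
coin2_when_coin1_tails = {c2 for c1, c2 in rows if c1 == "T"}

print(f"rows recorded: {len(rows)} of 100")
print(f"Coin 2 outcomes when Coin 1 was tails: {coin2_when_coin1_tails}")
```

Whatever the seed, the set printed on the last line can only contain heads: the dependence comes entirely from the rule used to select rows, not from the coins.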
48%
Our simple coin-flip experiment proves that Reichenbach’s dictum was too strong, because it neglects to account for the process by which observations are selected. There was no common cause of the outcome of the two coins, and neither coin communicated its result to the other. Nevertheless, the outcomes on our list were correlated. Reichenbach’s error was his failure to consider collider structures—the structure behind the data selection. The mistake was particularly illuminating because it pinpoints the exact flaw in the wiring of our brains. We live our lives as if the common cause principle ...more
48%
The distorting prism of colliders is just as prevalent in everyday life. As Jordan Ellenberg asks in How Not to Be Wrong, have you ever noticed that, among the people you date, the attractive ones tend to be jerks? Instead of constructing elaborate psychosocial theories, consider a simpler explanation. Your choice of people to date depends on two factors: attractiveness and personality. You’ll take a chance on dating a mean attractive person or a nice unattractive person, and certainly a nice attractive person, but not a mean unattractive person. It’s the same as the two-coin example, when you ...more
49%
Simpson’s reversal can be found in real-world data sets. For baseball fans, here is a lovely example concerning two star baseball players, David Justice and Derek Jeter.
49%
How can one player be a worse hitter than the other in 1995, 1996, and 1997 but better over the three-year period?
49%
In fact it isn’t possible; the problem is that we have used an overly simple word (“better”) to describe a complex averaging process over uneven seasons.
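The "complex averaging process over uneven seasons" can be seen by recomputing the averages directly. The hits and at-bats below are the season totals commonly cited for this example; they do not appear in these highlights, so treat them as an assumption and check them against the book's table before relying on them.

```python
# (hits, at_bats) per season; commonly cited figures, not taken from the highlights.
SEASONS = {
    1995: {"Jeter": (12, 48), "Justice": (104, 411)},
    1996: {"Jeter": (183, 582), "Justice": (45, 140)},
    1997: {"Jeter": (190, 654), "Justice": (163, 495)},
}


def batting_average(hits: int, at_bats: int) -> float:
    return hits / at_bats


def combined_average(player: str) -> float:
    """Pool hits and at-bats over all seasons, then divide once."""
    hits = sum(SEASONS[year][player][0] for year in SEASONS)
    at_bats = sum(SEASONS[year][player][1] for year in SEASONS)
    return batting_average(hits, at_bats)


for year, players in SEASONS.items():
    jeter = batting_average(*players["Jeter"])
    justice = batting_average(*players["Justice"])
    print(f"{year}: Jeter {jeter:.3f}  Justice {justice:.3f}")

print(f"All:  Jeter {combined_average('Jeter'):.3f}  "
      f"Justice {combined_average('Justice'):.3f}")
```

The pooled average weights each season by its at-bats, and the two players' at-bats are distributed very differently across the three years, which is what lets the ordering flip.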
49%
FIGURE 6.5. Data (not fictitious) illustrating Simpson’s reversal.
51%
Consider a study that measures weekly exercise and cholesterol levels in various age groups. When we plot hours of exercise on the x-axis and cholesterol on the y-axis, as in Figure 6.6(a), we see in each age group a downward trend, indicating perhaps that exercise reduces cholesterol. On the other hand, if we use the same scatter plot but don’t segregate the data by age, as in Figure 6.6(b), then we see a pronounced upward trend, indicating that the more people exercise, the higher their cholesterol becomes. Once again we seem to have a BBG drug situation, where Exercise is the drug: it seems ...more
51%
To decide whether Exercise is beneficial or harmful, as always, we need to consult the story behind the data. The data show that older people in our population exercise more. Because it seems more likely that Age causes Exercise rather than vice versa, and since Age may have a causal effect on Cholesterol, we conclude that Age may be a confounder of Exercise and Cholesterol. So we should control for Age. In other words, we should look at the age-segregated data and conclude that exercise is beneficial, regardless of age.
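A small synthetic data set reproduces the pattern described for Figure 6.6. Everything in this sketch is invented for illustration: four age groups, older groups exercising more and starting from higher cholesterol, and a within-group effect of minus 2 units of cholesterol per hour of exercise.

```python
def slope(points: list[tuple[float, float]]) -> float:
    """Least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical data: group g exercises 2*g .. 2*g + 4 hours per week,
# and its baseline cholesterol rises by 30 units per age group.
groups: dict[int, list[tuple[float, float]]] = {}
for g, age in enumerate((20, 30, 40, 50)):
    groups[age] = [
        (2 * g + j, 150 + 30 * g - 2 * (2 * g + j))  # (exercise, cholesterol)
        for j in range(5)
    ]

within_slopes = {age: slope(points) for age, points in groups.items()}
pooled_slope = slope([p for points in groups.values() for p in points])

print("slope within each age group:", within_slopes)
print("slope ignoring age:", round(pooled_slope, 2))
```

Within every age group the slope is negative, while the pooled slope is positive, because age raises both exercise and cholesterol in this made-up population. Under the causal story in the passage (Age as a confounder), the age-segregated slopes are the ones to read.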
53%
The analogue of a regression line is a regression plane, which has an equation that looks like Y = aX + bZ + c. We can easily compute a, b, c from the data. Here something wonderful happens, which Galton did not realize but Karl Pearson and George Udny Yule certainly did. The coefficient a gives us the regression coefficient of Y on X already adjusted for Z. (It is called a partial regression coefficient and written r_YX.Z.)
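To see what "already adjusted for Z" means in practice, the sketch below fits the plane Y = aX + bZ + c by ordinary least squares using the two-predictor normal equations, and compares a with the slope from regressing Y on X alone. The data are synthetic and noise-free, generated from Y = 3X − 2Z + 5 with X correlated with Z; these coefficients are my own choice for illustration.

```python
def fit_plane(xs, zs, ys):
    """Ordinary least squares for Y = a*X + b*Z + c (two predictors)."""
    n = len(xs)
    mx, mz, my = sum(xs) / n, sum(zs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    det = sxx * szz - sxz ** 2          # zero if X and Z are perfectly collinear
    a = (szz * sxy - sxz * szy) / det   # partial coefficient of Y on X given Z
    b = (sxx * szy - sxz * sxy) / det   # partial coefficient of Y on Z given X
    c = my - a * mx - b * mz
    return a, b, c


def simple_slope(xs, ys):
    """Slope of Y on X alone, ignoring Z."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data: X depends on Z, and Y = 3X - 2Z + 5 exactly.
zs = [z for z in range(5) for _ in range(3)]
xs = [z + k for z in range(5) for k in range(3)]
ys = [3 * x - 2 * z + 5 for x, z in zip(xs, zs)]

a, b, c = fit_plane(xs, zs, ys)
unadjusted = simple_slope(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}")
print(f"slope of Y on X ignoring Z = {unadjusted:.2f}")
```

The plane recovers a = 3, the coefficient used to generate the data, while the one-variable regression gives a different slope because it absorbs part of Z's contribution.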
54%
Suppose that researchers had measured the tar deposits in smokers’ lungs. Even in the 1950s, the formation of tar deposits was suspected as one of the possible intermediate stages in the development of lung cancer.
54%
In this way, mathematics does for us what ten years of debate and congressional testimony could not: quantify the causal effect of smoking on cancer—provided our assumptions hold, of course.
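The front-door adjustment can be checked numerically on a toy model. The sketch assumes the structure Smoking → Tar → Cancer with a hidden factor U affecting both Smoking and Cancer, and uses the standard front-door formula P(c | do(s)) = Σ_t P(t | s) Σ_s' P(c | s', t) P(s'). Every probability below is hypothetical, chosen only so that the result can be compared with the true interventional value computed from the full model.

```python
from itertools import product

# Hypothetical model parameters (assumptions for illustration only).
P_U = {0: 0.5, 1: 0.5}                      # hidden factor, e.g. a "smoking gene"


def p_s_given_u(s, u):
    p1 = 0.8 if u else 0.2                  # P(Smoking=1 | U)
    return p1 if s else 1 - p1


def p_t_given_s(t, s):
    p1 = 0.9 if s else 0.1                  # P(Tar=1 | Smoking)
    return p1 if t else 1 - p1


def p_c_given_tu(c, t, u):
    p1 = 0.1 + 0.5 * t + 0.3 * u            # P(Cancer=1 | Tar, U)
    return p1 if c else 1 - p1


# Observed joint distribution over (s, t, c); U is summed out and never used again.
joint = {
    (s, t, c): sum(
        P_U[u] * p_s_given_u(s, u) * p_t_given_s(t, s) * p_c_given_tu(c, t, u)
        for u in (0, 1)
    )
    for s, t, c in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Probability that the named variables (s, t, c) take the given values."""
    return sum(
        p for (s, t, c), p in joint.items()
        if all({"s": s, "t": t, "c": c}[name] == value for name, value in fixed.items())
    )


def front_door(s):
    """P(Cancer=1 | do(Smoking=s)) from observational quantities only."""
    total = 0.0
    for t in (0, 1):
        p_t_s = prob(s=s, t=t) / prob(s=s)
        inner = sum(
            prob(s=s2, t=t, c=1) / prob(s=s2, t=t) * prob(s=s2) for s2 in (0, 1)
        )
        total += p_t_s * inner
    return total


def naive(s):
    """P(Cancer=1 | Smoking=s), which is confounded by U."""
    return prob(s=s, c=1) / prob(s=s)


def truth(s):
    """Interventional value computed with access to the hidden U."""
    return sum(
        p_t_given_s(t, s) * P_U[u] * p_c_given_tu(1, t, u)
        for t in (0, 1) for u in (0, 1)
    )


for s in (0, 1):
    print(f"s={s}: naive={naive(s):.3f} front-door={front_door(s):.3f} truth={truth(s):.3f}")
```

In this toy model the front-door estimate matches the true interventional probability without ever observing U, whereas the naive conditional probability overstates the effect. The agreement depends on the assumed diagram, in particular on U having no direct arrow into Tar.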
55%
FIGURE 7.2. The basic setup for the front-door criterion.
Glynn and Kashin did not draw a causal diagram, but from their description of the study, I would draw it as shown in Figure 7.3.
55%
we can see that the front-door criterion would apply if there were no arrow from Motivation to Showed Up, the “shielding” I mentioned earlier. In many cases we could justify the absence of that arrow. For example, if the services were only offered by appointment and people only missed their appointments because of chance events unrelated to Motivation (a bus strike, a sprained ankle, etc.), then we could erase that arrow and use the front-door criterion. Under the actual circumstances of the study, where the services were available all the time, such an argument is hard to make. However—and ...more
56%
The estimates from the back-door criterion (controlling for known confounders like Age, Race, and Site) were wildly incorrect, differing from the experimental benchmarks by hundreds or thousands of dollars.
56%
This is exactly what you would expect to see if there is an unobserved confounder, such as Motivation. The back-door criterion cannot adjust for it.
56%
On the other hand, the front-door estimates succeeded in removing almost all of the Motivation effect. For males, the front-door estimates were well within the experimental error of the randomized controlled trial, even with the small positive bias that Glynn and Kashin predicted. For females, the results were even better: The front-door estimates matched the experimental benchmark almost perfectly, with no apparent bias. Glynn and Kashin’s work gives both empirical and methodological proof that as long as the effect of C on M (in Figure 7.2) is weak, front-door adjustment can give a ...more
56%
RCTs are considered the “gold standard” of causal effect estimation for exactly the same reason. Because front-door estimates do the same thing, with the additional virtue of observing people’s behavior in their own natural habitat instead of a laboratory, I would not be surprised if this me...
67%
= $65,000 + 2,500 × EX + 5,000 × ED + U
67%
EX = 10 – 4 × ED + U_EX (8.3)
This equation says that the average
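The two highlighted equations can be combined to answer a counterfactual question for one individual. The sketch rests on several assumptions that are not in the highlights: the truncated left-hand side of the first equation is taken to be salary (called S here), the U terms are treated as fixed characteristics of the individual, and the counterfactual is computed by the usual three steps (recover the U terms from the observed record, change ED, recompute). The employee record is hypothetical.

```python
def experience(ed: float, u_ex: float) -> float:
    """EX = 10 - 4*ED + U_EX (equation 8.3 in the highlight)."""
    return 10 - 4 * ed + u_ex


def salary(ex: float, ed: float, u_s: float) -> float:
    """Assumed form: S = 65,000 + 2,500*EX + 5,000*ED + U_S."""
    return 65_000 + 2_500 * ex + 5_000 * ed + u_s


def counterfactual_salary(observed_ex, observed_ed, observed_salary, new_ed):
    # Step 1: recover this individual's U terms from the observed record.
    u_ex = observed_ex - experience(observed_ed, 0)
    u_s = observed_salary - salary(observed_ex, observed_ed, 0)
    # Step 2: set education to the counterfactual level.
    # Step 3: recompute experience, then salary, keeping the same U terms.
    new_ex = experience(new_ed, u_ex)
    return salary(new_ex, new_ed, u_s)


# Hypothetical employee: 6 years of experience, education level 0, salary 81,000.
result = counterfactual_salary(observed_ex=6, observed_ed=0,
                               observed_salary=81_000, new_ed=1)
print(f"salary had education been level 1: {result:,}")
```

Because equation 8.3 makes experience fall as education rises, the counterfactual salary reflects both the direct gain from education and the loss of experience that would have come with it.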
69%
(I borrowed the idea for this example from an article in Harvard Law Review where the story was essentially the same as in Figure 8.3 and the author did use matching.)