The Book of Why: The New Science of Cause and Effect (Penguin Science)
12%
Likewise, the knowledge conveyed in a causal diagram is typically much more robust than that encoded in a probability distribution. For example, suppose that times have changed and a much safer and more effective vaccine is introduced. Suppose, further, that due to improved hygiene and socioeconomic conditions, the danger of contracting smallpox has diminished. These changes will drastically affect all the probabilities involved; yet, remarkably, the structure of the diagram will remain invariant. This is the key secret of causal modeling.
12%
once we go through the analysis and find how to estimate the benefit of vaccination from data, we do not have to repeat the entire analysis from scratch. As discussed in the Introduction, the same estimand (i.e., recipe for answering the query) will remain valid and, as long as the diagram does not change, can be applied to the new data and produce a new estimate for our query. It is because of thi...
This highlight has been truncated due to consecutive passage length restrictions.
12%
philosophers have tried to define causation in terms of probability, using the notion of “probability raising”: X causes Y if X raises the probability of Y. This concept is solidly ensconced in intuition. We say, for example, “Reckless driving causes accidents” or “You will fail this course because of your laziness,” knowing quite well that the antecedents merely tend to make the consequences more likely, not absolutely
12%
“X raises the probability of Y” using conditional probabilities and wrote P(Y | X) > P(Y). This interpretation is wrong, as you surely noticed, because “raises” is a causal concept, connoting a causal influence of X over Y. The expression P(Y | X) > P(Y), on the other hand, speaks only about observations and means: “If we see X, then the probability of Y increases.”
12%
Any attempt to “define” causation in terms of seemingly simpler, first-rung concepts must fail. That is why I have not attempted to define causation anywhere in this book: definitions demand reduction, and reduction demands going to a lower rung. Instead, I have pursued the ultimately more constructive program of explaining how to answer causal queries and what information is needed to answer them.
13%
Philosophers tried hard to repair the definition by conditioning on what they called “background factors” (another word for confounders), yielding the criterion P(Y | X, K = k) > P(Y | K = k), where K stands for some background variables. In fact, this criterion works for our ice-cream example if we treat temperature as a background variable. For example, if we look only at days when the temperature is ninety degrees (K = 90), we will find no residual association between ice-cream sales and crime. It’s only when we compare ninety-degree days to thirty-degree days that we get the illusion of a …
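A minimal simulation makes the criterion concrete (my sketch, not the book's; the variable names and numbers are illustrative). Ice-cream sales and crime are both driven by temperature, and the association that appears in the pooled data disappears once we hold temperature fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative data-generating process: temperature drives both variables;
# ice-cream sales have no effect on crime at all.
temp = rng.uniform(30, 90, n)                    # degrees Fahrenheit
ice_cream = 2.0 * temp + rng.normal(0, 10, n)    # daily sales
crime = 0.5 * temp + rng.normal(0, 10, n)        # daily incidents

# Pooled over all days: a sizable spurious association (about 0.6).
print(np.corrcoef(ice_cream, crime)[0, 1])

# Condition on the background factor K = temperature (days near ninety degrees):
hot = np.abs(temp - 90) < 1
print(np.corrcoef(ice_cream[hot], crime[hot])[0, 1])   # about 0
```

Within the ninety-degree stratum the residual association is essentially zero, so the repaired criterion no longer flags ice cream as a cause of crime.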
13%
we should condition on any factor that is “causally relevant” to the effect. By borrowing a concept from rung two of the Ladder of Causation, she essentially gave up on the idea of defining causes based on probability alone.
13%
The proper way to rescue the probability-raising idea is with the do-operator: we can say that X causes Y if P(Y | do(X)) > P(Y).
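Under the same illustrative model as above (again my sketch, not the book's), the do-operator can be mimicked by simulating the intervention directly: fix ice-cream sales by decree, independently of temperature, and check whether the distribution of crime moves.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def simulate(ice_cream_policy=None):
    """Simulate days; if a policy value is given, force sales to it (do-operator)."""
    temp = rng.uniform(30, 90, n)
    if ice_cream_policy is None:
        ice = 2.0 * temp + rng.normal(0, 10, n)      # sales respond to temperature
    else:
        ice = np.full(n, ice_cream_policy)           # do(X = x): the Temp -> Sales arrow is cut
    crime = 0.5 * temp + rng.normal(0, 10, n)        # crime never depends on ice cream
    return ice, crime

_, crime_observed = simulate()                       # P(Y)
_, crime_forced = simulate(ice_cream_policy=180.0)   # P(Y | do(X = 180))
print(crime_observed.mean(), crime_forced.mean())    # essentially equal
```

Because forcing X leaves the distribution of Y unchanged, P(Y | do(X)) = P(Y) and ice cream is exonerated, even though P(Y | X) > P(Y) in the observational data.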
13%
idea—philosophers were too quick to commit to the only uncertainty-handling language they knew, the language of probability. They have for the most part gotten over this blunder in the past decade or so, but unfortunately similar ideas are being pursued in econometrics even now, under names like “Granger causality” and “vector autoregression.”
13%
Bayesian networks, that mimics how an idealized, decentralized brain might incorporate probabilities into its decisions. Given that we see certain facts, Bayesian networks can swiftly compute the likelihood that certain other facts are true or false. Not surprisingly, Bayesian networks caught on immediately in the AI community and even today are considered a leading paradigm in artificial intelligence for reasoning under uncertainty.
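As a toy illustration of that kind of computation (the example and its numbers are mine, not the book's), take a two-node network Disease → Test and propagate the evidence "the test came back positive" with Bayes's rule:

```python
# Toy Bayesian network: Disease -> Test, with assumed illustrative parameters.
p_disease = 0.01            # prior P(Disease)
p_pos_given_d = 0.95        # P(Test positive | Disease)
p_pos_given_not_d = 0.05    # P(Test positive | no Disease)

# Given that we see a positive test, update the belief that the disease is present.
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(round(p_d_given_pos, 3))   # about 0.161
```

A real Bayesian network chains many such updates across a whole graph, but every step is still rung-one reasoning: from seeing to believing, not from doing.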
13%
Bayesian networks inhabit a world where all questions are reducible to probabilities, or (in the terminology of this chapter) degrees of association between variables; they could not ascend to the second or third rungs of the Ladder of Causation. Fortunately, they required only two slight twists to climb to the top. First, in 1991, the graph-surgery idea empowered them to handle both observations and interventions. Another twist, in 1994, brought them to the third level and made them capable of handling counterfactuals.
13%
while probabilities encode our beliefs about a static world, causality tells us whether and how probabilities change when the world changes, be it by intervention or by act of imagination.
14%
He painstakingly compiled pedigrees of 605 “eminent” Englishmen from the preceding four centuries. But he found that the sons and fathers of these eminent men were somewhat less eminent and the grandparents and grandchildren less eminent still.
14%
Sons of tall men tend to be taller than average—but not as tall as their fathers. Sons of short men tend to be shorter than average—but not as short as their fathers.
14%
students take two different standardized tests on the same material, the ones who scored high on the first test will usually score higher than average on the second test but not as high as they did the first time.
15%
“Success = talent + luck. Great success = a little more talent + a lot of luck.”
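A short simulation of that slogan shows why regression to the mean follows from it (the variable names, thresholds, and noise levels below are my illustrative assumptions, not data from the book):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

talent = rng.normal(0, 1, n)
attempt1 = talent + rng.normal(0, 1, n)   # success = talent + luck
attempt2 = talent + rng.normal(0, 1, n)   # same talent, fresh luck

great = attempt1 > 2.0                    # the "great successes" on the first attempt
print(attempt1[great].mean())             # about 2.6
print(attempt2[great].mean())             # about 1.3: still above average, but only half as far
```

Only the talent component carries over to the second attempt; the luck component does not, so the group selected for great success falls halfway back toward the mean.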
15%
Tall men usually had longer-than-average forearms—but not as far above average as their height. Clearly height is not a cause of forearm length, or vice versa. If anything, both are caused by genetic inheritance.
15%
Later he realized an even more startling fact: in generational comparisons, the temporal order could be reversed. That is, the fathers of sons also revert to the mean. The father of a son who is taller than average is likely to be taller than average but shorter than his son (see Figure 2.2). Once Galton realized this, he had to give up any idea of a causal explanation for regression, because there is no way that the sons’ heights could cause the fathers’ heights.
15%
where regression to the mean is concerned, there is no difference between cause and effect.
16%
The second model, on the other hand, shows that to explain the stability of success from one generation to the next, we only need explain the stability of the genetic endowment of the population (talent). That stability, now called the Hardy-Weinberg equilibrium, received a satisfactory mathematical explanation in the work of G. H. Hardy and Wilhelm Weinberg in 1908. And yes, they used yet another causal model—the Mendelian theory of inheritance.
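For reference, the standard statement of that equilibrium (my summary, not a quotation from the book): if an allele has frequency $p$ and its alternative has frequency $q = 1 - p$, then one generation of random mating fixes the genotype proportions at

$$AA : Aa : aa \;=\; p^2 : 2pq : q^2, \qquad p^2 + 2pq + q^2 = (p + q)^2 = 1,$$

and, absent selection, mutation, or migration, those proportions remain the same in every subsequent generation.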
17%
Pearson belonged to a philosophical school called positivism, which holds that the universe is a product of human thought and that science is only a description of those thoughts. Thus causation, construed as an objective process that happens in the world outside the human brain, could not have any scientific meaning.
17%
Pearson, arguably one of England’s first feminists, started the Men’s and Women’s Club in London for discussions of “the woman question.” He was concerned about women’s subordinate position in society and advocated for them to be paid for their work.
18%
Pearson discovered possibly the most interesting kind of “spurious correlation” as early as 1899. It arises when two heterogeneous populations are aggregated into one. Pearson, who, like Galton, was a fanatical collector of data on the human body, had obtained measurements of 806 male skulls and 340 female skulls from the Paris Catacombs (Figure 2.5). He computed the correlation between skull length and skull breadth. When the computation was done only for males or only for females, the correlations were negligible—there was no significant association between skull length and breadth. But when …
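The effect is easy to reproduce. In the sketch below, two groups each show zero correlation between skull length and breadth, yet pooling them manufactures a correlation because the group means differ (the group sizes follow the text; the means and spreads are illustrative, not Pearson's measurements):

```python
import numpy as np

rng = np.random.default_rng(3)

def group(n, mean_length, mean_breadth):
    # Within each group, length and breadth are generated independently (no association).
    return rng.normal(mean_length, 5, n), rng.normal(mean_breadth, 5, n)

len_m, br_m = group(806, 190, 150)   # larger group, larger on both measures
len_f, br_f = group(340, 180, 140)   # smaller group, smaller on both measures

print(np.corrcoef(len_m, br_m)[0, 1])   # about 0 within the first group
print(np.corrcoef(len_f, br_f)[0, 1])   # about 0 within the second group

pooled_len = np.concatenate([len_m, len_f])
pooled_br = np.concatenate([br_m, br_f])
print(np.corrcoef(pooled_len, pooled_br)[0, 1])   # clearly positive (about 0.45) once mixed
```

The pooled correlation reflects only the fact that one group is larger on both measures, not any relationship between length and breadth.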
18%
Mendel’s theory of dominant and recessive genes had just been rediscovered.
18%
It proved virtually impossible to breed an all-white or all-colored guinea pig, and even the most inbred families (after multiple generations of brother-sister mating) still had pronounced variation, from mostly white to mostly colored. This contradicted the prediction of Mendelian genetics that a particular trait should become “fixed” by multiple generations of inbreeding.
19%
Having built this bridge, Wright could travel backward over it, from the correlations measured in the data (rung one) to the hidden causal quantities, d and h (rung two).
19%
FIGURE 2.7. Sewall Wright’s first path diagram, illustrating the factors leading to coat color in guinea pigs. D = developmental factors (after conception, before birth), E = environmental factors (after birth), G = genetic factors from each individual parent, H = combined hereditary factors from both parents, O, Oʹ = offspring. The objective of analysis was to estimate the strength of the effects of D, E, H (written as d, e, h in the diagram). (Source: Sewall Wright, Proceedings of the National Academy of Sciences [1920], 320–332.)
19%
In a randomly bred population of guinea pigs, 42 percent of the variation in coat pattern was due to heredity, and 58 percent was developmental. By contrast, in a highly inbred family, only 3 percent of the variation in white fur coverage was due to heredity, and 92 percent was developmental. In other words, twenty generations of inbreeding had all but eliminated the genetic variation, but the developmental factors remained.
19%
The two offspring share environmental factors but have different developmental histories.
19%
In an inbred family there would be a strong correlation between the heredity of the sire and the dam, which Wright indicated with a two-headed arrow between Hʺ and Hʹʹʹ.
20%
we can use the diagram in exploratory mode; we can postulate certain causal relationships and work out the predicted correlations between variables. If these contradict the data, then we have evidence that the relationships we assumed were false. This way of using path diagrams, rediscovered in 1953 by Herbert Simon (a 1978 Nobel laureate in economics), inspired much work in the social sciences.
20%
Many people still make Niles’s mistake of thinking that the goal of causal analysis is to prove that X is a cause of Y or else to find the cause of Y from scratch. That is the problem of causal discovery, which was my ambitious dream when I first plunged into graphical modeling and is still an area of vigorous research. In contrast, the focus of Wright’s research, as well as this book, is representing plausible causal knowledge in some mathematical language, combining it with empirical data, and answering causal queries that are of practical value.
20%
in his “Correlation and Causation” paper, from 1921, which asks how much a guinea pig’s birth weight will be affected if it spends one more day in the womb.
21%
Wright noted that the guinea pigs that spent a day longer in the womb weighed an average of 5.66 grams more at birth. So, one might naively suppose that a guinea pig embryo grows at 5.66 grams per day just before it is born.
21%
The pups born later are usually born later for a reason: they have fewer litter mates. This means that they have had a more favorable environment for growth throughout the pregnancy. A pup with only two siblings, for instance, will already weigh more on day sixty-six than a pup with four siblings. Thus the difference in birth weights has two causes, and we want to disentangle them. How much of the 5.66 grams is due to spending an additional day in utero and how much is due to having fewer siblings to compete with?
21%
L represents litter size, which affects both P and Q (a larger litter causes the pup to grow slower and also have fewer days in utero).
21%
X, P, and L can be measured for each guinea pig, but Q cannot.
21%
“What is the direct effect of the gestation period P on the birth weight X?”
21%
In Figure 2.8, the direct effect is represented by the path coefficient p, corresponding to the path P → X. The bias due to litter size corresponds to the path P ← L → Q → X. And now the algebraic magic: the amount of bias is equal to the product of the path coefficients along that path (in other words, l times lʹ times q). The total correlation, then, is just the sum of the path coefficients along the two paths: algebraically, p + (l × lʹ × q) = 5.66 grams per day.
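Written out with the path-tracing rule the passage describes, for standardized variables and assuming (consistent with the text) that l′ labels the arrow L → P, l labels L → Q, q labels Q → X, and p labels P → X, the three measurable correlations decompose as

$$r_{LP} = l', \qquad r_{PX} = p + l'\,l\,q, \qquad r_{LX} = l'\,p + l\,q,$$

where the P–X relation is the quantity that comes out as 5.66 grams per day when the calculation is done in raw units rather than correlations.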
21%
the ingenuity of path coefficients really shines. Wright’s methods tell us how to express each of the measured correlations in terms of the path coefficients. After doing this for each of the measured pairs (P, X), (L, X), and (L, P), we obtain three equations that can be solved algebraically for the unknown path coefficients, p, lʹ, and l × q.
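A hedged sketch of that algebra, with made-up correlation values standing in for Wright's measurements (only the structure of the solution is the point):

```python
import numpy as np

# Measured correlations (illustrative stand-ins, not Wright's numbers).
r_PX, r_LX, r_LP = 0.55, -0.40, -0.50

# Path equations for standardized variables:
#   r_LP = l'
#   r_PX = p    + l' * (l*q)
#   r_LX = l'*p +      (l*q)
l_prime = r_LP
A = np.array([[1.0, l_prime],
              [l_prime, 1.0]])
b = np.array([r_PX, r_LX])
p, lq = np.linalg.solve(A, b)    # unknowns: p and the product l*q
print(l_prime, p, lq)            # p is the direct effect of gestation on birth weight
```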
21%
Then we are done, because the desired quantity p has been obtained.
21%
By contrast, the number 5.66 grams per day has no biological significance, because it conflates two separate processes, one of which is not causal but anticausal (or diagnostic) in the link P ← L. Lesson one from this example: causal analysis allows us to quantify processes in the real world, not just patterns in the data.
21%
Lesson two, whether you followed the mathematics or not: in path analysis you draw conclusions about individual causal relationships by examining the diagram as a whole. The entire structure of the diagram may be needed to estimate each individual parameter.
21%
Scientists will always prefer routine calculations on data to methods that challenge their scientific knowledge.
21%
“Statistics may be regarded as … the study of methods of the reduction of data.” Pay attention to the words “methods,” “reduction,” and “data.” Wright abhorred the idea of statistics as merely a collection of methods; Fisher embraced it.
21%
in causal analysis we must incorporate some understanding of the process that produces the data, and then we get something that was not in the data to begin with.
22%
Sociologists renamed path analysis as structural equation modeling (SEM), embraced diagrams, and used them extensively until 1970, when a computer package called LISREL automated the calculation of path coefficients
22%
In economics, the algebraic part of path analysis became known as simultaneous equation models (no acronym). Economists essentially never used path diagrams and continue not to use them to this day, relying instead on numerical equations and matrix algebra. A dire consequence of this is that, because algebraic equations are nondirectional (that is, x = y is the same as y = x), economists had no notational means to distinguish causal from regression equations and thus were unable to answer policy-related questions, even after solving the equations.
22%
For Wright, drawing a path diagram is not a statistical exercise; it is an exercise in genetics, economics, psychology, or whatever the scientist’s own field of expertise is.
22%
Wright traces the allure of “model-free” methods to their objectivity. This has indeed been a holy grail for statisticians since day one—or since March 15, 1834, when the Statistical Society of London was founded. Its founding charter said that data were to receive priority in all cases over opinions and interpretations. Data are objective; opinions are subjective. This paradigm long predates Pearson. The struggle for objectivity—the idea of reasoning exclusively from data and experiment—has been part of the way that science has defined itself ever since Galileo.