Kindle Notes & Highlights
The back-door criterion tells us which sets of variables we can use to deconfound our data. The adjustment formula actually does the deconfounding. In the simplest case of linear regression, partial regression coefficients perform the back-door adjustment implicitly. In the nonparametric case, we must do the adjustment explicitly, either using the back-door adjustment formula directly on the data or on some extrapolated version of it.
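To make the linear-regression remark concrete, here is a minimal sketch (not from the book; the variable names and coefficients are made up) showing that including a back-door variable Z in the regression recovers the causal effect of X that the naive regression misses:

```python
import numpy as np

# Illustrative linear model: Z confounds X and Y; the true effect of X on Y is 2.0.
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                       # observed confounder
x = 0.8 * z + rng.normal(size=n)             # Z -> X
y = 2.0 * x + 1.5 * z + rng.normal(size=n)   # X -> Y and Z -> Y

# Naive slope of Y on X is biased by the back-door path X <- Z -> Y.
naive = np.cov(x, y)[0, 1] / np.var(x)

# Partial regression: regress Y on X and Z together; the coefficient on X deconfounds.
A = np.column_stack([x, z, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(naive, 2), round(coef[0], 2))    # ~2.73 (confounded) vs ~2.0 (adjusted)
```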
the front door is the direct causal path Smoking → Tar → Cancer, for which we do have data on all three variables. Intuitively, the reasoning is as follows. First, we can estimate the average causal effect of Smoking on Tar, because there is no unblocked back-door path from Smoking to Tar, as the Smoking ← Smoking Gene → Cancer ← Tar path is already blocked by the collider at Cancer. Because it is blocked already, we don't even need back-door adjustment. We can simply observe P(tar | smoking) and P(tar | no smoking), and the difference between them will be the average causal effect of Smoking on Tar.
Likewise, the diagram allows us to estimate the average causal effect of Tar on Cancer. To do this we can block the back-door path from Tar to Cancer, Tar ← Smoking ← Smoking Gene → Cancer, by adjusting for Smoking.
Then the back-door adjustment formula will give us P(cancer | do(tar)) and P(cancer | do(no tar)). The difference between these is the average causal effect of Tar on Cancer.
Now we know the average increase in the likelihood of tar deposits due to smoking and the average increase of cancer due to tar deposits. Can we combine these somehow to obtain the average increase in cancer due to smoking?
Cancer can come about in two ways: in the presence of Tar or in the absence of Tar. If we force a person to smoke, then the probabilities of these two states are P(tar | do(smoking)) and P(no tar | do(smoking)), respectively. If a Tar state evolves, the likelihood of causing Cancer is P(cancer | do(tar)). If, on the other hand, a No-Tar state evolves, then it would result in a Cancer likelihood of P(cancer | do(no tar)). We can weight the two scenarios by their respective probabilities under do(smoking) and in this way compute the total probability of cancer due to smoking. The same argument works if we prevent the person from smoking, using do(no smoking) in place of do(smoking).
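In symbols, the weighting just described amounts to

P(cancer | do(smoking)) = P(tar | do(smoking)) P(cancer | do(tar)) + P(no tar | do(smoking)) P(cancer | do(no tar)),

where each do-expression on the right is obtainable by the two steps above.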
The process I have just described, expressing P(cancer | do (smoking)) in terms of do-free probabilities, is called the front-door adjustment. It differs from the back-door adjustment in that we adjust for two variables (Smoking and Tar) instead of one, and these variables lie on the front-door path from Smoking to Cancer rather than the back-door path.
the formula (Equation 7.1), which cannot be found in ordinary statistics textbooks. Here X stands for Smoking, Y stands for Cancer, Z stands for Tar, and U (which is conspicuously absent from the formula) stands for the unobservable variable, the Smoking Gene.

P(Y | do(X)) = Σz P(Z = z | X) Σx P(Y | X = x, Z = z) P(X = x)    (7.1)

Readers with an appetite for mathematics might find it interesting to compare this to the formula for the back-door adjustment, which looks like Equation 7.2.

P(Y | do(X)) = Σz P(Y | X, Z = z) P(Z = z)    (7.2)
Equations 7.1 and 7.2 are the most complicated and interesting estimands that I will show you in this book. The left-hand side represents the query “What is the effect of X on Y?” The right-hand side is the estimand, a recipe for answering the query. Note that the estimand contains no do’s, only see’s, represented by the vertical bars, and this means it can be estimated from data.
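To illustrate how such an estimand is evaluated, here is a toy sketch (my own made-up numbers, not the book's data) that applies Equation 7.1 to a small joint distribution over binary Smoking (x), Tar (z), and Cancer (y):

```python
# p[(x, z, y)] = P(X=x, Z=z, Y=y); any proper joint distribution will do.
p = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.03, (0, 1, 1): 0.02,
    (1, 0, 0): 0.05, (1, 0, 1): 0.03, (1, 1, 0): 0.22, (1, 1, 1): 0.30,
}

def marg(**fixed):
    """Probability of the event where the named variables take the given values."""
    keys = ("x", "z", "y")
    return sum(v for k, v in p.items()
               if all(k[keys.index(name)] == val for name, val in fixed.items()))

def front_door(x_do, y_val):
    """P(Y = y_val | do(X = x_do)) computed with the front-door formula (Eq. 7.1)."""
    total = 0.0
    for z in (0, 1):
        pz_given_x = marg(x=x_do, z=z) / marg(x=x_do)            # P(z | x)
        inner = sum(marg(x=x2, z=z, y=y_val) / marg(x=x2, z=z)   # P(y | x', z)
                    * marg(x=x2)                                  # P(x')
                    for x2 in (0, 1))
        total += pz_given_x * inner
    return total

print(front_door(x_do=1, y_val=1))  # P(cancer | do(smoking)) under this toy table
```

Note that every quantity the code consults is a "see" probability read off the joint table; no interventional data are needed.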
Could the smoking-cancer controversy have been resolved by one observational study and one causal diagram? If we assume that Figure 7.1 accurately reflects the causal mechanism for cancer, the answer is absolutely yes.
mathematics, given the right situation, can eliminate the effect of confounders even without data on the confounder. And the situation can be clearly recognized. Anytime the causal effect of X on Y is confounded by one set of variables (C) and mediated by another (M) (see Figure 7.2), and, furthermore, the mediating variables are shielded from the effects of C, then you can estimate X’s effect from observational data. Once scientists are made aware of this fact, they should seek shielded mediators whenever they face incurable confounders.
In many cases we could justify the absence of that arrow. For example, if the services were only offered by appointment and people only missed their appointments because of chance events unrelated to Motivation (a bus strike, a sprained ankle, etc.), then we could erase that arrow and use the front-door criterion.
By making certain reasonable assumptions, Glynn and Kashin derived inequalities saying whether the adjustment was likely to be too high or too low and by how much.
Finally, they compared the front-door predictions and back-door predictions to the results from the randomized controlled experiment that was run at the same time. The results were impressive. The estimates from the back-door criterion (controlling for known confounders like Age, Race, and Site) were wildly incorrect, differing from the experimental benchmarks by hundreds or thousands of dollars. This is exactly what you would expect to see if there is an unobserved confounder, such as Motivation. The back-door criterion cannot adjust for it.
On the other hand, the front-door estimates succeeded in removing almost all of the Motivation effect. For males, the front-door estimates were well within the experimental error of the randomized controlled trial, even with the small positive bias that Glynn and Kashin predicted. For females, the results were even better: the front-door ...
Glynn and Kashin’s work gives both empirical and methodological proof that as long as the effect of C on M (in Figure 7.2) is weak, front-door adjustment can give a reasonably good estimate of the effect of X on Y. It is much better than not controlling for C.
Glynn and Kashin’s results show why the front-door adjustment is such a powerful tool: it allows us to control for confounders that we cannot observe (like Motivation), including those that we can’t even name. RCTs are considered the “gold standard” of causal effect estimation for exactly the same reason. Because front-door estimates do the same thing, with the additional virtue of observing people’s behavior in their own natural habitat instead of a laboratory, I would not be surprised if this method eventually becomes a useful alternative to randomized controlled trials.
This assertion of irrelevance translates into a symbolic manipulation:

P(Y | do(X), Z, W) = P(Y | do(X), Z)

The stated equation holds provided that the variable set Z blocks all the paths from W to Y after we have deleted all the arrows leading into X. In the example of Fire → Smoke → Alarm, we have W = Fire, Z = Smoke, Y = Alarm, and Z blocks all the paths from W to Y. (In this case we do not have a variable X.)
if a set Z of variables blocks all back-door paths from X to Y, then conditional on Z, do(X) is equivalent to see(X). We can, therefore, write

P(Y | do(X), Z) = P(Y | X, Z)
In other words, we are saying that after we have controlled for a sufficient deconfounding set, any remaining correlation is a genuine causal effect.
we can remove do(X) from P(Y | do(X)) in any case where there are no causal paths from X to Y. That is, P(Y | do(X)) = P(Y) if there is no path from X to Y with only forward-directed arrows.
Rule 1 permits the addition or deletion of observations. Rule 2 permits the replacement of an intervention with an observation, or vice versa. Rule 3 permits the deletion or addition of interventions.
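Stated compactly, in the simplified form used in this chapter (the fully general rules also specify which arrows to delete from the diagram when checking the blocking conditions):

Rule 1 (adding or deleting observations): P(Y | do(X), Z, W) = P(Y | do(X), Z) if Z blocks all paths from W to Y once the arrows into X are removed.
Rule 2 (exchanging interventions and observations): P(Y | do(X), Z) = P(Y | X, Z) if Z blocks all back-door paths from X to Y.
Rule 3 (adding or deleting interventions): P(Y | do(X)) = P(Y) if there is no path from X to Y consisting only of forward-directed arrows.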
Many logical systems are plagued with intractable decision problems. For instance, given a pile of dominos of various sizes, we have no tractable way to decide if we can arrange them to fill a square of a given size. But once an arrangement is proposed, it takes no time at all to verify whether it constitutes a solution.
Suppose that worst comes to worst, and our causal model does not permit estimation of the causal effect P(Y | do(X)) from observations alone. Perhaps we also cannot conduct a randomized experiment with random assignment of X. A clever researcher might ask whether we might estimate P(Y | do(X)) by randomizing some other variable, say Z, that is more accessible to control than X. For instance, if we want to assess the effect of cholesterol levels (X) on heart disease (Y), we might be able to manipulate the subjects' diet (Z) instead of exercising direct control over the cholesterol levels in their blood.
Even more problems of this sort arise when we consider problems of transportability or external validity—assessing whether an experimental result will still be valid when transported to a different environment that may differ in several key ways from the one studied. This more ambitious set of questions touches on the heart of scientific methodology, for there is no science without generalization. Yet the question of generalization has been lingering for at least two centuries, without an iota of progress. The tools for producing a solution were simply not available.
In 2015, Bareinboim and I presented a paper at the National Academy of Sciences that solves the problem, provided that you can express your assumptions about both environments with a causal diagram.
I could not find a way out of the maze. But as soon as Bareinboim whispered to me, “Try the do-calculus,” the answer came shining through like a baby’s smile. Every step was clear and meaningful. This is now the simplest model known to us in which the causal effect needs to be estimated by a method that goes beyond the front- and back-door adjustments.
Advances in such areas as counterfactuals, generalizability, missing data, and machine learning are still emerging.
Nowadays a John Snow Society even reenacts the removal of the famous pump handle every year.
(1) there is no arrow between Miasma and Water Company (the two are independent), and (2) there is an arrow between Water Company and Water Purity. Left unstated by Snow, but equally important, is a third assumption: (3) the absence of a direct arrow from Water Company to Cholera, which is fairly obvious to us today because we know the water companies were not delivering cholera to their customers by some alternate route.

FIGURE 7.8. Diagram for cholera after introduction of an instrumental variable.

A variable that satisfies these three properties is today called an instrumental variable.
Clearly Snow thought of this variable as similar to a coin flip, which simulates a variable with no incoming arrows.
since the effect of Water Company on Cholera must go through Water Purity, we conclude (as did Snow) that the observed association between Water Purity and Cholera must also be causal. Snow stated his conclusion in no uncertain terms: if the Southwark and Vauxhall Company had moved its intake point upstream, more than 1,000 lives would have been saved.
the path coefficient a means that an intervention to increase Z by one standard unit will cause X to increase by a standard units.
FIGURE 7.9. General setup for instrumental variables.
Because Z and X are unconfounded, the causal effect of Z on X (that is, a) can be estimated from the slope rXZ of the regression line of X on Z.
Likewise, the variables Z and Y are unconfounded, because the path Z → X ← U → Y is blocked by the collider at X. So the slope of the regression line of Y on Z (rZY) will equal the causal effect along the direct path Z → X → Y, which is the product of the two path coefficients, ab.
Thus we have two equations: ab = rZY and a = rZX. If we divide the first equation by the second, we get the causal effect of X on Y: b = rZY / rZX.
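Here is a minimal simulation (the coefficients are my own illustrative choices, not the book's data) of the setup in Figure 7.9, showing that the ratio rZY / rZX recovers b even though the confounder U is never measured:

```python
import numpy as np

# Z -> X -> Y with an unobserved confounder U affecting both X and Y.
rng = np.random.default_rng(1)
n = 200_000
u = rng.normal(size=n)                        # unobserved confounder
z = rng.normal(size=n)                        # instrument: independent of U
x = 0.7 * z + u + rng.normal(size=n)          # path coefficient a = 0.7
y = 0.5 * x + u + rng.normal(size=n)          # path coefficient b = 0.5 (the target)

cov_zx = np.cov(z, x)
cov_zy = np.cov(z, y)
slope_zx = cov_zx[0, 1] / cov_zx[0, 0]        # estimates a   (Z and X unconfounded)
slope_zy = cov_zy[0, 1] / cov_zy[0, 0]        # estimates a*b (only open path: Z -> X -> Y)

cov_xy = np.cov(x, y)
naive = cov_xy[0, 1] / cov_xy[0, 0]           # biased by the hidden confounder U
print(round(naive, 2), round(slope_zy / slope_zx, 2))   # ~0.9 (biased) vs ~0.5 (IV)
```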
He was interested in predicting how the output of a commodity would change if a tariff were imposed, which would raise the price and therefore, in theory, encourage production. In economic terms, he wanted to know the elasticity of supply.
Philip Wright deliberately introduced the variable Yield per Acre (of flaxseed) as an instrument that directly affects supply but has no correlation to demand. He then used an analysis like the one I just gave to deduce both the effect of supply on price and the effect of price on supply.
Historians quarrel about who invented instrumental variables, a method that became extremely popular in modern econometrics. There is no question in my mind that Philip Wright borrowed the idea of path coefficients from his son. No economist had ever before insisted on the distinction between causal coefficients and regression coefficients; they were all in the Karl Pearson–Henry Niles camp that causation is nothing more than a limiting case of correlation.
This trial, like many RCTs, faced the problem of noncompliance, when subjects randomized to receive a drug don’t actually take it. This will reduce the apparent effectiveness of the drug, so we may want to adjust the results to account for the noncompliers. But as always, confounding rears its ugly head. If the noncompliers are different from the compliers in some relevant way (maybe they are sicker to start with?), we cannot predict how they would have responded had they adhered to instructions.
in this case our variables are binary, not numerical. This means right away that we cannot use a linear model, and therefore we cannot apply the instrumental variables formula that we derived earlier. However, in such cases we can often replace the linearity assumption with a weaker condition called monotonicity, which I’ll explain below.
First, is the instrumental variable Z independent of the confounder? The randomization of Z ensures that the answer is yes. (As we saw in Chapter 4, randomization is a great way to make sure that a variable isn't affected by any confounders.) Is there any direct path from Z to Y? Common sense says that there is no way that receiving a particular random number (Z) would affect cholesterol (Y), so the answer is no. Finally, is there a strong association between Z and X? This time the data themselves should be consulted, and the answer again is yes. We must always ask these three questions before relying on a proposed instrumental variable.
we can estimate the effect of the treatment. First let's take the worst-case scenario: none of the noncompliers would have improved if they had complied with treatment. In that case, the only people who would have taken the drug and improved would be the 47.3 percent who actually did comply and improve. But we need to correct this estimate for the placebo effect, which is in the third row of the table. Out of the people who were assigned the placebo and took the placebo, 8.1 percent improved. So the net improvement above and beyond the placebo effect is 47.3 percent minus 8.1 percent, or 39.2 percent.
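As a quick check of the arithmetic, using the two percentages quoted above and the worst-case assumption that no noncomplier would have improved on the drug:

```python
# Percentages quoted in the text, rearranged only for illustration.
p_complied_and_improved = 0.473   # assigned the drug, took it, and improved
p_placebo_improved = 0.081        # took the placebo and improved

# Worst case: the only treatment-group improvers are the compliers who improved,
# so subtracting the placebo rate gives a lower bound on the drug's net effect.
worst_case_net = p_complied_and_improved - p_placebo_improved
print(f"{worst_case_net:.1%}")    # 39.2% net improvement above the placebo effect
```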