The Book of Why: The New Science of Cause and Effect (Penguin Science)
2%
Only after gamblers invented intricate games of chance, sometimes carefully designed to trick us into making bad choices, did mathematicians like Blaise Pascal (1654), Pierre de Fermat (1654), and Christiaan Huygens (1657) find it necessary to develop what we today call probability theory. Likewise, only when insurance organizations demanded accurate estimates of life annuity did mathematicians like Edmond Halley (1693) and Abraham de Moivre (1725) begin looking at mortality tables to calculate life expectancies.
3%
First, in the world of AI, you do not really understand a topic until you can teach it to a mechanical robot.
6%
Millions of lives were lost or shortened because scientists did not have an adequate language or methodology for answering causal questions.
6%
“We think of a cause as something that makes a difference, and the difference it makes must be a difference from what would have happened without it.”
6%
Of special interest are questions concerning necessary and sufficient causes of observed events. For example, how likely is it that the defendant’s action was a necessary cause of the claimant’s injury? How likely is it that man-made climate change is a sufficient cause of a heat wave?
8%
Why not just go into our vast database of previous purchases and see what happened previously when toothpaste cost twice as much? The reason is that on the previous occasions, the price may have been higher for different reasons. For example, the product may have been in short supply, and every other store also had to raise its price. But now you are considering a deliberate intervention that will set a new price regardless of market conditions.
Alexander Telfar
Given enough time, chances are that someone else will have tried doubling the price, so you could just observe. (This assumes perfect information, or that hidden variables do not affect the quantity you care about.) What interventions give us is data efficiency: where observation requires LOTS of data, an intervention needs (possibly) a single action. I want to formalize this! Under various types of model/noise, what is the data efficiency of intervention versus observation?
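A toy simulation makes the trap concrete; the model, numbers, and variable names are all invented for illustration. A shortage both doubles the price and props up sales (competitors are expensive too), so the observational comparison even has the wrong sign relative to a deliberate intervention:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical world: a supply shortage raises the price AND props up sales,
# confounding the two.
shortage = rng.random(n) < 0.2
price = np.where(shortage, 2.0, 1.0)             # price doubles only in shortages
sales = 10 - 3 * price + 4 * shortage + rng.normal(0, 1, n)

# "Seeing": compare sales at the two price levels as they happened to occur.
seen = sales[price == 2].mean() - sales[price == 1].mean()

# "Doing": set the price by fiat, independent of market conditions.
sales_do2 = 10 - 3 * 2.0 + 4 * shortage + rng.normal(0, 1, n)
sales_do1 = 10 - 3 * 1.0 + 4 * shortage + rng.normal(0, 1, n)
done = sales_do2.mean() - sales_do1.mean()

print(f"seeing: {seen:+.2f}   doing: {done:+.2f}")   # roughly +1.0 versus -3.0

Observation alone answers the wrong question here no matter how much data accumulates; the note's data-efficiency framing only applies once the observed variation in price is known to be as-if random.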
9%
The rewards of having a causal model that can answer counterfactual questions are immense.
24%
Why was the forward probability (of x given L) so much easier to assess mentally than the probability of L given x? In this example, the asymmetry comes from the fact that L acts as the cause and x is the effect. If we observe a cause—for example, Bobby throws a ball toward a window—most of us can predict the effect (the ball will probably break the window).
Alexander Telfar
Like surjective functions (many-to-one): an output can have many inputs mapping to it, but each input maps to only one output.
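The inverse probability is recovered from the forward one by Bayes's rule, P(L | x) = P(x | L) P(L) / P(x). A made-up numeric illustration with the window example: if P(ball thrown) = 0.1, P(window breaks | ball thrown) = 0.8, and P(window breaks | no ball) = 0.01, then P(window breaks) = 0.8 × 0.1 + 0.01 × 0.9 = 0.089, so P(ball thrown | window breaks) = 0.08 / 0.089 ≈ 0.90. The forward direction is a single stable mechanism; the inverse direction forces us to weigh every possible cause of the effect.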
32%
For example, we cannot distinguish the fork A ← B → C from the chain A → B → C by data alone, because the two diagrams imply the same independence conditions.
Alexander Telfar
But I thought that if we controlled for B, then A would be linked to C in the fork, while A and C would be independent in the chain?
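A quick simulation settles the question in the note: in the fork and in the chain alike, A and C are correlated marginally and become independent once B is controlled for; it is the collider A → B ← C, not the fork, that links A to C when you control for B. Linear toy models with invented coefficients:

import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Fork: A <- B -> C
B_fork = rng.normal(size=n)
A_fork = B_fork + rng.normal(size=n)
C_fork = B_fork + rng.normal(size=n)

# Chain: A -> B -> C
A_chain = rng.normal(size=n)
B_chain = A_chain + rng.normal(size=n)
C_chain = B_chain + rng.normal(size=n)

def partial_corr(x, y, z):
    """Correlation of x and y after regressing each on z (i.e., controlling for z)."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print("fork :", round(np.corrcoef(A_fork, C_fork)[0, 1], 2),
      "given B:", round(partial_corr(A_fork, C_fork, B_fork), 2))
print("chain:", round(np.corrcoef(A_chain, C_chain)[0, 1], 2),
      "given B:", round(partial_corr(A_chain, C_chain, B_chain), 2))
# Both lines show a clearly nonzero marginal correlation and roughly 0.0 given B.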
33%
Daniel also understood that it was important to compare groups. In this respect he was already more sophisticated than many people today, who choose a fad diet (for example) just because a friend went on that diet and lost weight. If you choose a diet based only on one friend’s experience, you are essentially saying that you believe you are similar to your friend in all relevant details: age, heredity, home environment, previous diet, and so forth. That is a lot to assume.
33%
“Sometimes you end up controlling for the thing you’re trying to measure.”
37%
It is known as a mediator: it is the variable that explains the causal effect of X on Y. It is a disaster to control for Z if you are trying to find the causal effect of X on Y.
37%
Statisticians very often control for proxies when the actual causal variable can’t be measured; for instance, party affiliation might be used as a proxy for political beliefs. Because Z isn’t a perfect measure of M, some of the influence of X on Y might “leak through” if you control for Z.
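Both pitfalls reproduce in a toy linear model (all coefficients invented): the entire effect of X on Y runs through the mediator M, controlling for M wipes the estimate out, and controlling for a noisy proxy Z of M only partially blocks the path, so part of the effect leaks through:

import numpy as np

rng = np.random.default_rng(2)
n = 200_000

X = rng.normal(size=n)
M = 2 * X + rng.normal(size=n)     # mediator: X influences Y only through M
Y = 3 * M + rng.normal(size=n)     # so the true causal effect of X on Y is 6
Z = M + rng.normal(size=n)         # imperfect proxy for M

def coef_on_x(y, *controls):
    """OLS coefficient on X when regressing y on X plus the given controls."""
    D = np.column_stack([X, *controls, np.ones(n)])
    return np.linalg.lstsq(D, y, rcond=None)[0][0]

print("no control   :", round(coef_on_x(Y), 2))     # ~6.0, the correct answer
print("control for M:", round(coef_on_x(Y, M), 2))  # ~0.0, the "disaster"
print("control for Z:", round(coef_on_x(Y, Z), 2))  # ~3.0, partial leak-through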
41%
The cholera bacillus is the only cause of cholera; or as we would say today, it is both necessary and sufficient.
42%
Doctors call it a “dose-response effect”: if substance A causes a biological effect B, then usually (though not always) a larger dose of A causes a stronger response B
43%
In a speech given in March 1954, George Weissman, vice president of Philip Morris and Company, said, “If we had any thought or knowledge that in any way we were selling a product harmful to consumers, we would stop business tomorrow.” Sixty years later, we are still waiting for Philip Morris to keep that promise.
47%
Our brains are not prepared to accept causeless correlations, and we need special training—through examples like the Monty Hall paradox
52%
This last comment brings up one more curious point about Lord’s paradox as originally phrased. Although the stated intention of the school dietitian is to “determine the effects of the diet,” nowhere in his original paper does Lord mention a control diet. Therefore we can’t even say anything about the diet’s effects.
53%
To do this, we measure the average causal effect of an intervention by first estimating its effect at each “level,” or stratum, of the deconfounder. We then compute a weighted average of those strata, where each stratum is weighted according to its prevalence in the population. If, for example, the deconfounder is gender, we first estimate the causal effect for males and females. Then we average the two, if the population is (as usual) half male and half female. If the proportions are different—say, two-thirds male and one-third female—then to estimate the average causal effect we would take a correspondingly weighted average: two-thirds of the male-stratum effect plus one-third of the female-stratum effect.
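In symbols this paragraph is the adjustment formula, P(Y | do(X)) = Σ_z P(Y | X, Z = z) P(Z = z). A tiny executable version with invented recovery rates and the two-thirds/one-third split just described:

# Hypothetical recovery rates under treatment, stratified by the deconfounder.
p_recover = {"male": 0.70, "female": 0.50}    # P(Y=1 | X=treated, Z=z)
p_stratum = {"male": 2 / 3, "female": 1 / 3}  # P(Z=z) in the population

# Adjustment formula: weight each stratum's effect by its prevalence.
p_do = sum(p_recover[z] * p_stratum[z] for z in p_stratum)
print(round(p_do, 3))   # 0.70 * 2/3 + 0.50 * 1/3 = 0.633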
53%
You can easily extend the procedure to deal with multiple variables as well.
Alexander Telfar
No, the curse of dimensionality still exists, and some parameters will be harder to fit!? Increasing the number of dimensions requires more data to make fitting possible!?
55%
In fact, one of the major accomplishments of causal diagrams is to make the assumptions transparent so that they can be discussed and debated by experts.
56%
Now let us return to our central question of when a model can replace an experiment, or when a “do” quantity can be reduced to a “see” quantity.
56%
We start with our target sentence, P(Y | do(X)). Our task will be complete if we can succeed in eliminating the do-operator from it, leaving only classical probability expressions, like P(Y | X) or P(Y | X, Z, W). We cannot, of course, manipulate our target expression at will; the operations must conform to what do(X) means as a physical intervention.
56%
We know that if a set Z of variables blocks all back-door paths from X to Y, then conditional on Z, do(X) is equivalent to see(X). We can, therefore, write P(Y | do(X), Z) = P(Y | X, Z)
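One more step completes the reduction. Because Z is measured upstream of the intervention, its distribution is unchanged by do(X), so averaging both sides over Z eliminates the do-operator entirely:

P(Y | do(X)) = Σ_z P(Y | do(X), Z = z) P(Z = z) = Σ_z P(Y | X, Z = z) P(Z = z).

Every term on the right-hand side is a rung-one “see” quantity; this is the back-door adjustment formula encountered earlier.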
61%
Unlike most diagrams in this book, this one has “two-way” arrows, but I would ask the reader not to lose too much sleep over it. With some mathematical trickery we could equally well replace the Demand → Price → Supply chain with a single arrow Demand → Supply, and the figure would then look like Figure 7.9 (though it would be less acceptable to economists).
Alexander Telfar
Uhh... no!?
63%
For decades or even centuries, lawyers have used a relatively straightforward test of a defendant’s culpability called “but-for causation”: the injury would not have occurred but for the defendant’s action.
63%
It will be helpful to distinguish three different kinds of causation: necessary causation, sufficient causation, and necessary-and-sufficient causation.
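Pearl attaches counterfactual probabilities to the first two. Writing Y_x for the value Y would have taken had X been set to x, the probability of necessity is PN = P(Y_x' = false | X = true, Y = true): given that the action and the injury both occurred, how likely is it that the injury would not have occurred without the action? The probability of sufficiency is PS = P(Y_x = true | X = false, Y = false): in a case where neither occurred, how likely is it that the action would have produced the injury?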
64%
Lewis argued that we evaluate counterfactuals by comparing our world, where he did not take aspirin, to the most similar world in which he did take an aspirin.
64%
“Most similar” is key. There may be some “possible worlds” in which his headache did not go away—for example, a world in which he took the aspirin and then bumped his head on the bathroom door. But that world contains an extra, adventitious circumstance.
Alexander Telfar
So this ability really depends on the ability to disentangle distinct phenomena!? This seems important! If “headache” and “banged head” were not separate concepts, then we would not be able to do this!?
65%
We can describe this as making the minimal alteration to a causal diagram needed to ensure that X equals x. In this respect, structural counterfactuals are compatible with Lewis’s idea of the most similar possible world.
Alexander Telfar
minimal alteration defined how!?
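In the structural account the note's question has an exact answer: the minimal alteration is surgery on one equation only. Delete the equation that used to determine X, replace it with the constant X = x, and leave every other equation and every exogenous background value untouched. A minimal sketch with an invented two-variable model:

# A structural causal model as a dict of equations, one per endogenous variable.
model = {
    "X": lambda v: v["U_X"],               # X normally listens to its background cause
    "Y": lambda v: 2 * v["X"] + v["U_Y"],  # Y listens to X
}

def do(model, var, value):
    """Surgery: overwrite exactly one equation; everything else stays untouched."""
    altered = dict(model)
    altered[var] = lambda v: value
    return altered

def run(model, exogenous):
    v = dict(exogenous)
    for name in ("X", "Y"):                # evaluate in causal order
        v[name] = model[name](v)
    return v

background = {"U_X": 1.0, "U_Y": 0.5}
print(run(model, background))                 # the factual world
print(run(do(model, "X", 5.0), background))   # the minimally altered world

Because the background values are carried over unchanged, the altered world differs from ours only in what the surgery forces, which is the structural counterpart of Lewis's “most similar world.”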
65%
Second, they were modeled on Bayesian networks, which in turn were modeled on David Rumelhart’s description of message passing in the brain.
Alexander Telfar
huh, did not know that
65%
credit and blame, responsibility and regret. These are all counterfactual notions
Alexander Telfar
!!! So credit assignment is given by the counterfactual of asking: what if I had predicted the target!?
66%
They are data driven, not model driven.
Alexander Telfar
huh, that is the first time I have seen "data driven" being used with a negative connotation
67%
But we need to allow for individual variations, so we extend this function to read S = f_S(EX, ED, U_S), where U_S stands for “unobserved variables that affect salary.”
Alexander Telfar
Uhh, so we just alter the causal graph... adding a noise term to stand in for hidden variables makes all sorts of assumptions!?
- those variables have “little effect” on EX, ED, and S
- they are not biased in any way
- ?
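For the record, Pearl's three-step counterfactual recipe (abduction, action, prediction) can be run on such a model directly. A minimal sketch; the linear form of f_S and all numbers are invented, and a fuller model would also let ED feed back into EX:

# Invented linear structural equation for salary.
def f_S(EX, ED, U_S):
    return 65_000 + 2_500 * EX + 5_000 * ED + U_S

# A made-up employee: 6 years' experience, no degree (ED = 0), salary 81,000.
EX, ED, S = 6, 0, 81_000

# Step 1, abduction: recover this individual's U_S from the observed data.
U_S = S - f_S(EX, ED, 0)          # = 1,000

# Step 2, action: minimally alter the model so that ED = 1.
# Step 3, prediction: recompute the salary holding the SAME U_S fixed.
print(f_S(EX, 1, U_S))            # 86,000: the counterfactual salary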
68%
Of course, there is no such thing as a free lunch: we got these strong results because we made strong assumptions.
68%
“consistency.” It says that a person who took aspirin and recovered would also recover if given aspirin by experimental design.
Alexander Telfar
hmm...
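Formally, consistency is the rule that Y_x = Y for anyone whose X actually equals x: the counterfactual “what Y would be had X been set to x” coincides with the observed Y. That is what licenses equating P(Y_x = y | X = x) with P(Y = y | X = x), on the assumption that aspirin assigned by an experimenter is the same physical treatment as aspirin taken on one's own initiative.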
69%
In a probabilistic Bayesian network, the arrows into Y mean that the probability of Y is governed by the conditional probability tables for Y, given observations of its parent variables. The same is true for causal Bayesian networks, except that the conditional probability tables specify the probability of Y given interventions on the parent variables. Both models specify probabilities for Y, not a specific value of Y. In a structural causal model, there are no conditional probability tables. The arrows simply mean Y is a function of its parents, as well as the exogenous variable U_Y.
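The contrast is easy to see side by side. A minimal sketch with one binary parent X; the table entries and the mechanism are invented:

import random

# Bayesian-network node: the arrow into Y means "consult a probability table."
cpt_Y = {0: 0.1, 1: 0.8}                    # P(Y=1 | X=x)
def bn_sample_Y(x):
    return int(random.random() < cpt_Y[x])  # Y is only probabilistically determined

# Structural node: the arrow means Y is a function of X and the exogenous U_Y.
def f_Y(x, u_y):
    return u_y[x]                           # deterministic once U_Y is fixed

# All randomness is pushed into U_Y; drawn this way it reproduces the same table,
u_y = (int(random.random() < 0.1), int(random.random() < 0.8))
# but unlike the table, f_Y(0, u_y) and f_Y(1, u_y) are both defined for the
# SAME individual at once, which is what counterfactuals require.
print(bn_sample_Y(1), f_Y(0, u_y), f_Y(1, u_y))

Pushing the randomness out into U_Y looks like mere bookkeeping, but it is what unlocks rung three, as the next highlight notes.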
70%
While this may seem like a rather arcane point, it does give mathematical proof that counterfactuals (rung three) lie above interventions (rung two) on the Ladder of Causation.
71%
Another pattern was people’s tendency to blame their own actions (e.g., striking a match) rather than events not under their control. Our ability to estimate PN and PS from our model of the world suggests a systematic way of accounting for these considerations and eventually teaching robots to produce meaningful explanations of peculiar events.
71%
whereas nobody is generally expected to pump all the oxygen out of the house in anticipation of a match-striking ceremony.
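The match-and-oxygen intuition can be checked against PN and PS directly. A small sketch in a deterministic toy model, fire = match AND oxygen, with invented probabilities for the two causes: both are fully necessary, but the match is overwhelmingly more sufficient, which is why it gets the blame:

from itertools import product

p_match, p_oxygen = 0.10, 0.99      # invented marginals; causes independent

def fire(m, o):
    return m and o                  # fire needs both

def prob(event):
    """Total probability of the worlds (m, o) in which `event` holds."""
    return sum((p_match if m else 1 - p_match) * (p_oxygen if o else 1 - p_oxygen)
               for m, o in product([0, 1], repeat=2) if event(m, o))

# PN of the match: given match and fire, would fire have vanished without it?
pn_match = prob(lambda m, o: m and fire(m, o) and not fire(0, o)) \
         / prob(lambda m, o: m and fire(m, o))
# PS of the match: given no match and no fire, would striking it start a fire?
ps_match = prob(lambda m, o: not m and not fire(m, o) and fire(1, o)) \
         / prob(lambda m, o: not m and not fire(m, o))
# PS of the oxygen: given no oxygen and no fire, would restoring it start one?
ps_oxygen = prob(lambda m, o: not o and not fire(m, o) and fire(m, 1)) \
          / prob(lambda m, o: not o and not fire(m, o))

print(f"PN(match)={pn_match:.2f}  PS(match)={ps_match:.2f}  PS(oxygen)={ps_oxygen:.2f}")
# PN(match)=1.00  PS(match)=0.99  PS(oxygen)=0.10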
73%
In ordinary language, the question “Why?” has at least two versions. The first is straightforward: you see an effect, and you want to know the cause. Your grandfather is lying in the hospital, and you ask, “Why? How could he have had a heart attack when he seemed so healthy?” But there is a second version of the “Why?” question, which we ask when we want to better understand the connection between a known cause and a known effect.
Alexander Telfar
nodes versus edges
73%
Mediation is also an important concept in the law. If we ask whether a company discriminated against women when it paid them lower salaries, we are asking a mediation question. The answer depends on whether the observed salary disparity is produced directly in response to the applicant’s sex or indirectly, through a mediator such as qualification, over which the employer has no control.
84%
This example is a textbook case of interaction. In the end, VanderWeele’s analysis proves three important things about the smoking gene. First, it does not significantly increase cigarette consumption. Second, it does not cause lung cancer through a smoking-independent path. Third, for those people who do smoke, it significantly increases the risk of lung cancer. The interaction between the gene and the subject’s behavior is everything.
Alexander Telfar
how is interaction captured by the causal graph!?!
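One way to answer the note: the diagram only records which variables the cancer mechanism listens to, i.e., arrows from the gene and from smoking into lung cancer. Whether those inputs act additively or interact is a property of the structural function behind the node, not of the arrows. For instance, an invented form like risk = b1·smoking + b2·gene·smoking leaves the graph unchanged yet makes the gene matter only for smokers, which is exactly the pattern VanderWeele found.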
86%
Within the last five years, my former student (now colleague) Elias Bareinboim and I have succeeded in giving a complete criterion for deciding when results are transportable and when they are not. As usual, the proviso for using this criterion is that you represent the salient features of the data-generating process with a causal diagram, marked with locations of potential disparities. “Transporting” a result does not necessarily mean taking it at face value and applying it to the new environment. The researcher may have to recalibrate it to allow for disparities between the two environments.
88%
The one game it lost to Sedol is the only one it will ever lose to a human.
88%
Even AlphaGo’s programmers cannot tell you why the program plays so well.
Alexander Telfar
OK, imagine we have a causal graph of our Go strategy. It is likely to be so complex that it is uninterpretable anyway... not sure I agree with this.
88%
“I have done X = x, and the outcome was Y = y. But if I had acted differently, say X = xʹ, then the outcome would have been better, perhaps Y = yʹ.”
Alexander Telfar
Isn't that exactly what a label and gradients provide!?