Kindle Notes & Highlights
by Judea Pearl
Read between May 18 - May 22, 2018
Only after gamblers invented intricate games of chance, sometimes carefully designed to trick us into making bad choices, did mathematicians like Blaise Pascal (1654), Pierre de Fermat (1654), and Christiaan Huygens (1657) find it necessary to develop what we today call probability theory. Likewise, only when insurance organizations demanded accurate estimates of life annuity did mathematicians like Edmond Halley (1693) and Abraham de Moivre (1725) begin looking at mortality tables to calculate life expectancies.
First, in the world of AI, you do not really understand a topic until you can teach it to a mechanical robot.
Millions of lives were lost or shortened because scientists did not have an adequate language or methodology for answering causal questions.
Of special interest are questions concerning necessary and sufficient causes of observed events. For example, how likely is it that the defendant’s action was a necessary cause of the claimant’s injury? How likely is it that man-made climate change is a sufficient cause of a heat wave?
Why not just go into our vast database of previous purchases and see what happened previously when toothpaste cost twice as much? The reason is that on the previous occasions, the price may have been higher for different reasons. For example, the product may have been in short supply, and every other store also had to raise its price. But now you are considering a deliberate intervention that will set a new price regardless of market conditions.
Given enough time, chances are that someone else will have tried doubling the price. You can just observe. (this assumes perfect information, or that hidden variables do not affect what you care about)
What interventions give us is data efficiency. Where observation requires LOTS of data, intervention needs (possibly) only a single action.
Want to formulate this! Under various types of model/noise, what is the data efficiency of intervention versus observation!?
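A sketch of the kind of comparison I have in mind (my own toy model, not the book's; the linear forms and coefficients are invented): a hidden "market conditions" variable drives both price and sales, so the observational price-sales slope says nothing like the interventional one.

```python
# Toy model (mine): hidden market conditions M drive both price P and sales Y,
# confounding the observational relationship between price and sales.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
m = rng.normal(size=n)                        # hidden market conditions (shortage etc.)
p = 1.0 * m + rng.normal(size=n)              # stores raise prices when M is high
y = -1.0 * p + 2.0 * m + rng.normal(size=n)   # sales: the true price effect is -1.0

# Observational slope of sales on price (biased by the confounder M):
print(np.cov(p, y)[0, 1] / np.var(p))         # ~0.0, nothing like the true -1.0

# Interventional estimate: we set prices ourselves, independent of M
p_do = rng.normal(size=n)
y_do = -1.0 * p_do + 2.0 * m + rng.normal(size=n)
print(np.cov(p_do, y_do)[0, 1] / np.var(p_do))  # ~ -1.0, the causal effect
```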
The rewards of having a causal model that can answer counterfactual questions are immense.
Why was the forward probability (of x given L) so much easier to assess mentally than the probability of L given x? In this example, the asymmetry comes from the fact that L acts as the cause and x is the effect. If we observe a cause—for example, Bobby throws a ball toward a window—most of us can predict the effect (the ball will probably break the window).
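In symbols (my note, just Bayes' rule): the inverse probability needs a prior over the causes, which the forward direction never requires:

P(L | x) = P(x | L) P(L) / P(x)

so judging P(L | x) forces us to weigh how plausible each cause L was to begin with, not just how well it predicts x.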
like many-to-one (non-injective) functions: an output can have many inputs mapping to it, but each input maps to exactly one output.
Daniel also understood that it was important to compare groups. In this respect he was already more sophisticated than many people today, who choose a fad diet (for example) just because a friend went on that diet and lost weight. If you choose a diet based only on one friend’s experience, you are essentially saying that you believe you are similar to your friend in all relevant details: age, heredity, home environment, previous diet, and so forth. That is a lot to assume.
“Sometimes you end up controlling for the thing you’re trying to measure.”
It is known as a mediator: it is the variable that explains the causal effect of X on Y. It is a disaster to control for Z if you are trying to find the causal effect of X on Y.
Statisticians very often control for proxies when the actual causal variable can’t be measured; for instance, party affiliation might be used as a proxy for political beliefs. Because Z isn’t a perfect measure of M, some of the influence of X on Y might “leak through” if you control for Z.
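A quick simulation of the mediator point (my own sketch, not the book's; coefficients invented): in a chain X → Z → Y, "controlling" for the mediator Z erases the very effect we want to measure.

```python
# Toy simulation: X -> Z -> Y, where Z mediates the entire effect of X on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
z = 1.0 * x + rng.normal(size=n)   # mediator carries the effect
y = 2.0 * z + rng.normal(size=n)   # total effect of X on Y is 1.0 * 2.0 = 2.0

def ols(target, *regressors):
    """Least-squares coefficients (with intercept) via numpy."""
    A = np.column_stack([np.ones(n), *regressors])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef[1:]  # drop the intercept

print(ols(y, x))     # ~[2.0]: the true total causal effect
print(ols(y, x, z))  # ~[0.0, 2.0]: controlling for Z wipes out X's effect
```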
The cholera bacillus is the only cause of cholera; or as we would say today, it is both necessary and sufficient.
Doctors call it a “dose-response effect”: if substance A causes a biological effect B, then usually (though not always) a larger dose of A causes a stronger response B.
In a speech given in March 1954, George Weissman, vice president of Philip Morris and Company, said, “If we had any thought or knowledge that in any way we were selling a product harmful to consumers, we would stop business tomorrow.” Sixty years later, we are still waiting for Philip Morris to keep that promise.
Our brains are not prepared to accept causeless correlations, and we need special training—through examples like the Monty Hall paradox.
This last comment brings up one more curious point about Lord’s paradox as originally phrased. Although the stated intention of the school dietitian is to “determine the effects of the diet,” nowhere in his original paper does Lord mention a control diet. Therefore we can’t even say anything about the diet’s effects.
To do this, we measure the average causal effect of an intervention by first estimating its effect at each “level,” or stratum, of the deconfounder. We then compute a weighted average of those strata, where each stratum is weighted according to its prevalence in the population. If, for example, the deconfounder is gender, we first estimate the causal effect for males and females. Then we average the two, if the population is (as usual) half male and half female. If the proportions are different—say, two-thirds male and one-third female—then to estimate the average causal effect we would take a weighted average, counting the male effect twice as heavily as the female effect.
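A worked version of the weighting (my own numbers, invented purely for illustration):

```python
# Hypothetical stratum-specific effects (invented numbers, illustration only)
effects = {"male": 0.10, "female": 0.30}   # causal effect within each stratum
weights = {"male": 2 / 3, "female": 1 / 3} # prevalence of each stratum

# Average causal effect = prevalence-weighted average over strata
ace = sum(weights[s] * effects[s] for s in effects)
print(ace)  # 2/3 * 0.10 + 1/3 * 0.30 = 0.1667
```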
In fact, one of the major accomplishments of causal diagrams is to make the assumptions transparent so that they can be discussed and debated by experts.
Now let us return to our central question of when a model can replace an experiment, or when a “do” quantity can be reduced to a “see” quantity.
We start with our target sentence, P(Y | do(X)). Our task will be complete if we can succeed in eliminating the do-operator from it, leaving only classical probability expressions, like P(Y | X) or P(Y | X, Z, W). We cannot, of course, manipulate our target expression at will; the operations must conform to what do(X) means as a physical intervention.
We know that if a set Z of variables blocks all back-door paths from X to Y, then conditional on Z, do(X) is equivalent to see(X). We can, therefore, write P(Y | do(X), Z) = P(Y | X, Z)
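Spelling out the remaining step (standard back-door adjustment; my note): average the identity above over Z, using the fact that intervening on X does not change the distribution of the pre-treatment variables Z, i.e., P(Z = z | do(X)) = P(Z = z):

P(Y | do(X)) = Σ_z P(Y | do(X), Z = z) P(Z = z) = Σ_z P(Y | X, Z = z) P(Z = z)

The right-hand side contains only "see" quantities, so the do-operator has been eliminated as required.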
Unlike most diagrams in this book, this one has “two-way” arrows, but I would ask the reader not to lose too much sleep over it. With some mathematical trickery we could equally well replace the Demand → Price → Supply chain with a single arrow Demand → Supply, and the figure would then look like Figure 7.9 (though it would be less acceptable to economists).
For decades or even centuries, lawyers have used a relatively straightforward test of a defendant’s culpability called “but-for causation”: the injury would not have occurred but for the defendant’s action.
It will be helpful to distinguish three different kinds of causation: necessary causation, sufficient causation, and necessary-and-sufficient causation.
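For reference (my own note, transcribing the standard counterfactual definitions Pearl uses): with X = x the candidate cause and Y = y the observed effect,

PN = P(Y_{x'} = false | X = x, Y = y) — the probability that Y would not have occurred had X not occurred, given that both actually did occur;
PS = P(Y_x = true | X = x', Y = y') — the probability that Y would have occurred had X occurred, given that neither actually did.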
Lewis argued that we evaluate counterfactuals by comparing our world, where he did not take aspirin, to the most similar world in which he did take an aspirin.
“Most similar” is key. There may be some “possible worlds” in which his headache did not go away—for example, a world in which he took the aspirin and then bumped his head on the bathroom door. But that world contains an extra, adventitious circumstance.
so this ability really depends on the ability to disentangle various phenomena!?!?
this seems important!?!
if “headache” and “banged head” were not separate concepts, then we would not be able to do this!?
But we need to allow for individual variations, so we extend this function to read S = f_S(EX, ED, U_S), where U_S stands for “unobserved variables that affect salary.”
uhh. so we just alter the causal graph... adding noise to replace hidden variables makes all sorts of assumptions!?!
- those variables have "little effect" on EX, ED and S.
- they are not biased in any way
- ?
Of course, there is no such thing as a free lunch: we got these strong results because we made strong assumptions.
In a probabilistic Bayesian network, the arrows into Y mean that the probability of Y is governed by the conditional probability tables for Y, given observations of its parent variables. The same is true for causal Bayesian networks, except that the conditional probability tables specify the probability of Y given interventions on the parent variables. Both models specify probabilities for Y, not a specific value of Y. In a structural causal model, there are no conditional probability tables. The arrows simply mean Y is a function of its parents, as well as the exogenous variable U_Y.
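A minimal sketch of the distinction (my own code, reusing the salary example from above; the linear forms and coefficients are invented): in an SCM each variable is a deterministic function of its parents plus an exogenous noise term, and an intervention simply overrides one of those functions.

```python
# Minimal structural causal model sketch (invented linear forms/coefficients).
# Variables: EX = experience, ED = education, S = salary, U_* = exogenous noise.
import numpy as np

rng = np.random.default_rng(1)

def sample(n, do_ed=None):
    """Draw n units from the SCM; do_ed, if given, means intervening do(ED=do_ed)."""
    u_ex, u_ed, u_s = rng.normal(size=(3, n))
    ex = u_ex                                                         # EX := U_EX
    ed = np.full(n, do_ed) if do_ed is not None else 0.5 * ex + u_ed  # ED := f_ED(EX, U_ED)
    s = 1.0 * ex + 2.0 * ed + u_s                                     # S := f_S(EX, ED, U_S)
    return ex, ed, s

_, _, s_obs = sample(100_000)
_, _, s_do = sample(100_000, do_ed=1.0)
print(s_obs.mean(), s_do.mean())  # ~0.0 observationally vs ~2.0 under do(ED=1)
```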
While this may seem like a rather arcane point, it does give mathematical proof that counterfactuals (rung three) lie above interventions (rung two) on the Ladder of Causation.
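The gap shows up concretely in the three-step counterfactual recipe (abduction, action, prediction), which needs the functional model itself, not just interventional probabilities. A sketch continuing the toy salary model above (same invented coefficients):

```python
# Counterfactual for one individual: "what would Alice's salary have been
# with ED = 2?" Three steps: abduction (infer her noise term from what we
# observed), action (set ED = 2), prediction (re-run the function).
def counterfactual_salary(ex_obs, ed_obs, s_obs, ed_new):
    u_s = s_obs - (1.0 * ex_obs + 2.0 * ed_obs)  # abduction: recover U_S
    return 1.0 * ex_obs + 2.0 * ed_new + u_s     # action + prediction

print(counterfactual_salary(ex_obs=3.0, ed_obs=1.0, s_obs=6.2, ed_new=2.0))  # 8.2
```

No interventional experiment alone could answer this unit-level question, since it conditions on what actually happened to this one individual.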
Another pattern was people’s tendency to blame their own actions (e.g., striking a match) rather than events not under their control. Our ability to estimate PN and PS from our model of the world suggests a systematic way of accounting for these considerations and eventually teaching robots to produce meaningful explanations of peculiar events.
whereas nobody is generally expected to pump all the oxygen out of the house in anticipation of a match-striking ceremony.
In ordinary language, the question “Why?” has at least two versions. The first is straightforward: you see an effect, and you want to know the cause. Your grandfather is lying in the hospital, and you ask, “Why? How could he have had a heart attack when he seemed so healthy?” But there is a second version of the “Why?” question, which we ask when we want to better understand the connection between a known cause and a known effect.
Mediation is also an important concept in the law. If we ask whether a company discriminated against women when it paid them lower salaries, we are asking a mediation question. The answer depends on whether the observed salary disparity is produced directly in response to the applicant’s sex or indirectly, through a mediator such as qualification, over which the employer has no control.
This example is a textbook case of interaction. In the end, VanderWeele’s analysis proves three important things about the smoking gene. First, it does not significantly increase cigarette consumption. Second, it does not cause lung cancer through a smoking-independent path. Third, for those people who do smoke, it significantly increases the risk of lung cancer. The interaction between the gene and the subject’s behavior is everything.
Within the last five years, my former student (now colleague) Elias Bareinboim and I have succeeded in giving a complete criterion for deciding when results are transportable and when they are not. As usual, the proviso for using this criterion is that you represent the salient features of the data-generating process with a causal diagram, marked with locations of potential disparities. “Transporting” a result does not necessarily mean taking it at face value and applying it to the new environment. The researcher may have to recalibrate it to allow for disparities between the two environments.
The one game it lost to Sedol is the only one it will ever lose to a human.