Kindle Notes & Highlights
by Judea Pearl
Read between September 21, 2020 and March 19, 2021
P(L | D) may be totally different from P(L | do(D)). This difference between seeing and doing is fundamental and explains why we do not regard the falling barometer as a cause of the coming storm. Seeing the barometer fall increases the probability of the storm, while forcing it to fall does not affect this probability.
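A minimal simulation of this distinction, assuming a toy model (all numbers hypothetical) in which atmospheric pressure causes both the barometer reading and the storm:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Toy model: low pressure causes both a low barometer reading and a storm;
    # the barometer itself has no causal effect on the weather.
    pressure_low = rng.random(n) < 0.3
    barometer_low = np.where(pressure_low, rng.random(n) < 0.9, rng.random(n) < 0.1)
    storm = np.where(pressure_low, rng.random(n) < 0.8, rng.random(n) < 0.1)

    # Seeing: condition on the observation that the barometer is low.
    p_see = storm[barometer_low].mean()

    # Doing: force every barometer to read low; the storm mechanism, which
    # depends only on pressure, is untouched, so the storm rate is unchanged.
    p_do = storm.mean()

    print(f"P(storm | barometer low)     ~ {p_see:.2f}")   # roughly 0.66
    print(f"P(storm | do(barometer low)) ~ {p_do:.2f}")    # roughly 0.31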
The ability to reflect on one’s past actions and envision alternative scenarios is the basis of free will and social responsibility.
the connection between imagining and causal relations is almost self-evident. It is useless to ask for the causes of things unless you can imagine their consequences.
We say that one event is associated with another if observing one changes the likelihood of observing the other.
The goal of strong AI is to produce machines with humanlike intelligence, able to converse with and guide humans. Deep learning has instead given us machines with truly impressive abilities but no intelligence. The difference is profound and lies in the absence of a model of reality.
Intervention ranks higher than association because it involves not just seeing but changing what is.
Sons of tall men tend to be taller than average—but not as tall as their fathers. Sons of short men tend to be shorter than average—but not as short as their fathers.
If students take two different standardized tests on the same material, the ones who scored high on the first test will usually score higher than average on the second test but not as high as they did the first time. This phenomenon of regression to the mean is ubiquitous in all facets of life, education, and business.
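A short simulation of regression to the mean, assuming a simple hypothetical model in which each score is a stable ability plus independent test-day noise:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    # Each student's score = stable ability + independent test-day noise.
    ability = rng.normal(100, 10, n)
    test1 = ability + rng.normal(0, 10, n)
    test2 = ability + rng.normal(0, 10, n)

    top = test1 > np.percentile(test1, 90)   # students who excelled on test 1
    print(f"Top group, test 1 mean: {test1[top].mean():.1f}")  # well above 100
    print(f"Top group, test 2 mean: {test2[top].mean():.1f}")  # above 100, but lower
    print(f"Overall test 2 mean:    {test2.mean():.1f}")       # about 100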
“Whig history” was the epithet used to mock the hindsighted style of history writing, which focused on successful theories and experiments and gave little credit to failed theories and dead ends. The modern style of history writing became more democratic, treating chemists and alchemists with equal respect and insisting on understanding all theories in the social context of their own time.
Unlike correlation and most of the other tools of mainstream statistics, causal analysis requires the user to make a subjective commitment. She must draw a causal diagram that reflects her qualitative belief—or, better yet, the consensus belief of researchers in her field of expertise—about the topology of the causal processes at work. She must abandon the centuries-old dogma of objectivity for objectivity’s sake. Where causation is concerned, a grain of wise subjectivity tells us more about the real world than any amount of objectivity.
In addition, in many cases it can be proven that the influence of prior beliefs vanishes as the size of the data increases, leaving a single objective conclusion in the end.
A Bayesian network is literally nothing more than a compact representation of a huge probability table.
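In symbols, the compactness comes from the standard chain-rule factorization (a textbook formula, not part of this highlight): a full joint table over n binary variables needs 2^n - 1 numbers, while the network stores only one small conditional table per node.

    P(x_1, x_2, \dots, x_n) \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{pa}(x_i)\bigr),
    \qquad \text{parameters (binary case)} \;=\; \sum_{i=1}^{n} 2^{\lvert \mathrm{pa}(x_i) \rvert}.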
Confounding bias occurs when a variable influences both who is selected for the treatment and the outcome of the experiment.
Nature is like a genie that answers exactly the question we pose, not necessarily the one we intend to ask.
Fisher realized that an uncertain answer to the right question is much better than a highly certain answer to the wrong question.
Confounding, then, should simply be defined as anything that leads to a discrepancy between the two: P(Y | X) ≠ P(Y | do(X)).
I define confounding as anything that makes P(Y | do(X)) differ from P(Y | X).
a back-door path is any path from X to Y that starts with an arrow pointing into X. X and Y will be deconfounded if we block every back-door path (because such paths allow spurious correlation between X and Y). If we do this by controlling for some set of variables Z, we also need to make sure that no member of Z is a descendant of X on a causal path; otherwise we might partly or completely close off that path.
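When a set Z meets this back-door criterion, the interventional distribution can be recovered from purely observational data by the back-door adjustment formula:

    P(Y = y \mid do(X = x)) \;=\; \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z).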
I consider the complete solution of the confounding problem one of the main highlights of the Causal Revolution because it ended an era of confusion that has probably resulted in many wrong decisions in the past.
“dose-response effect”: if substance A causes a biological effect B, then usually (though not always) a larger dose of A causes a stronger response B.
the cultural shocks that emanate from new scientific findings are eventually settled by cultural realignments that accommodate those findings—not by concealment. A prerequisite for this realignment is that we sort out the science from the culture before opinions become inflamed.
Paradoxes arise when we misapply the rules we have learned in one realm to the other.
The lesson is quite simple: the way that we obtain information is no less important than the information itself.
This is a general theme of Bayesian analysis: any hypothesis that has survived some test that threatens its validity becomes more likely. The greater the threat, the more likely it becomes after surviving.
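This is visible directly in Bayes's rule: the smaller P(pass | not-H) is, that is, the more threatening the test, the larger the boost to the hypothesis H after it passes.

    P(H \mid \text{pass}) \;=\; \frac{P(\text{pass} \mid H)\, P(H)}
        {P(\text{pass} \mid H)\, P(H) + P(\text{pass} \mid \neg H)\,\bigl(1 - P(H)\bigr)}.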
In my opinion, a true resolution of a paradox should explain why we see it as a paradox in the first place.
conditioning on a collider creates a spurious association
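A small simulation of collider bias, assuming a hypothetical toy example in which talent and looks are independent but both contribute to becoming a celebrity:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000

    # Talent and looks are generated independently; celebrity is the collider.
    talent = rng.normal(size=n)
    looks = rng.normal(size=n)
    celebrity = (talent + looks) > 1.5

    # Near-zero correlation overall, clearly negative once we condition on the collider.
    print(f"corr(talent, looks) overall:           {np.corrcoef(talent, looks)[0, 1]:+.2f}")
    print(f"corr(talent, looks) among celebrities: {np.corrcoef(talent[celebrity], looks[celebrity])[0, 1]:+.2f}")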
We live our lives as if the common cause principle were true. Whenever we see patterns, we look for a causal explanation. In fact, we hunger for an explanation, in terms of stable mechanisms that lie outside the data. The most satisfying kind of explanation is direct causation: X causes Y. When that fails, finding a common cause of X and Y will usually satisfy us.
Simpson’s paradox alerts us to cases where at least one of the statistical trends (either in the aggregated data, the partitioned data, or both) cannot represent the causal effects.
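A tiny example with hypothetical counts, showing how a treatment can look better within every subgroup yet worse in the aggregate:

    # Hypothetical (recovered, total) counts: the drug wins in each severity
    # group but loses in the pooled data, because severe cases received the
    # drug far more often than mild cases did.
    data = {
        "mild":   {"drug": (8, 10),  "no drug": (30, 40)},
        "severe": {"drug": (16, 40), "no drug": (3, 10)},
    }

    for group, arms in data.items():
        for arm, (r, t) in arms.items():
            print(f"{group:7s} {arm:8s} {r:2d}/{t:2d} = {r / t:.0%}")

    for arm in ("drug", "no drug"):
        r = sum(data[g][arm][0] for g in data)
        t = sum(data[g][arm][1] for g in data)
        print(f"overall {arm:8s} {r:2d}/{t:2d} = {r / t:.0%}")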
Path coefficients are fundamentally different from regression coefficients, although they can often be computed from the latter.
Rule 1 says that when we observe a variable W that is irrelevant to Y (possibly conditional on other variables Z), the probability distribution of Y will not change.
We know that if a set Z of variables blocks all back-door paths from X to Y, then conditional on Z, do(X) is equivalent to see(X). We can, therefore, write P(Y | do(X), Z) = P(Y | X, Z) if Z satisfies the back-door criterion. We adopt this as Rule 2 of our axiomatic system.
Rule 3 is quite simple: it essentially says that we can remove do(X) from P(Y | do(X)) in any case where there are no causal paths from X to Y. That is, P(Y | do(X)) = P(Y) if there is no path from X to Y with only forward-directed arrows.
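In symbols, the simplified forms of the three rules as stated above (the full do-calculus conditions involve graph surgeries not spelled out in these highlights):

    \text{Rule 1:}\quad P(Y \mid do(X), Z, W) = P(Y \mid do(X), Z)
        \quad \text{if } W \text{ is irrelevant to } Y \text{ given } Z,
    \text{Rule 2:}\quad P(Y \mid do(X), Z) = P(Y \mid X, Z)
        \quad \text{if } Z \text{ blocks every back-door path from } X \text{ to } Y,
    \text{Rule 3:}\quad P(Y \mid do(X)) = P(Y)
        \quad \text{if there is no directed path from } X \text{ to } Y.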
From the point of view of causal analysis, this teaches us a good lesson: in any study of interventions, we need to ask whether the variable we’re actually manipulating (lifetime LDL levels) is the same as the variable we think we are manipulating (current LDL levels). This is part of the “skillful interrogation of nature.”
Responsibility and blame, regret and credit: these concepts are the currency of a causal mind. To make any sense of them, we must be able to compare what did happen with what would have happened under some alternative hypothesis.
our ability to conceive of alternative, nonexistent worlds separated us from our protohuman ancestors and indeed from any other creature on the planet. Every other creature can see what is. Our gift, which may sometimes be a curse, is that we can see what might have been.
mistaking a mediator for a confounder is one of the deadliest sins in causal inference and may lead to the most outrageous errors. The latter invites adjustment; the former forbids it.
In fact he had the right idea when he distinguished between bias and discrimination. Bias is a slippery statistical notion, which may disappear if you slice the data a different way. Discrimination, as a causal concept, reflects reality and must remain stable.
Anytime you see a paper or a study that analyzes the data in a model-free way, you can be certain that the output of the study will merely summarize, and perhaps transform, but not interpret the data.
Data interpretation means hypothesizing on how things operate in the real world.