The Book of Why: The New Science of Cause and Effect
Read between June 26 - June 30, 2018
2%
It tells us that correlation is not causation, but it does not tell us what causation is.
2%
The calculus of causation consists of two languages: causal diagrams, to express what we know, and a symbolic language, resembling algebra, to express what we want to know.
12%
Friday Evening Discourse at the Royal Institution of Great Britain in London. Many discoveries of the nineteenth century were first announced to the public at this venue: Michael Faraday and the principles of photography in 1839; J. J. Thomson and the electron in 1897; James Dewar and the liquefaction of hydrogen in 1904.
12%
The central limit theorem is truly a miracle of nineteenth-century mathematics. Think about it: even though the path of any individual ball is unpredictable, the path of 1,000 balls is extremely predictable.
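The point is easy to check in a few lines of Python; the 12-row board depth is an assumed value, while the 1,000 balls match the excerpt:

import random

# Galton-board sketch: each ball bounces left or right at every peg with
# equal probability. One ball's final slot is unpredictable, yet the
# histogram over 1,000 balls reliably traces out a bell curve.
ROWS, BALLS = 12, 1000
slots = [0] * (ROWS + 1)
for _ in range(BALLS):
    position = sum(random.random() < 0.5 for _ in range(ROWS))
    slots[position] += 1

for slot, count in enumerate(slots):
    print(f"{slot:2d} | {'#' * (count // 5)}")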
19%
But Fisher was right about one point: once you remove causation from statistics, reduction of data is the only thing left.
19%
The fates of path analysis in economics and sociology followed different trajectories, each leading to a betrayal of Wright’s ideas. Sociologists renamed path analysis as structural equation modeling (SEM), embraced diagrams, and used them extensively until 1970, when a computer package called LISREL automated the calculation of path coefficients
19%
In economics, the algebraic part of path analysis became known as simultaneous equation models (no acronym). Economists essentially never used path diagrams and continue not to use them to this day, relying instead on numerical equations and matrix algebra.
20%
The prototype of Bayesian analysis goes like this: Prior Belief + New Evidence → Revised Belief.
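Written out, that arrow is Bayes's rule: P(Hypothesis | Evidence) = P(Evidence | Hypothesis) × P(Hypothesis) / P(Evidence). The prior belief enters as P(Hypothesis), the new evidence enters through the likelihood P(Evidence | Hypothesis), and the left-hand side is the revised belief.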
21%
A causal diagram is a Bayesian network in which every arrow signifies a direct causal relation, or at least the possibility of one, in the direction of that arrow.
21%
For Bayes, this assertion provoked a natural, one might say Holmesian question: How much evidence would it take to convince us that something we consider improbable has actually happened? When does a hypothesis cross the line from impossibility to improbability and even to probability or virtual certainty?
22%
It was not until 1931 that Harold Jeffreys (known more as a geophysicist than a probability theorist) introduced the now standard vertical bar in P(S | T).
23%
In this example the forward probability is the probability of a positive test, given that you have the disease: P(test | disease). This is what a doctor would call the “sensitivity” of the test, or its ability to correctly detect an illness.
23%
According to the Breast Cancer Surveillance Consortium (BCSC), the sensitivity of mammograms for forty-year-old women is 73 percent.
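Plugging that 73 percent into Bayes's rule turns the forward probability into the inverse one a patient actually cares about. The prevalence and false-positive rate below are placeholder values chosen for illustration, not the book's figures:

# Bayes's rule for the mammogram example. Only the 73% sensitivity comes
# from the BCSC figure quoted above; the other two inputs are assumed.
sensitivity = 0.73        # P(positive | cancer), from the excerpt
prevalence = 1 / 700      # P(cancer) -- illustrative assumption
false_positive = 0.12     # P(positive | no cancer) -- illustrative assumption

# Law of total probability: P(positive)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Inverse probability: P(cancer | positive)
posterior = sensitivity * prevalence / p_positive
print(f"P(cancer | positive) = {posterior:.3f}")   # small, despite the positive test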
29%
Confounding bias occurs when a variable influences both who is selected for the treatment and the outcome of the experiment. Sometimes the confounders are known; other times they are merely suspected and act as a “lurking third variable.”
30%
“Sometimes you end up controlling for the thing you’re trying to measure.”
32%
Randomization is a way of simulating Model 2. It disables all the old confounders without introducing any new confounders. That is the source of its power; there is nothing mysterious or mystical about it.
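A short simulation makes this concrete. The setup is invented: a hidden factor U raises both the chance of treatment and the outcome, and the true treatment effect is zero, so any measured gap is pure confounding bias:

import random

random.seed(0)

def trial(randomized, n=100_000):
    treated, control = [], []
    for _ in range(n):
        u = random.random()                  # hidden confounder
        if randomized:
            t = random.random() < 0.5        # coin flip ignores U
        else:
            t = random.random() < u          # U drives who gets treated
        y = u + random.gauss(0, 0.1)         # outcome depends only on U
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)

print("observational gap:", round(trial(False), 3))   # about 0.33, all bias
print("randomized gap:   ", round(trial(True), 3))    # about 0.0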
32%
The reason for the difficulty is that confounding is not a statistical notion. It stands for the discrepancy between what we want to assess (the causal effect) and what we actually do assess using statistical methods. If you can’t articulate mathematically what you want to assess, you can’t expect to define what constitutes a discrepancy.
32%
Let’s take a look at some of the surrogate definitions of confounding. These fall into two main categories, declarative and procedural. A typical (and wrong) declarative definition would be “A confounder is any variable that is correlated with both X and Y.” On the other hand, a procedural definition would attempt to characterize a confounder in terms of a statistical test. This appeals to statisticians, who love any test that can be performed on the data directly without appealing to a model.
38%
The committee listed five such criteria: consistency (many studies, in different populations, show similar results); strength of association (including the dose-response effect: more smoking is associated with a higher risk); specificity of the association (a particular agent should have a particular effect and not a long litany of effects); temporal relationship (the effect should follow the cause); and coherence (biological plausibility and consistency with other types of evidence such as laboratory experiments and time series).
38%
Consistency by itself proves nothing; if thirty studies each ignore the same confounder, all can easily be biased. Strength of association is vulnerable for the same reason; as pointed out earlier, children’s shoe sizes are strongly associated with but not causally related to their reading aptitude. Specificity has always been a particularly controversial criterion. It makes sense in the context of infectious disease, where one agent typically produces one illness, but less so in the context of environmental exposure. Smoking leads to an increased risk of a variety of other diseases, such as…
46%
For many researchers, the most (perhaps only) familiar method of predicting the effect of an intervention is to “control” for confounders using the adjustment formula. This is the method to use if you are confident that you have data on a sufficient set of variables (called deconfounders) to block all the back-door paths between the intervention and the outcome. To do this, we measure the average causal effect of an intervention by first estimating its effect at each “level,” or stratum, of the deconfounder. We then compute a weighted average of those strata, where each stratum is weighted according to its prevalence in the population.
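Here is a minimal sketch of that computation, assuming a single deconfounder Z; the (x, z, y) records are made up for illustration:

from collections import Counter

def adjusted_effect(records, x):
    # P(Y=1 | do(X=x)) = sum over z of P(Y=1 | X=x, Z=z) * P(Z=z)
    n = len(records)
    z_counts = Counter(z for _, z, _ in records)
    total = 0.0
    for z, nz in z_counts.items():
        stratum = [y for xi, zi, y in records if xi == x and zi == z]
        if stratum:
            total += (sum(stratum) / len(stratum)) * (nz / n)   # weight by P(Z=z)
    return total

data = [(1, 0, 1), (1, 0, 1), (0, 0, 0), (1, 1, 0),
        (0, 1, 1), (0, 1, 0), (1, 1, 1), (0, 0, 1)]
print(adjusted_effect(data, 1) - adjusted_effect(data, 0))   # average causal effect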
64%
All the above questions require a sensitive ability to tease apart total effects, direct effects (which do not pass through a mediator), and indirect effects (which do).
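In the simplest linear case the bookkeeping is explicit. If the mediator and outcome follow M = γX and Y = βX + δM (coefficients invented for illustration), a unit increase in X shifts Y by β directly and by γδ through the mediator, so the direct effect is β, the indirect effect is γδ, and the total effect is their sum, β + γδ.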
67%
“What do we mean by bias?” Why does the bias sign change depending on the way we measure it? In fact he had the right idea when he distinguished between bias and discrimination. Bias is a slippery statistical notion, which may disappear if you slice the data a different way. Discrimination, as a causal concept, reflects reality and must remain stable.
67%
I cannot stress enough how often this blunder has been repeated over the years—conditioning on the mediator instead of holding the mediator constant. For that reason I call it the Mediation Fallacy. Admittedly, the blunder is harmless if there is no confounding of the mediator and the outcome. However, if there is confounding, it can completely reverse the analysis, as Kruskal’s numerical example showed. It can lead the investigator to conclude there is no discrimination when in fact there is.
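This is not Kruskal's example, but a simulation with invented coefficients shows how severe the reversal can be. The true direct effect of X on Y is fixed at 1.0; conditioning on the mediator M, which shares a hidden confounder U with the outcome, reports almost none of it:

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
u = rng.normal(size=n)                                  # hidden confounder of M and Y
m = 1.0 * x + 1.5 * u + rng.normal(size=n)              # mediator
y = 1.0 * x + 1.0 * m + 2.0 * u + rng.normal(size=n)    # direct effect of x is 1.0

# "Controlling" for M by putting it in the regression conditions on it.
# M is a collider between X and U, so conditioning links X to U and drags
# the coefficient on X far below its true value of 1.0 (to roughly 0.08).
design = np.column_stack([x, m, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"estimated direct effect: {coef[0]:.2f} (truth: 1.00)")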
75%
The questions I have just asked are all causal, and causal questions can never be answered from data alone. They require us to formulate a model of the process that generates the data, or at least some aspects of that process. Anytime you see a paper or a study that analyzes the data in a model-free way, you can be certain that the output of the study will merely summarize, and perhaps transform, but not interpret the data.