Kindle Notes & Highlights
by Judea Pearl
Read between May 18 - May 22, 2018
Only after gamblers invented intricate games of chance, sometimes carefully designed to trick us into making bad choices, did mathematicians like Blaise Pascal (1654), Pierre de Fermat (1654), and Christiaan Huygens (1657) find it necessary to develop what we today call probability theory. Likewise, only when insurance organizations demanded accurate estimates of life annuity did mathematicians like Edmond Halley (1693) and Abraham de Moivre (1725) begin looking at mortality tables to calculate life expectancies.
First, in the world of AI, you do not really understand a topic until you can teach it to a mechanical robot.
Millions of lives were lost or shortened because scientists did not have an adequate language or methodology for answering causal questions.
Of special interest are questions concerning necessary and sufficient causes of observed events. For example, how likely is it that the defendant’s action was a necessary cause of the claimant’s injury? How likely is it that man-made climate change is a sufficient cause of a heat wave?
Why not just go into our vast database of previous purchases and see what happened previously when toothpaste cost twice as much? The reason is that on the previous occasions, the price may have been higher for different reasons. For example, the product may have been in short supply, and every other store also had to raise its price. But now you are considering a deliberate intervention that will set a new price regardless of market conditions.
Given enough time, chances are that someone else will have tried doubling the price. You can just observe. (this assumes perfect information, or that hidden variables do not affect what you care about)
What interventions give us is data efficiency. Where observation requires LOTS of data, intervention needs (possibly) only a single action.
Want to formulate this! Under various types of model/noise, what is the data efficiency of intervention versus observation!?
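A sketch of the kind of comparison I have in mind (my own toy model, not the book's; the linear forms and coefficients are invented): a hidden "market conditions" variable drives both price and sales, so the observational price-sales slope says nothing like the interventional one.

```python
# Toy model (mine): hidden market conditions M drive both price P and sales Y,
# confounding the observational relationship between price and sales.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
m = rng.normal(size=n)                        # hidden market conditions (shortage etc.)
p = 1.0 * m + rng.normal(size=n)              # stores raise prices when M is high
y = -1.0 * p + 2.0 * m + rng.normal(size=n)   # sales: the true price effect is -1.0

# Observational slope of sales on price (biased by the confounder M):
print(np.cov(p, y)[0, 1] / np.var(p))         # ~0.0, nothing like the true -1.0

# Interventional estimate: we set prices ourselves, independent of M
p_do = rng.normal(size=n)
y_do = -1.0 * p_do + 2.0 * m + rng.normal(size=n)
print(np.cov(p_do, y_do)[0, 1] / np.var(p_do))  # ~ -1.0, the causal effect
```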
The rewards of having a causal model that can answer counterfactual questions are immense.
Why was the forward probability (of x given L) so much easier to assess mentally than the probability of L given x? In this example, the asymmetry comes from the fact that L acts as the cause and x is the effect. If we observe a cause—for example, Bobby throws a ball toward a window—most of us can predict the effect (the ball will probably break the window).
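In symbols (my note, just Bayes' rule): the inverse probability needs a prior over the causes, which the forward direction never requires:

P(L | x) = P(x | L) P(L) / P(x)

so judging P(L | x) forces us to weigh how plausible each cause L was to begin with, not just how well it predicts x.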
like many-to-one (non-injective) functions: an output can have many inputs mapping to it, but each input maps to exactly one output.
Daniel also understood that it was important to compare groups. In this respect he was already more sophisticated than many people today, who choose a fad diet (for example) just because a friend went on that diet and lost weight. If you choose a diet based only on one friend’s experience, you are essentially saying that you believe you are similar to your friend in all relevant details: age, heredity, home environment, previous diet, and so forth. That is a lot to assume.
“Sometimes you end up controlling for the thing you’re trying to measure.”
It is known as a mediator: it is the variable that explains the causal effect of X on Y. It is a disaster to control for Z if you are trying to find the causal effect of X on Y.
Statisticians very often control for proxies when the actual causal variable can’t be measured; for instance, party affiliation might be used as a proxy for political beliefs. Because Z isn’t a perfect measure of M, some of the influence of X on Y might “leak through” if you control for Z.
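A quick simulation of the mediator point (my own sketch, not the book's; coefficients invented): in a chain X → Z → Y, "controlling" for the mediator Z erases the very effect we want to measure.

```python
# Toy simulation: X -> Z -> Y, where Z mediates the entire effect of X on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
z = 1.0 * x + rng.normal(size=n)   # mediator carries the effect
y = 2.0 * z + rng.normal(size=n)   # total effect of X on Y is 1.0 * 2.0 = 2.0

def ols(target, *regressors):
    """Least-squares coefficients (with intercept) via numpy."""
    A = np.column_stack([np.ones(n), *regressors])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef[1:]  # drop the intercept

print(ols(y, x))     # ~[2.0]: the true total causal effect
print(ols(y, x, z))  # ~[0.0, 2.0]: controlling for Z wipes out X's effect
```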
The cholera bacillus is the only cause of cholera; or as we would say today, it is both necessary and sufficient.
Doctors call it a “dose-response effect”: if substance A causes a biological effect B, then usually (though not always) a larger dose of A causes a stronger response B.
In a speech given in March 1954, George Weissman, vice president of Philip Morris and Company, said, “If we had any thought or knowledge that in any way we were selling a product harmful to consumers, we would stop business tomorrow.” Sixty years later, we are still waiting for Philip Morris to keep that promise.
Our brains are not prepared to accept causeless correlations, and we need special training—through examples like the Monty Hall paradox.
This last comment brings up one more curious point about Lord’s paradox as originally phrased. Although the stated intention of the school dietitian is to “determine the effects of the diet,” nowhere in his original paper does Lord mention a control diet. Therefore we can’t even say anything about the diet’s effects.
To do this, we measure the average causal effect of an intervention by first estimating its effect at each “level,” or stratum, of the deconfounder. We then compute a weighted average of those strata, where each stratum is weighted according to its prevalence in the population. If, for example, the deconfounder is gender, we first estimate the causal effect for males and females. Then we average the two, if the population is (as usual) half male and half female. If the proportions are different—say, two-thirds male and one-third female—then to estimate the average causal effect we would take a weighted average, counting the male effect twice as heavily as the female effect.
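A worked version of the weighting (my own numbers, invented purely for illustration):

```python
# Hypothetical stratum-specific effects (invented numbers, illustration only)
effects = {"male": 0.10, "female": 0.30}   # causal effect within each stratum
weights = {"male": 2 / 3, "female": 1 / 3} # prevalence of each stratum

# Average causal effect = prevalence-weighted average over strata
ace = sum(weights[s] * effects[s] for s in effects)
print(ace)  # 2/3 * 0.10 + 1/3 * 0.30 = 0.1667
```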
In fact, one of the major accomplishments of causal diagrams is to make the assumptions transparent so that they can be discussed and debated by experts.
Now let us return to our central question of when a model can replace an experiment, or when a “do” quantity can be reduced to a “see” quantity.
We start with our target sentence, P(Y | do(X)). Our task will be complete if we can succeed in eliminating the do-operator from it, leaving only classical probability expressions, like P(Y | X) or P(Y | X, Z, W). We cannot, of course, manipulate our target expression at will; the operations must conform to what do(X) means as a physical intervention.
We know that if a set Z of variables blocks all back-door paths from X to Y, then conditional on Z, do(X) is equivalent to see(X). We can, therefore, write P(Y | do(X), Z) = P(Y | X, Z)
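Spelling out the remaining step (standard back-door adjustment; my note): average the identity above over Z, using the fact that intervening on X does not change the distribution of the pre-treatment variables Z, i.e., P(Z = z | do(X)) = P(Z = z):

P(Y | do(X)) = Σ_z P(Y | do(X), Z = z) P(Z = z) = Σ_z P(Y | X, Z = z) P(Z = z)

The right-hand side contains only "see" quantities, so the do-operator has been eliminated as required.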
Unlike most diagrams in this book, this one has “two-way” arrows, but I would ask the reader not to lose too much sleep over it. With some mathematical trickery we could equally well replace the Demand → Price → Supply chain with a single arrow Demand → Supply, and the figure would then look like Figure 7.9 (though it would be less acceptable to economists).
For decades or even centuries, lawyers have used a relatively straightforward test of a defendant’s culpability called “but-for causation”: the injury would not have occurred but for the defendant’s action.
It will be helpful to distinguish three different kinds of causation: necessary causation, sufficient causation, and necessary-and-sufficient causation.
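For reference (my own note, transcribing the standard counterfactual definitions Pearl uses): with X = x the candidate cause and Y = y the observed effect,

PN = P(Y_{x'} = false | X = x, Y = y) — the probability that Y would not have occurred had X not occurred, given that both actually did occur;
PS = P(Y_x = true | X = x', Y = y') — the probability that Y would have occurred had X occurred, given that neither actually did.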
Lewis argued that we evaluate counterfactuals by comparing our world, where he did not take aspirin, to the most similar world in which he did take an aspirin.
“Most similar” is key. There may be some “possible worlds” in which his headache did not go away—for example, a world in which he took the aspirin and then bumped his head on the bathroom door. But that world contains an extra, adventitious circumstance.
so this ability really depends on the ability to disentangle various phenomena!?!?
this seems important!?!
if “headache” and “banged head” were not separate concepts, then we would not be able to do this!?
But we need to allow for individual variations, so we extend this function to read S = f_S(EX, ED, U_S), where U_S stands for “unobserved variables that affect salary.”
uhh. so we just alter the causal graph... adding noise to replace hidden variables makes all sorts of assumptions!?!
- those variables have "little effect" on EX, ED and S.
- they are not biased in any way
- ?
Of course, there is no such thing as a free lunch: we got these strong results because we made strong assumptions.
In a probabilistic Bayesian network, the arrows into Y mean that the probability of Y is governed by the conditional probability tables for Y, given observations of its parent variables. The same is true for causal Bayesian networks, except that the conditional probability tables specify the probability of Y given interventions on the parent variables. Both models specify probabilities for Y, not a specific value of Y. In a structural causal model, there are no conditional probability tables. The arrows simply mean Y is a function of its parents, as well as the exogenous variable U_Y.
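A minimal sketch of the distinction (my own code, reusing the salary example from above; the linear forms and coefficients are invented): in an SCM each variable is a deterministic function of its parents plus an exogenous noise term, and an intervention simply overrides one of those functions.

```python
# Minimal structural causal model sketch (invented linear forms/coefficients).
# Variables: EX = experience, ED = education, S = salary, U_* = exogenous noise.
import numpy as np

rng = np.random.default_rng(1)

def sample(n, do_ed=None):
    """Draw n units from the SCM; do_ed, if given, means intervening do(ED=do_ed)."""
    u_ex, u_ed, u_s = rng.normal(size=(3, n))
    ex = u_ex                                                         # EX := U_EX
    ed = np.full(n, do_ed) if do_ed is not None else 0.5 * ex + u_ed  # ED := f_ED(EX, U_ED)
    s = 1.0 * ex + 2.0 * ed + u_s                                     # S := f_S(EX, ED, U_S)
    return ex, ed, s

_, _, s_obs = sample(100_000)
_, _, s_do = sample(100_000, do_ed=1.0)
print(s_obs.mean(), s_do.mean())  # ~0.0 observationally vs ~2.0 under do(ED=1)
```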
While this may seem like a rather arcane point, it does give mathematical proof that counterfactuals (rung three) lie above interventions (rung two) on the Ladder of Causation.
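The gap shows up concretely in the three-step counterfactual recipe (abduction, action, prediction), which needs the functional model itself, not just interventional probabilities. A sketch continuing the toy salary model above (same invented coefficients):

```python
# Counterfactual for one individual: "what would Alice's salary have been
# with ED = 2?" Three steps: abduction (infer her noise term from what we
# observed), action (set ED = 2), prediction (re-run the function).
def counterfactual_salary(ex_obs, ed_obs, s_obs, ed_new):
    u_s = s_obs - (1.0 * ex_obs + 2.0 * ed_obs)  # abduction: recover U_S
    return 1.0 * ex_obs + 2.0 * ed_new + u_s     # action + prediction

print(counterfactual_salary(ex_obs=3.0, ed_obs=1.0, s_obs=6.2, ed_new=2.0))  # 8.2
```

No interventional experiment alone could answer this unit-level question, since it conditions on what actually happened to this one individual.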
Another pattern was people’s tendency to blame their own actions (e.g., striking a match) rather than events not under their control. Our ability to estimate PN and PS from our model of the world suggests a systematic way of accounting for these considerations and eventually teaching robots to produce meaningful explanations of peculiar events.
whereas nobody is generally expected to pump all the oxygen out of the house in anticipation of a match-striking ceremony.
In ordinary language, the question “Why?” has at least two versions. The first is straightforward: you see an effect, and you want to know the cause. Your grandfather is lying in the hospital, and you ask, “Why? How could he have had a heart attack when he seemed so healthy?” But there is a second version of the “Why?” question, which we ask when we want to better understand the connection between a known cause and a known effect.
Mediation is also an important concept in the law. If we ask whether a company discriminated against women when it paid them lower salaries, we are asking a mediation question. The answer depends on whether the observed salary disparity is produced directly in response to the applicant’s sex or indirectly, through a mediator such as qualification, over which the employer has no control.
This example is a textbook case of interaction. In the end, VanderWeele’s analysis proves three important things about the smoking gene. First, it does not significantly increase cigarette consumption. Second, it does not cause lung cancer through a smoking-independent path. Third, for those people who do smoke, it significantly increases the risk of lung cancer. The interaction between the gene and the subject’s behavior is everything.
Within the last five years, my former student (now colleague) Elias Bareinboim and I have succeeded in giving a complete criterion for deciding when results are transportable and when they are not. As usual, the proviso for using this criterion is that you represent the salient features of the data-generating process with a causal diagram, marked with locations of potential disparities. “Transporting” a result does not necessarily mean taking it at face value and applying it to the new environment. The researcher may have to recalibrate it to allow for disparities between the two environments.
The one game it lost to Sedol is the only one it will ever lose to a human.