nostalgebraist's Reviews > The Book of Why: The New Science of Cause and Effect

The Book of Why by Judea Pearl

did not like it

I had high hopes for this book. I've been interested in causal inference for a number of years, and I think it's a field that could drastically improve the practice of statistical science if its techniques became widely adopted. A popular book on the field, written by one of its founders, seemed like an exciting development. Finally, people would be talking about this stuff! It would no longer be just another arcane heterodoxy, invoked by academic gadflies in seminar rooms and on blogs, generating long back-and-forth arguments without changing orthodox practice. The general public would be allowed into the debate, and that would mean a new kind of demand -- for undergraduate courses in causality, for answers to the hard causal questions that currently get smoothed over in academic press releases and, often, in the underlying research itself. Maybe a popular book would change academia in a way that academic work had been unable to.

And maybe all that will still happen. If nothing else, this book has sold well, and it is about an important topic whose very existence is still not widely known. There are awareness-raising functions that such a book can perform no matter what is between its covers, and insofar as it performs those functions, this book should get credit simply for existing. Unfortunately, that is the only credit it deserves. The book itself, the thing between the covers, is a disaster.

There are two distinct ways in which this book is a failure. It fails as a work of popular science writing: it is badly structured, rambling, full of overly detailed and technical discussions of side issues while barely even attempting to explain some of its central concepts. And, separately, it fails as a work of scientific and philosophical thought: its account of causal modeling, if taken at face value, is incoherent. I'll address these two failures in reverse order.



It is a point of conventional wisdom in statistical science that cause and effect can only be investigated through experiment. If you want to know whether X affects Y, and how much, you must actively intervene to give X different values across some group, then observe Y. For example, if Y is a disease and X is a drug that might prevent it, you could assemble a group of people, choose some of them at random and tell them to take the drug (the "treatment group"), tell the rest not to take it (the "control group"), and then wait to see who gets the disease and who doesn't.

According to the conventional wisdom, you can't learn anything about the preventive effect of the drug by simply observing people who take it and people who don't. You have to tell people what to do, if you want to know the drug's effects. Why? Because an individual's choice to take or not take a given drug is a consequence of many factors in that individual's life, and some of those factors could also influence the disease. For example, people with known risk factors for a disease will be more likely to take a drug claiming to protect against it, which means that people who take the drug are more likely to have those risk factors. This will raise the incidence of the disease among takers of the drug, making the drug look less effective than it really is. On the other hand, perhaps this drug is only known to very health-conscious people, and so those who take it are generally in better health to begin with, thus less likely to contract the disease; this would make the drug-taking population less prone to the disease even if the drug does nothing at all. And so on, for any number of hypothetical complicating factors (or "confounders," as they're known in the trade).
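The confounding story above is easy to make concrete with a quick simulation (my own sketch, not anything from the book; all the probabilities are made up for illustration). Here a hypothetical drug has no effect whatsoever, yet its takers get the disease more often, purely because a risk factor drives both drug-taking and disease:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: a risk factor that raises both the
# chance of taking the drug and the chance of the disease.
risk = rng.random(n) < 0.3
take_drug = rng.random(n) < np.where(risk, 0.8, 0.2)
# The drug itself does nothing here: disease depends only on risk.
disease = rng.random(n) < np.where(risk, 0.5, 0.1)

p_takers = disease[take_drug].mean()
p_nontakers = disease[~take_drug].mean()
# Takers get the disease more often, even though the drug has no effect.
assert p_takers > p_nontakers
```

Comparing takers to non-takers *within* each risk group would erase the spurious difference -- which is exactly what a randomized experiment achieves by construction, since randomization severs the link between the risk factor and drug-taking.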

This is a fully general point, which holds everywhere, not just in this pharmaceutical example. For any proposed cause X and effect Y, there is the specter of confounders. Everyone agrees about one way of evading this problem: doing an experiment. The conventional wisdom says that this is the only way. It says that "observational data" (about what happens without experimental intervention) is mute on matters of causation.

The field of causal inference begins, more or less, with the denial of this bit of conventional wisdom. Claims about causation, it says, have implications about what observational data will look like. The nature of these implications is complex and subtle, and it's intellectually a lot harder to work them out than to just do an experiment, provided the experiment is possible (and ethical) to conduct. The observational data at hand is not always sufficient to answer a given causal question; adjudicating between two causal claims can require knowledge of variables you happened not to measure, and in some cases is actually impossible without experiment. But not in every case. Cause and effect leave fingerprints -- albeit partial and obscured fingerprints -- in what we see.

And after all, how could it be otherwise? We can reason about what causes what in everyday life, in spite of the fact that we virtually never have the opportunity to do controlled experiments on our immediate surroundings (much less on our friends or colleagues). If you and I can do it, surely science ought to be able to.

Just to be clear, I'm now describing the field of causal inference in my own words, so as to contrast it with what is written in Pearl's book, which is quite different. Pearl is a giant in this field, so it might seem strange that my account would diverge from his -- where am I getting this stuff, if not from him? Well, I've mostly learned about causal inference from the work of Richard Scheines, Clark Glymour, and Peter Spirtes, especially their wonderful book Causation, Prediction and Search. Maybe Pearl is simply up to something different from S+C+S. If so, I much prefer the S+C+S version. Anyway, here's a bit more of my own account.

I said that "causal claims" have implications for observational data. To make this precise, we need some precise way of representing a "causal claim." This is done via "causal diagrams," which are pictures in which the variables (like X and Y) are connected by arrows. An arrow like X --> Y means "X causes Y," or, more precisely, "X has a causal influence on Y." In the drug example above -- where X was taking the drug and Y was contracting the disease -- we could represent "risk factors" by another variable (say, "Z"), with an arrow Z --> Y (risk factors make the disease more likely) and another arrow Z --> X (people with risk factors are more likely to take the drug).

From such a diagram, you can derive certain facts that must hold in observational data if the diagram's claim is true. For example, if you have arrows X --> Y and Y --> Z, but no arrow X --> Z, this means that X causes Z only through Y and not directly. So if you hold Y constant -- say by grouping the data by the different values of Y, and looking at each group in isolation -- X and Z will no longer be related. This is a testable claim about observational data! A given data set can have this property or not have it (to some degree of statistical confidence), and if the data set doesn't have the property, the diagram must have been wrong.
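This kind of implied fact can be checked numerically. Below is a minimal Python sketch (mine, not Pearl's; the linear model and its coefficients are arbitrary assumptions made for illustration) for the chain X --> Y --> Z: marginally, X and Z are strongly correlated, but once Y is held constant by regressing it out of both, the residual correlation vanishes -- just as the absent X --> Z arrow predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Chain diagram X --> Y --> Z, with no direct X --> Z arrow.
# (Coefficients 2.0 and -1.5 are arbitrary illustrative choices.)
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = -1.5 * y + rng.normal(size=n)

def residual(a, b):
    """The part of a left over after linearly regressing out b."""
    coef = np.cov(a, b)[0, 1] / np.var(b)
    return a - coef * b

# Marginally, X and Z are strongly correlated...
marginal_corr = np.corrcoef(x, z)[0, 1]
# ...but "holding Y constant" makes the relation disappear,
# as the missing X --> Z arrow requires.
partial_corr = np.corrcoef(residual(x, y), residual(z, y))[0, 1]

assert abs(marginal_corr) > 0.5
assert abs(partial_corr) < 0.02
```

If the partial correlation had come out clearly nonzero, the diagram -- specifically, its claim that there is no direct X --> Z arrow -- would have been refuted by the data.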

All of that, you will note, is qualitative/binary. An arrow is either present or absent in a diagram; an implied fact either holds true in the data or doesn't. No continuous shades of grey. But we can go further by attaching numbers (called "coefficients") to the arrows, representing how strong or weak each effect is. These numbers can themselves be estimated from data, and it is only after estimating them that we have a quantitative model telling us which effects are big and which are small.

Now, an arrow X --> Y with a tiny coefficient means that X does influence Y, but only a very small amount. What if the coefficient were actually zero? That would mean X doesn't really influence Y after all. In other words, it's equivalent to the arrow not being there at all. This is a really important point, because it unifies the qualitative question ("does a diagram with these arrows fit the data?") and the quantitative question ("which coefficient goes with each arrow?"). Any diagram you can draw is just a version of the most general diagram -- the one with every possible arrow -- except with some of the arrow coefficients set to zero. A diagram, then, is a set of claims about which arrows, out of all the possible ones, should have zeros next to them.

If that was confusingly technical, look at it this way. Return to our drug example, and consider the question, "when we account for confounders, does the drug have any effect at all?" This question is asking whether there should be an arrow X --> Y between the drug, X, and the disease, Y. But that's the same as asking whether, in a diagram with that arrow, the arrow's coefficient should be zero or not. Thus, we can start with a diagram that includes the arrow -- as if we're assuming the drug does have some effect -- and then proceed to estimate the coefficient. If it turns out to be zero, we can safely delete the arrow. Thus, when we draw a diagram, we are not assuming that all the arrows in it represent real effects. All of the assumptions are in the arrows not drawn: these effects are assumed to be absent, and all other calculations are done against this background of assumptions.
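In the linear case, this procedure amounts to ordinary regression with the confounder included. A sketch (my own, under assumed linear mechanisms: the diagram is Z --> X, Z --> Y, plus a candidate arrow X --> Y whose true coefficient happens to be zero; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Diagram: Z --> X, Z --> Y, and a candidate arrow X --> Y
# whose true coefficient is zero (the drug does nothing).
z = rng.normal(size=n)
x = 1.0 * z + rng.normal(size=n)
y = 2.0 * z + 0.0 * x + rng.normal(size=n)

# Naive fit of Y on X alone: confounding produces a large slope.
naive = np.linalg.lstsq(np.column_stack([x, np.ones(n)]), y, rcond=None)[0][0]

# Fit of Y on X and Z together: the estimated X coefficient is ~0,
# so we can safely delete the arrow X --> Y.
adjusted = np.linalg.lstsq(np.column_stack([x, z, np.ones(n)]), y, rcond=None)[0][0]

assert abs(naive) > 0.5
assert abs(adjusted) < 0.05
```

The naive slope comes out near 1.0 even though the drug does nothing; including the confounder Z drives the estimated X --> Y coefficient to zero. Drawing the arrow was a concession that an effect *might* exist; the data then told us it doesn't.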



Enough from me. How does The Book of Why tell it?

Well, uh . . . confusingly. One of the book's central claims, re-asserted again and again, is that data on their own are "dumb," and cannot be causally interpreted until one draws a diagram, representing a set of assumptions about the reality behind the data. In Pearl's view, the assumptions in a causal diagram live in some separate, a priori realm, completely distinct from "data." He never tells us where the diagrams are supposed to come from, if not from empirical observations about the world; the book contains scattered references to "background knowledge" or "common sense," but these never coalesce into a general statement about what sort of information is allowed to inform our choice of diagram. We are only told that whatever this information is, it must not be "data." (Whatever that means!)

So, in Pearl's version of causal inference, you must first choose a diagram before you see the observational data at all. You are (apparently?) not allowed to change this diagram once you see the data, since diagrams do not come from data. You can do only one thing with the data, and that is estimate the arrow coefficients. (I am gliding over some complications about linear vs. nonlinear models here; in the latter you might estimate a more complex object for each arrow.) Although he never says as much, this is the entire subject of Pearl's book: no more and no less than estimating the coefficients for a pre-specified, fixed diagram.

This makes the book mightily hard to follow if you come in expecting an account of how to answer "why" questions. For example, the book spends a lot of time on the topic of smoking and lung cancer, a subject of vigorous debate in the mid 20th century. The observational data were clear: smokers got lung cancer way more often. But, just as in the drug example above, this was not conclusive evidence that smoking caused lung cancer, since there is the possibility of confounding. What if (say) there is some gene that makes people want to smoke more, and also causes lung cancer? Then smokers would get lung cancer more often, but it would not be the fault of their smoking, and quitting would not save them.

Just as in the drug example, we can draw a diagram with arrows for both proposed mechanisms. We have X (smoking) and Y (lung cancer), with an arrow X --> Y. And then we have Z (the gene), with an arrow Z --> X (the gene makes you smoke) and Z --> Y (the gene causes cancer). Pearl draws this exact diagram quite a few times, to describe the smoking / cancer scenario as well as various others.

So how does Pearl propose to answer the question, "does smoking cause lung cancer?" Well, he runs us through the technical machinery involved in estimating the coefficient for the arrow X --> Y, even in the presence of possible confounding. (He does this in bits and pieces, commingled with random anecdotes and digressions and weird spurts of overly detailed technicalities, but it's all in there, ultimately.) Now, as I understand it, the question "does smoking cause lung cancer?" is equivalent to the question "is the X --> Y coefficient nonzero?" So estimating this coefficient is a fine thing to do. But, as I said earlier, this only makes sense with the understanding that the arrows present in the diagram are concessions that an effect might be present, not assertions that it is -- and that the real assumptions lie in the arrows, and variables, we exclude from our diagram.

Pearl never clarifies this. As he tells it, everything in a diagram is an a priori assumption, existing in a separate realm that cannot be touched by mere data. If this is true, then I don't see what there is to stop someone from just drawing a diagram with no arrow X --> Y at all, and claiming that by their causal analysis, smoking does not cause lung cancer. Of course, this is absurd. They've assumed the conclusion they were trying to prove. But how would Pearl argue against them? He can't say "your diagram is empirically wrong, because it has implications that are not true of the data." All he can do is estimate the coefficients for the two arrows that remain, and say, "well, given your diagram, this is how much Z affects X and Y." He can't tell them they've omitted an arrow that should be there according to the data, because for him the arrows come from your mind, not the data.

To be clear, I don't think Pearl actually disagrees with me here. Given this actual question, he would correctly answer that the diagram without the arrow implies such-and-such conditional independence results, and point to the flagrant violation of those results in observed reality. But this is inconsistent with the framework which he states again and again in the book.

If you try to take the book literally, things get weird. Sometimes Pearl admits -- or claims proudly, as if it's a testament to the power of causal inference -- that data can confirm or deny the presence of an arrow. But then he goes on to draw all sorts of diagrams without any reference to data, assuring the reader that he can draw whatever arrows he pleases, and the "dumb" data can't stop him. So we are adrift in this strange world where one makes certain causal assumptions (encoded in a diagram) for the sake of assessing others, never sure whether any given arrow is an inviolable assumption or a testable hypothesis, or what makes the difference.



So, that's what I meant when I said the book was a failure "as a work of scientific and philosophical thought." What about my other assertion, that it's a failure "as a work of popular science writing"?

First of all, Pearl and his co-author make the disastrous choice to organize the book chronologically, and present it as a history of causal inference. The problem here is that most of the historical narrative involves the discipline of statistics fumbling around and failing to deal properly with causation.

I agree with Pearl in his negative assessments of these past efforts, but the chronological organization means we must suffer through >100pp of lamentations that past researchers did not have the benefits of Pearl's methodology before we reach the part of the story where Pearl's methodology is actually explained. We are told many times that so-and-so failed because they did not have access to something called the "do-calculus," long before we are informed what the do-calculus actually is. The reader would have been much better served by a book that explained Pearl's entire approach at the outset, in one or two self-contained chapters, and only then went on to show how badly things went before its invention.

Second, even when the explanations do come, they are badly botched. The book seems to be written on the principle that if you include enough examples, funny anecdotes, cute cartoon illustrations, etc., you have written a popular science book, even if you never quite state the actual science in a way the general reader can understand. The book is decent at explaining things that are easy to explain (although not always -- the early account of regression is shockingly bad), but when Pearl tries to describe the real meat of his intellectual contributions, he usually just throws a few undigested equations at the reader, surrounds them with some riffs about his academic colleagues, and calls it a day.

I'm a data scientist with a doctorate in applied math and some prior reading in causal inference, and I still found it hard to understand many of the weird, fragmentary, notation-heavy "explanations" in this book; I have no idea what the general reader will make of them. If you don't know how to read conditional probability expressions involving nested sums over dummy variables with unspecified limits, some key points in this book will look like gibberish to you. (And if you do, you really ought to be reading the technical literature, which can be considerably easier to follow!)

There is now a general-audience book on causal inference, and I suppose I'm thankful for that fact alone. But I am still waiting for someone to write the first good general-audience book on causal inference.


Reading Progress

August 5, 2018 – Shelved
August 5, 2018 – Shelved as: to-read
August 26, 2018 – Started Reading
September 3, 2018 – Finished Reading

Comments Showing 1-16 of 16 (16 new)


message 1: by L (new)

L M Thank you for this wonderful, clear review


message 2: by Rossdavidh (new)

Rossdavidh Thanks very much for this review. I was intrigued by the book, by its basic premise, but then something about my brief examination of it held me back from buying it. I suppose I will hunt down an affordable copy of Causation, Prediction, and Search instead.


message 3: by junzhe (new) - added it

junzhe You totally misunderstand Pearl's point. He never claims that the causal diagram itself is found from data (that task is called causal discovery); the point is to derive quantitative relationships with the help of a little qualitative knowledge, provided in the form of the causal diagram. You may think this method is simple and does not solve any real problem. But the truth is, many scientific inquiries can be modeled as this simple procedure. Even if there is disagreement about the assumptions encoded in the causal diagram, the debate can be limited to just the priors assumed in the diagram itself, instead of the quantitative conclusions derived from it.


message 4: by Weipeng (new) - added it

Weipeng Liu Very nice book review! One of the best I've seen this year.


message 5: by Kaz (new) - added it

Kaz Do you have any recommendations for something better? I really want to learn more about this topic.


Laurent Franckx Thank you for this review. I am completely confused by this book, and I am glad that at least someone with a better mathematical background than mine confirms it is not entirely due to me.


message 7: by Chris (new) - added it

Chris I think this comment taught me most of what I was looking to know in the first place :)


Marko I did enjoy the book as a historical journey but this review did articulate many of the annoyances I had while reading the book.


Lesley Ragsdale I am a complete neophyte in causal inference. I didn't even know there was such a thing as "causal inference" before picking up this book. However, the long explanation you give of precisely what this is is precisely what I got from this book. You just explained it more succinctly than Pearl did, but if your description is the correct one, he had already succeeded in filling my brain with precisely that description himself.

And to that end, I think you miss the point of his declarations that "data are dumb" and I think the reason you protest such declarations is somewhat clear. He isn't saying we should ignore data and produce causal diagrams from the ether completely detached from any observations. He's saying we should explicitly allow for the intermediary of the human brain and its ability in causal reasoning to extrapolate cause/effect relationships from observations, from data. He's saying that it doesn't matter that the exact process by which the human brain does this currently beggars precise scientific definition and thus you must resort to words like "intuition" and "common sense." Data, of themselves, will never tell you anything but quantities and ratios. To make a connection *requires* an intuitive human leap. A causal diagram is just a method of capturing that intuitive leap (informed by data) in something approaching mathematical language. It is the first step in articulating that which has been inarticulate for 50,000ish years: human intuition.

Yes, human intuition is deeply flawed, imprecise, hard to define, and prone to error. It would be great if we could develop AI that could make such leaps of understanding sans so many of our mental shortcomings. But flawed though it is, it doesn't change the fact that as tech currently stands, the human brain remains the best engine for "causal inference" that we know of. It would be silly to not try to explicitly incorporate it into scientific investigation. Protesting this because "common sense is a vague description" is a subspecies of the very mindset that Pearl is debunking in this book.


message 10: by YvesL (new)

YvesL Thank you for this review. I am an applied statistician and was also sadly disappointed by Pearl's book. Like you say, its only positive aspect is that it exists! Thank you for pointing out the book Causation, Prediction and Search by Scheines, Glymour, and Spirtes. I just started reading the introduction; what a difference! So much clarity and precision.


message 11: by J (new) - added it

J C I disagree with some of the characterisations of this review. In particular, Pearl does not claim that the diagram is ipso facto right or should be conjured up apriori. In fact, it is possible to test if a given diagram is observationally and interventionally valid. What he does stress is that at the very least, the otherwise implicit prior assumptions that characterise other data-analysis activities are made explicit in the form of diagrams.


Vince Thank you for this review; clearer and more useful than the book itself. I was reading reviews to see if anyone else was disappointed in it, and reassured to see I'm not alone.


message 14: by Jun (new) - added it

Jun I have a master's in AI and still found it quite hard to follow this book. Just like you said, it may be much easier to read the technical literature directly. I think this has failed as a pop science book. I don't understand why the rating is so high.


message 15: by Lyubomir (new) - added it

Lyubomir Genchev Can you give examples (books/papers) from the technical literature that would do a much better job at explaining the topics covered in the book?
(have masters in math/stat, have not read the book yet)


message 16: by Hugo (new) - rated it 5 stars

Hugo Vendetta Your review makes me wonder if you really read the book as some of your complaints are clearly refuted in the book.

For instance, a researcher’s model can be changed (which you claim the book says it cannot) if its results have inconsistencies with the data.

... so, anyway, maybe you should give it another try without thinking you already know everything.

