
The Book of Why: The New Science of Cause and Effect

A Turing Award-winning computer scientist and statistician shows how understanding causality has revolutionized science and will revolutionize artificial intelligence

"Correlation is not causation." This mantra, chanted by scientists for more than a century, has led to a virtual prohibition on causal talk. Today, that taboo is dead. The causal revolution, instigated by Judea Pearl and his colleagues, has cut through a century of confusion and established causality -- the study of cause and effect -- on a firm scientific basis. His work explains how we can know easy things, like whether it was rain or a sprinkler that made a sidewalk wet; and how to answer hard questions, like whether a drug cured an illness. Pearl's work enables us to know not just whether one thing causes another: it lets us explore the world that is and the worlds that could have been. It shows us the essence of human thought and the key to artificial intelligence. Anyone who wants to understand either needs The Book of Why.

432 pages, Hardcover

First published January 1, 2018


About the author

Judea Pearl

29 books · 215 followers
Judea Pearl is an Israeli-American computer scientist and philosopher, best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks.

Ratings & Reviews



Community Reviews

5 stars: 1,568 (31%)
4 stars: 1,871 (38%)
3 stars: 1,116 (22%)
2 stars: 268 (5%)
1 star: 89 (1%)
Displaying 1 - 30 of 594 reviews
nostalgebraist
Author · 3 books · 426 followers
September 4, 2018
I had high hopes for this book. I've been interested in causal inference for a number of years, and I think it's a field that could drastically improve the practice of statistical science if its techniques became widely adopted. A popular book on the field, written by one of its founders, seemed like an exciting development. Finally, people would be talking about this stuff! It would no longer be just another arcane heterodoxy, invoked by academic gadflies in seminar rooms and on blogs, generating long back-and-forth arguments without changing orthodox practice. The general public would be allowed into the debate, and that would mean a new kind of demand -- for undergraduate courses in causality, for answers to the hard causal questions that currently get smoothed over in academic press releases and, often, in the underlying research itself. Maybe a popular book would change academia in a way that academic work had been unable to.

And maybe all that will still happen. If nothing else, this book has sold well, and it is about an important topic whose very existence is still not widely known. There are awareness-raising functions that such a book can perform no matter what is between its covers, and insofar as it performs those functions, this book should get credit simply for existing. Unfortunately, that is the only credit it deserves. The book itself, the thing between the covers, is a disaster.

There are two distinct ways in which this book is a failure. It fails as a work of popular science writing: it is badly structured, rambling, full of overly detailed and technical discussions of side issues while barely even attempting to explain some of its central concepts. And, separately, it fails as a work of scientific and philosophical thought: its account of causal modeling, if taken at face value, is incoherent. I'll address these two failures in reverse order.

It is a point of conventional wisdom in statistical science that cause and effect can only be investigated through experiment. If you want to know whether X affects Y, and how much, you must actively intervene to give X different values across some group, then observe Y. For example, if Y is a disease and X is a drug that might prevent it, you could assemble a group of people, choose some of them at random and tell them to take the drug (the "treatment group"), tell the rest not to take it (the "control group"), and then wait to see who gets the disease and who doesn't.

According to the conventional wisdom, you can't learn anything about the preventive effect of the drug by simply observing people who take it and people who don't. You have to tell people what to do, if you want to know the drug's effects. Why? Because an individual's choice to take or not take a given drug is a consequence of many factors in that individual's life, and some of those factors could also influence the disease. For example, people with known risk factors for a disease will be more likely to take a drug claiming to protect against it, which means that people who take the drug are more likely to have those risk factors. This will raise the incidence of the disease among takers of the drug, making the drug look less effective than it really is. On the other hand, perhaps this drug is only known to very health-conscious people, and so those who take it are generally in better health to begin with, thus less likely to contract the disease; this would make the drug-taking population less prone to the disease even if the drug does nothing at all. And so on, for any number of hypothetical complicating factors (or "confounders," as they're known in the trade).
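The confounding story in this example is easy to demonstrate numerically. Below is a minimal simulation (my own hypothetical numbers, not from the book or the review) in which the drug has no effect at all, yet the naive observational comparison makes it look harmful, because a risk factor Z drives both drug-taking and disease:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Z: hidden risk factor; it raises both the chance of taking the drug
# and the chance of contracting the disease.
z = rng.random(n) < 0.3
take_drug = rng.random(n) < np.where(z, 0.8, 0.2)

# The drug itself has NO effect on the disease in this simulation:
# disease depends only on Z.
disease = rng.random(n) < np.where(z, 0.4, 0.1)

# Naive observational comparison: takers vs. non-takers.
p_taker = disease[take_drug].mean()
p_nontaker = disease[~take_drug].mean()
print(f"disease rate among takers:     {p_taker:.3f}")
print(f"disease rate among non-takers: {p_nontaker:.3f}")
# The taker rate comes out clearly higher, even though the drug does
# nothing: confounding by Z makes a null drug look harmful.
```

Randomization would break the Z --> X arrow, which is exactly why the experiment sidesteps the problem.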

This is a fully general point, which holds everywhere, not just in this pharmaceutical example. For any proposed cause X and effect Y, there is the specter of confounders. Everyone agrees about one way of evading this problem: doing an experiment. The conventional wisdom says that this is the only way. It says that "observational data" (about what happens without an experimental intervention) is mute on matters of causation.

The field of causal inference begins, more or less, with the denial of this bit of conventional wisdom. Claims about causation, it says, have implications about what observational data will look like. The nature of these implications is complex and subtle, and it's intellectually a lot harder to work them out than to just do an experiment, provided the experiment is possible (and ethical) to conduct. The observational data at hand is not always sufficient to answer a given causal question; adjudicating between two causal claims can require knowledge of variables you happened not to measure, and in some cases is actually impossible without experiment. But not in every case. Cause and effect leave fingerprints -- albeit partial and obscured fingerprints -- in what we see.

And after all, how could it be otherwise? We can reason about what causes what in everyday life, in spite of the fact that we virtually never have the opportunity to do controlled experiments on our immediate surroundings (much less on our friends or colleagues). If you and I can do it, surely science ought to be able to.

Just to be clear, I'm now describing the field of causal inference in my own words, so as to contrast it with what is written in Pearl's book, which is quite different. Pearl is a giant in this field, so it might seem strange that my account would diverge from his -- where am I getting this stuff, if not from him? Well, I've mostly learned about causal inference from the work of Richard Scheines, Clark Glymour, and Peter Spirtes, especially their wonderful book Causation, Prediction and Search. Maybe Pearl is simply up to something different from S+C+S. If so, I much prefer the S+C+S version. Anyway, here's a bit more of my own account.

I said that "causal claims" have implications for observational data. To make this precise, we need some precise way of representing a "causal claim." This is done via "causal diagrams," which are pictures in which the variables (like X and Y) are connected by arrows. An arrow like X --> Y means "X causes Y," or, more precisely, "X has a causal influence on Y." In the drug example above -- where X was taking the drug and Y was contracting the disease -- we could represent "risk factors" by another variable (say, "Z"), with an arrow Z --> Y (risk factors make the disease more likely) and another arrow Z --> X (people with risk factors are more likely to take the drug).

From such a diagram, you can derive certain facts that must hold in observational data if the diagram's claim is true. For example, if you have arrows X --> Y and Y --> Z, but no arrow X --> Z, this means that X causes Z only through Y and not directly. So if you hold Y constant -- say by grouping the data by the different values of Y, and looking at each group in isolation -- X and Z will no longer be related. This is a testable claim about observational data! A given data set can have this property or not have it (to some degree of statistical confidence), and if the data set doesn't have the property, the diagram must have been wrong.
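This testable implication can be checked directly. As a sketch (simulated data, my own construction rather than anything from the book), generate a chain X --> Y --> Z with no direct X --> Z arrow, and verify that the X-Z association vanishes once Y is held fixed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Chain X --> Y --> Z, with no direct X --> Z arrow.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = -1.5 * y + rng.normal(size=n)

# Marginally, X and Z are strongly correlated...
r_xz = np.corrcoef(x, z)[0, 1]

# ...but holding Y fixed removes the association. One way to check:
# regress X and Z each on Y, and correlate the residuals
# (the partial correlation of X and Z given Y).
res_x = x - np.polyval(np.polyfit(y, x, 1), y)
res_z = z - np.polyval(np.polyfit(y, z, 1), y)
r_xz_given_y = np.corrcoef(res_x, res_z)[0, 1]

print(f"corr(X, Z)           = {r_xz:+.3f}")          # far from zero
print(f"corr(X, Z | Y fixed) = {r_xz_given_y:+.3f}")  # near zero
```

If the partial correlation had come out clearly nonzero, the no-direct-arrow diagram would be refuted by the data, which is exactly the reviewer's point.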

All of that, you will note, is qualitative/binary. An arrow is either present or absent in a diagram; an implied fact either holds true in the data or doesn't. No continuous shades of grey. But we can go further by attaching numbers (called "coefficients") to the arrows, representing how strong or weak each effect is. These numbers can themselves be estimated from data, and it is only after estimating them that we have a quantitative model telling us which effects are big and which are small.

Now, an arrow X --> Y with a tiny coefficient means that X does influence Y, but only a very small amount. What if the coefficient were actually zero? This means that X doesn't really influence Y after all. In other words, it's equivalent to the arrow not being there at all. This is a really important point, because it unifies the qualitative question ("does a diagram with these arrows fit the data?") and the quantitative question ("which coefficient goes with each arrow?"). Any diagram you can draw is just a version of the most general diagram -- the one with every possible arrow -- except with some of the arrow coefficients set to zero. A diagram, then, is a set of claims about which arrows, out of all the possible ones, should have zeros next to them.

If that was confusingly technical, look at it this way. Return to our drug example, and consider the question, "when we account for confounders, does the drug have any effect at all?" This question is asking whether there should be an arrow X --> Y between the drug, X, and the disease, Y. But that's the same as asking whether, in a diagram with that arrow, the arrow's coefficient should be zero or not. Thus, we can start with a diagram that includes the arrow -- as if we're assuming the drug does have some effect -- and then proceed to estimate the coefficient. If it turns out to be zero, we can safely delete the arrow. Thus, when we draw a diagram, we are not assuming all the arrows in it represent real effects. All of the assumptions are in the arrows not drawn: these effects are assumed to be absent, and all other calculations are done against this background of assumptions.
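In a linear setting, this procedure amounts to estimating the X coefficient in a regression that adjusts for the confounder Z. Here is a minimal sketch (simulated data; the true X --> Y coefficient is set to zero by my own assumption, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Linear version of the drug example: Z --> X, Z --> Y, and an
# X --> Y arrow whose true coefficient is 0 (the drug does nothing).
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.0 * x + 1.2 * z + rng.normal(size=n)

# Unadjusted regression of Y on X: confounded, looks like a real effect.
naive = np.linalg.lstsq(
    np.column_stack([x, np.ones(n)]), y, rcond=None)[0][0]

# Adjusting for Z: the X coefficient comes out near its true value, 0,
# so we can safely "delete the arrow" X --> Y.
adjusted = np.linalg.lstsq(
    np.column_stack([x, z, np.ones(n)]), y, rcond=None)[0][0]

print(f"naive X coefficient:    {naive:+.3f}")     # clearly nonzero
print(f"adjusted X coefficient: {adjusted:+.3f}")  # near zero
```

The assumption doing the real work here is the *absence* of any arrow other than the three drawn, which is exactly where the review says the assumptions live.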

Enough from me. How does The Book of Why tell it?

Well, uh . . . confusingly. One of the book's central claims, re-asserted again and again, is that data on their own are "dumb," and cannot be causally interpreted until one draws a diagram, representing a set of assumptions about the reality behind the data. In Pearl's view, the assumptions in a causal diagram live in some separate, a priori realm, completely distinct from "data." He never tells us where the diagrams are supposed to come from, if not from empirical observations about the world; the book contains scattered references to "background knowledge" or "common sense," but these never coalesce into a general statement about what sort of information is allowed to inform our choice of diagram. We are only told that whatever this information is, it must not be "data." (Whatever that means!)

So, in Pearl's version of causal inference, you must first choose a diagram before you see the observational data at all. You are (apparently?) not allowed to change this diagram once you see the data, since diagrams do not come from data. You can do only one thing from the data, and that is estimating the arrow coefficients. (I am gliding over some complications about linear vs. nonlinear models here; in the latter you might estimate a more complex object for each arrow.) Although he never says as much, this is the entire subject of Pearl's book: no more and no less than estimating the coefficients for a pre-specified, fixed diagram.

This makes the book mightily hard to follow if you come in expecting an account of how to answer "why" questions. For example, the book spends a lot of time on the topic of smoking and lung cancer, a subject of vigorous debate in the mid 20th century. The observational data were clear: smokers got lung cancer way more often. But, just as in the drug example above, this was not conclusive evidence that smoking caused lung cancer, since there is the possibility of confounding. What if (say) there is some gene that makes people want to smoke more, and also causes lung cancer? Then smokers would get lung cancer more often, but it would not be the fault of their smoking, and quitting would not save them.

Just as in the drug example, we can draw a diagram with arrows for both proposed mechanisms. We have X (smoking) and Y (lung cancer), with an arrow X --> Y. And then we have Z (the gene), with an arrow Z --> X (the gene makes you smoke) and Z --> Y (the gene causes cancer). Pearl draws this exact diagram quite a few times, to describe the smoking / cancer scenario as well as various others.

So how does Pearl propose to answer the question, "does smoking cause lung cancer?" Well, he runs us through the technical machinery involved in estimating the coefficient for the arrow X --> Y, even in the presence of possible confounding. (He does this in bits and pieces, commingled with random anecdotes and digressions and weird spurts of overly detailed technicalities, but it's all in there, ultimately.) Now, as I understand it, the question "does smoking cause lung cancer?" is equivalent to the question "is the X --> Y coefficient nonzero?" So estimating this coefficient is a fine thing to do. But, as I said earlier, this only makes sense with the understanding that the arrows present in the diagram are concessions that an effect might be present, not assertions that it is -- and that the real assumptions lie in the arrows, and variables, we exclude from our diagram.

Pearl never clarifies this. As he tells it, everything in a diagram is an a priori assumption, existing in a separate realm that cannot be touched by mere data. If this is true, then I don't see what there is to stop someone from just drawing a diagram with no arrow from X --> Y at all, and claiming that by their causal analysis, smoking does not cause lung cancer. Of course, this is absurd. They've assumed the conclusion they were trying to prove. But how would Pearl argue against them? He can't say "your diagram is empirically wrong, because it has implications that are not true of the data." All he can do is estimate the coefficients for the two arrows that remain, and say, "well, given your diagram, this is how much Z affects X and Y." He can't tell them they've omitted an arrow that should be there according to the data, because for him the arrows come from your mind, not the data.

To be clear, I don't think Pearl actually disagrees with me here. Given this actual question, he would correctly answer that the diagram without the arrow implies such-and-such conditional independence results, and point to the flagrant violation of those results in observed reality. But this is inconsistent with the framework which he states again and again in the book.

If you try to take the book literally, things get weird. Sometimes Pearl admits -- or claims proudly, as if it's a testament to the power of causal inference -- that data can confirm or deny the presence of an arrow. But then he goes on to draw all sorts of diagrams without any reference to data, assuring the reader that he can draw whatever arrows he pleases, and the "dumb" data can't stop him. So we are adrift in this strange world where one makes certain causal assumptions (encoded in a diagram) for the sake of assessing others, never sure whether any given arrow is an inviolable assumption or a testable hypothesis, or what makes the difference.

So, that's what I meant when I said the book was a failure "as a work of scientific and philosophical thought." What about my other assertion, that it's a failure "as a work of popular science writing"?

First of all, Pearl and his co-author make the disastrous choice to organize the book chronologically, and present it as a history of causal inference. The problem here is that most of the historical narrative involves the discipline of statistics fumbling around and failing to deal properly with causation.

I agree with Pearl in his negative assessments of these past efforts, but the chronological organization means we must suffer through >100pp of lamentations that past researchers did not have the benefits of Pearl's methodology before we reach the part of the story where Pearl's methodology is actually explained. We are told many times that so-and-so failed because they did not have access to something called the "do-calculus," long before we are informed what the do-calculus actually is. The reader would have been much better served by a book that explained Pearl's entire approach at the outset, in one or two self-contained chapters, and only then went on to show how badly things went before its invention.

Second, even when the explanations do come, they are badly botched. The book seems to be written on the principle that if you include enough examples, funny anecdotes, cute cartoon illustrations, etc., you have written a popular science book, even if you never quite state the actual science in a way the general reader can understand. The book is decent at explaining things that are easy to explain (although not always -- the early account of regression is shockingly bad), but when Pearl tries to describe the real meat of his intellectual contributions, he usually just throws a few undigested equations at the reader, surrounds them with some riffs about his academic colleagues, and calls it a day.

I'm a data scientist with a doctorate in applied math and some prior reading in causal inference, and I still found it hard to understand many of the weird, fragmentary, notation-heavy "explanations" in this book; I have no idea what the general reader will make of them. If you don't know how to read conditional probability expressions involving nested sums over dummy variables with unspecified limits, some key points in this book will look like gibberish to you. (And if you do, you really ought to be reading the technical literature, which can be considerably easier to follow!)

There is now a general-audience book on causal inference, and I suppose I'm thankful for that fact alone. But I am still waiting for someone to write the first good general-audience book on causal inference.
Andrew Harlan
1 review · 3 followers
August 30, 2019
Failed revolution

In an old joke, an engineer, a physicist and an economist are marooned on a desert island with canned food. They are trying to figure out the best way to open the cans, and while the engineer and the physicist propose various mechanical schemes to get the job done, the economist says, "Let's assume we have a can opener..." Judea Pearl's approach to causal inference brings that joke to mind. His causal calculus begins with the premise, "Let's assume we have a strong causal theory." He shows that once you know the causal relations (or lack thereof) between relevant variables, you can use graphical methods to work out how to estimate the values of the parameters you are interested in. But this puts the cart before the horse.

In The Book of Why and many earlier publications, Pearl promotes his extension of probability calculus and nonparametric structural equation models (directed acyclic graphs or DAGs) as the solution to the problem of inferring causes from observational data. He describes his approach with such terms as "revolutionary" and "miraculous." I have personally found traditional path or SEM models useful when thinking about some causal problem, but I see little new that is worthwhile in Pearl's approach. Graphical models can be useful tools in research, but there's nothing revolutionary about them; they are not a royal road to causality. If you have good subject knowledge about a research topic, you will have an understanding of the dependencies between relevant variables and you can use graphical methods as one of the tools to clarify the implications of the model, but that's really it.

What is missing from Pearl's book is very telling. Almost all the "practical" examples in the book concern either problems that were long ago solved with methods other than Pearl's (e.g. smoking causing lung cancer) or else are toy problems where the underlying causal model is presumed known. Pearl triumphantly shows how his approach "solves" these non-problems. He delights in how he can routinize the procedures for solving various inference problems when the causal structure is known with certainty. I must say that I find his explanations of well-known paradoxes less illuminating than more traditional treatments. For example, tabular data on the distributions of values for different groups usually makes Simpson's paradox entirely transparent, obviating the need for a more formal treatment. More importantly, Pearl's methods do nothing to help discover whether your data are actually confounded by something like Simpson's paradox.

A prominent early airing of Pearl's ideas about causality took place in the journal Biometrika in 1995. Pearl's target paper published in the journal contained all the main elements of his current framework. The paper was accompanied by a number of expert commentaries, and a common theme in many of them was, aside from polite comments about the technical elegance of the approach, a skepticism that a method like this could help achieve concrete, real-life advances in scientific understanding. The 1995 article and the 2018 book are separated by 23 years, and in the meantime Pearl has published two editions of his textbook on causal inference, but there is still a paucity of real-world benefits stemming from Pearl's program. The implications of this state of affairs for Pearl's "causal revolution" are devastating, but he seems to be blind to the enormous gap between his pompous pronouncements and the reality.

If Pearl's claims about the revolutionary impact of his theory were correct, we would now be living in a golden age of science. In reality, however, there is currently a crisis of confidence in science, across many fields. Not only has the increasing utilization of Pearl's approach done nothing to prevent the crisis; Pearl also completely failed to anticipate the actual problems that many fields of research are facing. No amount of DAGs will solve such problems as small sample sizes, selective reporting of analyses, lack of replication, weak theories and poor measurement.

Pearl assumes that researchers can make use of a reliable body of "background knowledge" when drawing their causal diagrams. If the relevant variables and their causal relations are presumed known, the estimation of causal parameters becomes straightforward. This shows an undue optimism on his part. The fact is that, as has become ever more obvious in recent years, many fields of research not only lack convincing theories to make sense of data and experiments, but there is also pervasive uncertainty about the basic reliability of large swaths of observational and experimental data reported in the published literature. All scientific knowledge ultimately relies on at least some uncertain assumptions, but that does not mean that you can freely make any old assumption when making causal inferences. If the "background knowledge" you have is just a muddle of unconnected, incompatible and unreplicated findings and hypotheses, Pearl's method is not useful.

You might protest here that even if Pearl's methods are impractical in immature fields, that's not the case when the background knowledge available is more solid. However, Pearl does not even try to argue that his methods are needed in the mature, physical sciences. Researchers in mature fields already know how to make causal inferences and have no need for Pearl's insights. If you have a strong theory and good data, it is usually the case that a simple regression or the like will give you the causal estimates you want. In such cases, nothing is gained by re-expressing a problem and its solution in terms of Pearl's formalisms. If you have a strong theory, Pearl's methods offer nothing to you, whereas if your theory is weak, as is by and large the case in social science, DAGs will only give a false causal veneer to associations whose nature is unclear.

Therefore, Pearl's "revolution" amounts to introducing canned procedures for dealing with some relatively trivial subproblems that a research program may contain and that a mature discipline knows how to deal with anyway. It does nothing to help tackle the hard problems of research, so it is unsurprising that the (sporadic) adoption of these methods has not coincided with any scientific progress.

Pearl's program is a failed attempt to replace the messy, trial-and-error process of scientific discovery with clean mathematical and logical formalisms. It is an overreach of rationalism by a mathematically inclined theoretician with little experience dealing with the jumbled reality of the "soft" sciences.

With all that said, I nevertheless think The Book of Why is a decent read. In particular, the historical chapters are interesting (even if sometimes tendentious) and a non-expert reader will get the gist of the method from the many examples (even if some of the exposition is overly technical). Just don't expect to find anything revolutionary in it.
Manny
Author · 29 books · 13.6k followers
May 23, 2021
Well, I am not an expert on statistics, so maybe I'm missing something important, but I really don't understand all the negative criticism that I see in other reviews of this book. Pearl, who has spent a long career working in an area which spans statistical reasoning, philosophy and AI, set himself an extremely ambitious goal: he wanted to establish a clear, logically consistent foundation for the notions of causality ("A makes B happen") and counterfactuals ("B would not have happened if A had not happened"). As he says, both statisticians and philosophers had been deeply mistrustful of both concepts, preferring only to talk about associations.

Pearl gives coherent reasons to believe that this is overcautious. Human language is packed full of causality and counterfactuals: it's the fundamental substrate of our common worldview, and we can't do without it. To take just one of many flagrant examples, it's impossible to make sense of fundamental legal and moral concepts like "responsibility" or "guilt" without using this language. If the prosecutor wants to convince the court that X is guilty of murdering Y, he needs to demonstrate that Y would have been alive had it not been for X's actions. To say that this is philosophically or mathematically inadmissible is to deny the validity of the entire field of legal reasoning. If you're familiar with the philosophical tradition, your knee-jerk response at this point may be to object "but what about Hume?". Pearl looks at what Hume actually says on the subject, and points out that Hume's revised definition of causality is not phrased solely in terms of associations: Hume, too, realised that he needed to add counterfactuals.

At least on his own account, Pearl and his students appear to have made a great deal of progress in attacking these thorny problems. They have developed a way of thinking about them where the central construct is a "causal diagram", a graph where different factors are connected by arrows representing hypothesized causal links. Causal diagrams are a good match to people's intuitions about causality, and Pearl gives many examples showing how they support different kinds of reasoning. Some of this reasoning is obvious, some of it is very subtle; some of it becomes obvious only after looking at the diagrams. For example, a pattern which comes up many times in different forms is so-called "collider bias", where two causal arrows meet at the same point: if you condition on the joining concept, you'll create a spurious association.

Pearl gives a cute illustration from the world of dating, where the folk wisdom is that the good-looking dates tend to be jerks. His explanation is as follows. Being good-looking and having a pleasant personality are both features that make someone more attractive. It is reasonable to suppose that these two things may not actually be correlated. But if your sample is drawn from the people you've dated, you're conditioning on the "attractiveness" variable: you're only looking at people who were attractive enough that you dated them. This creates a spurious negative association between "good-looking" and "pleasant personality". So within your dating pool, if someone is good-looking, they are more likely to have an unpleasant personality.

"Collider bias" is very simple compared to some of the things covered in the book. Particularly impressive items are the "do-calculus" (an axiomatic framework for estimating the effect of performing an intervention), and a set of formulas for measuring direct and indirect effects when one factor operates on another through an intermediary; for example, smoking causing cancer through the intermediary of tar.
Pearl describes the reasoning that led him to these ideas, where in many cases a deceptively simple formula is the product of several years of careful thought.
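The dating illustration is easy to reproduce in a few lines. In this sketch (my own simulation, not Pearl's), looks and personality are independent in the full population, but become negatively correlated once we condition on the collider by keeping only the sufficiently attractive dates:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Looks and personality are independent in the population.
looks = rng.normal(size=n)
personality = rng.normal(size=n)

r_all = np.corrcoef(looks, personality)[0, 1]

# Conditioning on the collider: you only date people whose combined
# attractiveness (looks + personality) clears some threshold.
dated = looks + personality > 1.0
r_dated = np.corrcoef(looks[dated], personality[dated])[0, 1]

print(f"corr in population:  {r_all:+.3f}")    # near zero
print(f"corr among dates:    {r_dated:+.3f}")  # clearly negative
```

The threshold of 1.0 and the equal weighting of the two traits are arbitrary choices; any selection on the sum produces the same qualitative effect.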

Well, it's possible that I'm a sucker who's been taken in by good marketing. But Pearl has an excellent reputation: he's published hundreds of widely cited papers and picked up just about every award going. To me, he looks like the real deal. I think I need to read his 2009 book Causality and download a causal inference package.
Terran M
78 reviews · 93 followers
January 17, 2023
I've never met Pearl, but having read a couple of his books, I'm pretty sure he's an asshole. His anger and bitterness come through very clearly in his book — he spends as much space naming and vilifying his professional enemies, both living and dead, as he does explaining his work. This is a real shame, because his work is actually quite good and deserves a popular presentation; sadly the sanctimony in this book is almost unbearable, and there is no humor to lighten it.

I think your best bet is to read chapters 1, 4, and 6, and keep a bottle of antacid handy. Alternatively, I would recommend Structural Equation Modeling by Kline.
Ryan Sloan
29 reviews · 4 followers
January 2, 2019
There are great ideas in this book. I'm not an expert on causality or statistics, but I found the idea of modeling causality using a directed graph, and using that graph as a tool for both a) determining valid controls in experimental data and b) performing counterfactual reasoning to be thoughtful and (probably) useful. I have little doubt that Pearl's contributions to the sciences will prove to be important and useful. But I have a hard time endorsing this book. I recognize I'm in the minority on this, and your mileage may vary.

Popular science writing is hard. Balancing the seeming impenetrability of a technical subject with the sin of oversimplification is never straightforward. This is why I have a huge amount of respect for great science writers, and why I feel that they often do a better job writing these books than the experts themselves. When I saw that a renowned scientist had teamed up with a science writer on this book, I was very excited. Unfortunately, I feel this book is meandering, poorly justified, and shallow in some important areas. It's in dire need of a more aggressive editor, and a more reader-focused (or outspoken) science writer. The tone throughout suggests this was primarily Pearl's work, with Mackenzie along for the ride.

Rather than organizing this book around big ideas, they meander through the history of statistics with a tone verging on vindictive. The reader is likely to come away with a not-so-rosy impression of Pearl as a self-aggrandizing person with an axe to grind. Pearl devotes countless pages to rambling diatribes about statisticians with whom he disagrees, and continuously suggests that had they only used his championed theory of causality, they wouldn't have been so dumb. It's kind of unseemly how much energy he pours into these rants about how they could've gotten it right. I get it. But I think this can be done in a way that is tasteful and honors the contributions of those who came before (in fact, I just read Thaler's Misbehaving which is a great example of doing this well.)

Which brings me to my next point: as far as I can recall, nearly every application of causal diagrams that Pearl mentions is in the form of post-hoc rationalization. I was shocked at how few instances there were of these causal diagrams actually being used to solve problems; instead, the book was dominated by "if only"s. If only they'd used them when analyzing whether cigarettes caused cancer, if only they'd used them when analyzing the relationship between birth weight and birth defects, if only, if only, if only.

It leaves me in a very contradictory state of mind: on the one hand, I must design the causal diagram a priori before using the data to draw conclusions; on the other hand, Pearl seems to greatly reap the rewards of hindsight.

Most of the problems that Pearl works through are toy problems, designed to illustrate principles. There is nothing wrong with that (I am a huge believer in worked examples), but he rarely works through them in depth! I am reminded of so many (bad) math textbooks which overused the "proof is left as an exercise for the reader" trope. For example, in Chapter 7 (pages 239-240) Pearl writes the following:

Yet another reason that the do-calculus remains important is transparency. As I wrote this chapter, Bareinboim (now a professor at Purdue) sent me a new puzzle: a diagram with just four observed variables, X, Y, Z, and W, and two unobservable variables U1 and U2. He challenged me to figure out if the effect of X on Y was estimable. There was no way to block the backdoor paths, and no front-door condition. I tried all my favorite shortcuts and my otherwise trustworthy intuitive arguments, both pro and con, and I couldn't see how to do it. I could not find a way out of the maze. But as soon as Bareinboim whispered to me, "try the do-calculus," the answer came shining through like a baby's smile. Every step was clear and meaningful. This is now the simplest model known to us in which the causal effect needs to be estimated by a method that goes beyond the front- and back-door adjustments.

Reading this glowing paragraph, I couldn't wait to see such an elegant, "baby's smile" of a proof. Are you ready for it? Well I don't know it because they never shared it. Pearl just moves on. All that hype and nada. I actually wrote "are you kidding me?" in the margin of my book. It's full of stuff like this - flowery prose with a lack of concrete examples. I did find that there was a good worked example for counterfactuals beginning on page 273 (the "what would Alice's salary be if she had a college degree?" example). The lack of examples would be fine if concepts were well-explained, but to be honest I was unimpressed with his explanation of even simpler statistical concepts. Heck, I think the reader could get through all 370 pages without actually learning what the back-door criterion is (he sort of defines back-door adjustment on page 220, if you're looking.)
Profile Image for Alex Telfar.
106 reviews87 followers
August 13, 2018
I enjoyed this book! It did everything a good book should do: it provides understandable examples, entertaining side-notes, applications to the real world, and something useful that is novel or little known.

The book could have been better (5 stars) if it had been more concise, explained the general algorithms for mediation analysis, independence testing, transfer, and so on, explained the relationship of causal inference to calculus, and spent less time on its Whig history and adversarial narrative.

I think Judea's main point was: Correlation does not imply causation, unless you can control the confounders (for this you need a causal model). This means you can make causal inferences from data, but you just need to make some assumptions. 

Currently I feel like I understand the motivation and potential power of causal inference, but I do not understand the details. But thanks to this book, I now have the motivation to stare at the math for as long as it takes, or a few weeks.

For some more thoughts see my blog.
Profile Image for Gary  Beauregard Bottomley.
977 reviews580 followers
January 16, 2019
There were some real flaws with this book that bothered me to no end. I had no problem following his statistical examples and how to think about data analysis in the way the author suggests we all should. I even enjoyed it when the author connected what he called Smart Artificial Intelligence to his overall causal theory, and I enjoyed the book when he alluded in passing to the importance of solving the P=NP problem and how that would relate to what most people call super AI .

If we can model completely the effect of a variable, we can get beyond the functional representation and understand the cause, that is, the why, the intention or the intuition lying behind the variable of interest. Or in other words, as the author wants to say, correlation can show causation if a perfect model is in place for the variable under consideration. The author is absolutely right, but perfect, unique models of the real world don’t exist.

The more we understand about the process, the interactions, the confounding and the mediated variables the better our model will be and every statistician tries to do just that with every dataset they come across. All statisticians want to understand the process, but in reality often they have to let the data lead the way.

Good data analysts (statisticians) know that if they control for mediated variables that collude with resultant variables, they risk confounding. While everyone might not understand what those words mean in a strict statistical sense, I would think that everyone who thinks about the world through the lens of modeling and data knows confounding can be lurking. The author is not really telling us anything that most people didn’t already know, at least among people who analyze the world with data.

Science never proves anything. We say things are a fact, but when we say something is a ‘fact’ we really imply ‘scientific fact’. The world of facts is always ‘underdetermined’. That means that the facts we have can always be explained by multiple theories. The author is talking about data, the facts that make up the world. He wants to show that correlation can show causation when we see beyond the data, and wants us to consider graphical analysis and use his ‘do-calculus’ and causal path analysis as tools for developing a model. (I owe a slight elaboration of why I say ‘science never proves anything’; at least that is the standard paradigm that science uses with its null hypothesis and alternative hypothesis. We reject the null and accept the alternative at the level of five sigma in particle physics or two sigma in psychology. That is what scientists do, and yes, it does come from R. A. Fisher, whom the author mentions multiple times, mostly in order to criticize.)

The author is right to say that a statistician needs to understand the processes and the underlying mechanisms at play within the issue under study, but every statistician already knows that. The more the analyst understands about the process, the stronger the statement that can be made. ‘Climate change is a (scientific) fact’. We can say that not just because of the data, but because of the well-understood climate models that have been fine-tuned over time to mirror our data expectations in both prediction and retro-diction (you know, Einstein’s heart skipped a beat when he saw that the perihelion of Mercury fit his General Theory even though that was retro-diction, after the fact).

The statistician will perform sensitivity analysis, model fitting, Bayesian analysis (taking prior information and using it to get a result, weighing those results by expectations), graphing, identification of the mediated variables, ceteris-paribus and counterfactual analysis, or in general anything that reasonably needs to be done in order to confirm or deny their best alternative hypothesis, and all of those techniques are prominently featured in this book.

The author likes Harari’s first book “Sapiens” because it fits the story he is telling. He quotes Harari to the effect that humans are different because we are the only creature that knowingly believes a fiction. The author doesn’t quote Harari’s second book ‘Homo Deus’. In that book Harari says that big data analysis will understand our causes from the data itself, the ‘whys’ of who we are, or in other words our intentional state beyond our functional state, or in other words our intuitions beyond the action itself. The author thinks more than data itself is required; Harari, at least in his second book, says big data itself will be sufficient with the right self-adjusting programming.

I don’t dislike this book at all. I liked how the author talks intelligently about super AI on the periphery; I like that the author has reasonable ways to think about complex problems. I felt that most analysts would understand the points the author was making, and I felt the author ignored too much of the Philosophy of Science in his presentation. I think most people already know that data analysis and process analysis must always go hand in hand, but contrary to what the author is saying, I think that sometimes one must let the data do the explaining for the analyst, and we must always remember that certainty is always elusive in the real world.
Profile Image for Athan Tolis.
309 reviews580 followers
October 9, 2018
My son George’s first language is Japanese.

His first annoying habit, which raised its head very soon after he was granted the gift of speech, was to answer every request / question / casual comment with “doshte?”

“Doshte,” you guessed it, is Japanese for “why?”

This, Judea Pearl argues very persuasively in this book, is –for the time being-- the biggest difference between thinking men and thinking machines.

I LOVED this book. Loved it, loved it, loved it.

You can read “The Book of Why?” as a popular science book. So I started reading it that way and, some thirty pages in, I thought to myself “very weird, I have not lost this guy yet!” You could not really say that about “A Brief History of Time,” could you? The funny thing is, I eventually made it all the way to page 370 (the last page) and I was still with the author! For me, that’s a first: a popular mathematics book that carries on introducing new (and I mean NEW) material, concepts that were not understood when I went to college, yet explains them clearly enough that I could carry on learning the whole way through.

To briefly summarize, the author explains that sometime a hundred years ago meaning in statistics was sacrificed at the altar of rigor: because the giants who defined the space were not personally comfortable putting a definition on the word “why,” they not only repurposed the entire field to answer “when” (thereby throwing the baby out with the bathwater), but also rendered it heretical to examine causation. In particular, the practice of identifying probability-altering interventions was proscribed by the mathematical mainstream.

To get the ball rolling, the author stakes his claim early on in the book and defines the “do” operator. (p.48) For example, if we know for fact that getting rid of the weeds (which we call action do(X)) results in a better crop Y, we can go ahead and write:

P(Y | do(X)) > P(Y)

This was deemed to be heresy because there was no clean mathematical meaning for “do” and no set of operations / conclusions that could derive from it. Cauchy was no longer around, I suppose. The orthodoxy was established that we only need care about association. From the full list of associations logical people were free to draw their own conclusions regarding causation. If cutting the weeds and a better crop are correlated, nobody is going to accuse you of making stuff up if you conclude one caused the other.
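To make the do-operator concrete, here is a toy simulation of the weeds-and-crop example (my own made-up numbers, not Pearl's), with soil quality as a hypothetical confounder that both encourages weeding and improves the crop. Merely observing P(Y | X=1) then overstates the interventional P(Y | do(X=1)):

```python
import random

random.seed(0)

def sample(do_x=None):
    # Hypothetical toy model: good soil (u) makes weeding (x) likelier
    # AND improves the crop (y) on its own, confounding x and y.
    u = random.random() < 0.5
    x = (random.random() < (0.8 if u else 0.2)) if do_x is None else do_x
    y = random.random() < 0.3 + 0.3 * x + 0.3 * u
    return x, y

N = 100_000
obs = [sample() for _ in range(N)]

# Observational: P(Y | X=1), inflated because weeders tend to have good soil
weeded = [y for x, y in obs if x]
p_y_given_x1 = sum(weeded) / len(weeded)

# Interventional: P(Y | do(X=1)), which erases the soil -> weeding arrow
p_y_do_x1 = sum(y for _, y in (sample(do_x=True) for _ in range(N))) / N

print(p_y_given_x1, p_y_do_x1)  # the first exceeds the second
```

With these (invented) numbers the observational figure is about 0.84 while the interventional truth is 0.75: the gap is exactly the confounding the do-operator is meant to name.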

The benefit of the canonical approach, and it is an enduring benefit we should not disparage, is that, with minimal knowledge of mathematics, you can use a statistical package that slices and dices the “whens” and gives you a slew of pre-packaged answers: “sons of 72-inch-tall fathers will on average be 71 inches tall, but sons of 68.5-inch-tall fathers will on average be 68.5 inches tall.” My George is therefore predicted to be 68.5 inches tall, and I’m the average height of a male British subject in 1877, more disturbingly! Ah, but on the plus side, that probably also means George will be taller than me, because the mean has no doubt shifted up…

Cool, but we can do better. A lot better!

Judea Pearl goes to enormous pains to give maximum credit to all his students / disciples, but it was he who singlehandedly forced mankind up a construct that he calls “the ladder of causation.” Here he invites you along!

First, he takes you one step up from “association” to “intervention.” To do so, you need to start drawing pictures. Graphs (causal diagrams, they’re called) that allow you to point from causes to effects. These charts need not be handed down by a higher being. You can sketch your own, you can test the conclusions versus the data and you can change your mind and draw them again.

These charts, once you’ve drawn them, naturally force you to observe three important types of nodes / factors: “mediators” (example: tar in your lungs mediates between your smoking and you getting lung cancer), “confounders” (example: a now-identified “smoking gene,” rs16969968, both makes people likelier to smoke and makes them more susceptible to lung cancer, but clearly does not deposit tar in their lungs) and “colliders” (example: smoking and birth defects can both affect birth weight). The author goes on to explain what the “front-door path” and the “back-door path” are from potential cause X to potential result Y. (In a later chapter he expands this repertoire to “instrumental variables.”)
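The collider case is the least intuitive, so here is a quick sketch (hypothetical numbers, not from the book) showing how conditioning on the collider, i.e. looking only at low-birth-weight babies, manufactures a spurious negative association between two causes that are independent in the full population:

```python
import random

random.seed(1)

N = 200_000
babies = []
for _ in range(N):
    smoking = random.random() < 0.3          # two independent causes...
    defect = random.random() < 0.1
    # ...that both push birth weight down (low_weight is the collider)
    low_weight = random.random() < 0.05 + 0.4 * smoking + 0.5 * defect
    babies.append((smoking, defect, low_weight))

def defect_rate(sample, smoking_status):
    group = [d for s, d, _ in sample if s == smoking_status]
    return sum(group) / len(group)

# In the whole population, smoking tells you nothing about defects:
diff_all = defect_rate(babies, True) - defect_rate(babies, False)

# Among low-birth-weight babies only, a spurious NEGATIVE association appears:
low = [b for b in babies if b[2]]
diff_low = defect_rate(low, True) - defect_rate(low, False)

print(diff_all, diff_low)
```

In the selected subgroup, a baby whose low weight is "explained" by smoking is less likely to have a defect, which is the birth-weight paradox the earlier reviewer mentioned.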

Next (some 200+ pages deep into the book) comes the math, which is the first time you’re asked to actually believe the author, rather than find yourself invited to discover alongside him. And here’s what the math says:

suppose you’ve drawn your causal diagrams;
suppose you’ve expressed them in mathematical expressions, using the “do” operator;
then there are exactly three “legitimate transformations” you can apply to these equations that correspond to the diagrams in order to convert them into testable (or otherwise!) run-of-the-mill probabilistic statements of the kind a conventional statistician can abide:

1. If W is irrelevant to Y, then
P(Y | do(X), Z, W) = P(Y | do(X), Z)

2. If a set of variables Z blocks all back-door paths from X to Y, then
P(Y | do(X), Z) = P(Y | X, Z)

3. If there are no causal paths from X to Y, then
P(Y | do(X)) = P(Y)

Not only that, but there are no other necessary rules. If there’s a way to convert your causal diagrams into classical probability statements, then there’s a way to do it with these three tricks.
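Rule 2 is the one you meet most often in practice: it licenses the classic back-door adjustment P(Y | do(X)) = Sum over z [ P(Y | X, z) P(z) ]. A quick numerical check (a toy model I made up, with a single confounder Z) shows the adjusted observational estimate matching a direct intervention in the simulator:

```python
import random

random.seed(2)

def sample(do_x=None):
    # Toy model: one confounder z, which blocks the only back-door path x <- z -> y
    z = random.random() < 0.4
    x = (random.random() < (0.7 if z else 0.3)) if do_x is None else do_x
    y = random.random() < 0.2 + 0.4 * x + 0.3 * z
    return z, x, y

N = 200_000
obs = [sample() for _ in range(N)]

def p_y(rows):
    return sum(y for _, _, y in rows) / len(rows)

# Back-door adjustment: P(Y | do(X=1)) = sum_z P(Y | X=1, z) P(z)
p_z = sum(z for z, _, _ in obs) / N
adjusted = (p_y([r for r in obs if r[0] and r[1]]) * p_z
            + p_y([r for r in obs if not r[0] and r[1]]) * (1 - p_z))

# Ground truth, obtained by actually intervening in the simulator
truth = sum(y for _, _, y in (sample(do_x=True) for _ in range(N))) / N

print(adjusted, truth)  # the two agree, up to sampling noise
```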

These are, in short, the three rules of “do calculus” and they allow you to test your intuition regarding causation. You can put them to two separate uses:

1. You can now design better experiments
2. You can look at already existing data better, resolving a large number of “paradoxes”

A “worked example” is provided on page 236 that takes you in six simple steps from

P(c | do(s)) = Sum over t [ P(c | do(s), t) P(t | do(s)) ]

to the testable:

P(c | do(s)) = Sum over s’ [ Sum over t [ P(c | t, s’) P(s’) P(t | s) ] ]
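That final testable expression is the front-door adjustment. Here is a simulation sketch (my own toy numbers for the smoking / tar / cancer story, with a deliberately unobserved confounder) checking that the formula recovers the true interventional probability from purely observational samples:

```python
import random

random.seed(3)

def sample(do_s=None):
    # Toy numbers: smoking (s) -> tar (t) -> cancer (c), plus an
    # UNOBSERVED confounder u that raises both smoking and cancer.
    u = random.random() < 0.5
    s = (random.random() < (0.8 if u else 0.2)) if do_s is None else do_s
    t = random.random() < (0.9 if s else 0.1)
    c = random.random() < 0.1 + 0.5 * t + 0.3 * u
    return s, t, c

N = 400_000
obs = [sample() for _ in range(N)]

# Front-door estimate of P(c | do(s=1)), using observed s, t, c only
estimate = 0.0
s1 = [r for r in obs if r[0]]
for t_val in (False, True):
    p_t_given_s1 = sum(r[1] == t_val for r in s1) / len(s1)
    inner = 0.0
    for s_prime in (False, True):
        sp = [r for r in obs if r[0] == s_prime]
        tsp = [r for r in sp if r[1] == t_val]
        p_c = sum(r[2] for r in tsp) / len(tsp)
        inner += p_c * len(sp) / N
    estimate += p_t_given_s1 * inner

# Ground truth by forcing s = 1 inside the simulator
truth = sum(c for _, _, c in (sample(do_s=True) for _ in range(N))) / N

print(estimate, truth)
```

The estimate never touches the hidden u, yet it lands on the interventional answer, which is the whole charm of the front-door trick.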

The author next works his way through a couple of these paradoxes that the new method cuts into shreds: Berkson’s Paradox (smokers in a 1995 thyroid disease study have a higher survival rate than non-smokers), Simpson’s Paradox (most departments at Berkeley favor women in admissions, but women overall have a lower chance of getting into Berkeley than men) all fall under the weight of his new weapon.
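Simpson's Paradox in particular is easy to reproduce with pencil-and-paper numbers. A hypothetical two-department version (my numbers, not the actual Berkeley data):

```python
# Hypothetical admissions table: each department admits women at a
# HIGHER rate, yet the pooled female rate is lower, because women
# mostly apply to the tougher department. Format: (applicants, admits).
data = {
    "easy dept": {"men": (800, 480), "women": (100, 70)},   # 60% vs 70%
    "hard dept": {"men": (200, 20),  "women": (900, 180)},  # 10% vs 20%
}

def rate(pairs):
    applicants = sum(a for a, _ in pairs)
    admits = sum(d for _, d in pairs)
    return admits / applicants

# Stratified: women win in every department
for dept in data.values():
    assert rate([dept["women"]]) > rate([dept["men"]])

# Pooled: men win overall
pooled_men = rate([d["men"] for d in data.values()])
pooled_women = rate([d["women"] for d in data.values()])
print(pooled_men, pooled_women)  # 0.5 vs 0.25
```

The causal diagram tells you which of the two tables to believe: since department choice mediates or confounds the sex-to-admission path, the stratified comparison is the meaningful one here.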

(With that said, I still like my explanation better of why you should change doors in the famous TV game: (i) the chance you picked well to start with is 1/3, (ii) if you didn’t pick well you’re guaranteed to win when you change doors! The author goes over the blah blah regarding how the game show host imparted one part of the decision tree with extra info…)
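My explanation is easy to check by brute force; a short Monte Carlo of the game (standard rules assumed: the host always opens a non-prize, non-picked door) gives the familiar 1/3 vs 2/3 split:

```python
import random

random.seed(4)

def play(switch):
    doors = [0, 1, 2]
    prize = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a door that hides no prize and isn't the contestant's pick
    opened = random.choice([d for d in doors if d != pick and d != prize])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

N = 100_000
p_stay = sum(play(False) for _ in range(N)) / N
p_switch = sum(play(True) for _ in range(N)) / N
print(p_stay, p_switch)  # roughly 1/3 and 2/3
```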

Now I’ve established I’m in awe of the author, and while I’m being a smart-alec, I’ll point out the one issue I have a problem with:

Judea Pearl protests too much about his predecessors’ notion that “it’s all in the data.” Yes, his tools help you design better experiments. Away from that fact, however, (and yeah, that’s pretty major and would in itself be enough of a contribution to mankind) this new calculus of causation in the end amounts to a set of new “goggles” we can wear to look at data better. To my taste, then, he complains a bit too much. To a great extent, it IS all in the data, it’s just that thanks to him we now know better where to look.

(note to the reader: this may be the correct place to tell me I’ve understood nothing)

The astounding thing is that this is only the first step we’re invited to climb alongside the author on the “ladder of causation.” And so it is that you climb one more step, from “intervention” to “counterfactuals.”

This is, finally, the “why” step that lends its name to the book.

Example: when an angry coach tells a player he should have passed the ball to a teammate rather than try to dribble the goalie, the player knows why: his teammate would have scored! That is the counterfactual! It is the state of the world that did not come to be, but against which his actions have been judged.

Believe it or not, a second calculus has been invented by the author and his associates in the space of the past couple decades, with the explicit purpose of putting some mathematical meat on the bones of this syllogism.

The main problem solved is the one where a fire and a blocked fire escape combine to cause somebody’s death. The combination of the factors guarantees the outcome, but how bad you feel about the blocked fire escape depends on your estimate of what the chances of death would have been had the fire escape not been blocked.

Needless to say, this is a simplified example and the calculus helps you deal with continuous outcomes, not only binary outcomes.

The author defines three quantities: total effects, Net Direct Effects and Net Indirect Effects.

Let us say an extra year of education leads to higher salary through two paths, one because people pay up for better-educated people and one because the stuff you’ve learnt may help you perform better.

The two component quantities are defined as follows:

The Net Direct Effect of a year of education is how much more you will be paid if you skip the studying and go to the Bahamas, a friend sits the test in your stead, you come back exactly as skilled and motivated as you left, but nobody finds out, and as a result your employer pays up just because you got the degree.

The Net Indirect Effect is how much more than your pre-degree self people get paid who never got the degree but somehow have the same skills as you will have post-degree.


“The total effect of an extra year of education is equal to the Net Direct Effect of an extra year of education MINUS the Net Indirect Effect of SKIPPING a year of education”

(not plus the Net Direct Effect of having it)
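In a linear model the quoted identity can be checked with a few lines of arithmetic. A sketch with made-up coefficients (education raises skill, and both education and skill raise salary):

```python
# Linear toy model with made-up coefficients:
#   skill  = a * education          (the indirect, mediated path)
#   salary = b * education + c * skill
a, b, c = 2.0, 3.0, 1.5

def skill(edu):
    return a * edu

def salary(edu, sk):
    return b * edu + c * sk

# Total effect of one extra year of education
te = salary(1, skill(1)) - salary(0, skill(0))

# Net Direct Effect: get the degree, but keep the skills of someone who skipped
nde = salary(1, skill(0)) - salary(0, skill(0))

# Net Indirect Effect of SKIPPING: keep the degree, revert skill to the skipped level
nie_skip = salary(1, skill(0)) - salary(1, skill(1))

# The quoted identity: total effect = NDE minus NIE of skipping
print(te, nde, nie_skip, nde - nie_skip)  # 6.0 3.0 -3.0 6.0
```

In the linear case the NIE of skipping is just the negative of the NIE of having the degree, so the identity collapses to the familiar "direct plus indirect" decomposition.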

This equation is applied to the problem of the smoking gene and cancer and demolishes the excuses of anybody who has the infamous gene: they’d better quit smoking, bottom line, and the rest is talk!

Which lands you safely on chapter 10, the last chapter of the book, the one regarding artificial intelligence: can we teach a computer morals and should we do so?

It is, comfortably, the best chapter in the book!

Armed with the tools you’ve just mastered, you have no problem following the author’s argument: if we can teach a machine to think like a child and consider the consequences of taking or not taking actions and if we additionally give it license to test (again, like a child) the consequences of its actions, then we can answer our questions with a "yes."

For a machine that is equipped to ask "why" is a machine that we can count on to do the right thing and act as our moral compass.

Profile Image for Nilesh Jasani.
987 reviews135 followers
June 11, 2018
Here is an excellent book by a renowned expert, but one with potentially deep flaws in its fundamental premises and conclusions. The reviewer is more likely the one mistaken in these views, given that the author is clearly a master thinker on the subject - a point worth noting for any soul wading through this long review. Much of what follows is hardly a book review: it is more about the argument gaps this reviewer sees in the author’s epistemological framework than about the book itself.

Before my disagreements with some of the basic premises mentioned in the book, a few words on the book itself. The author covers a lot of important topics that are barely ever mentioned in popular books. Through confounders, colliders, and mediators, the author passionately and patiently explains the nuances of good statistical analysis. The author is a renowned innovator himself: his pioneering ideas have helped many fields through the work explained here. The book does become too technical at times for non-experts, but that only means there is so much more to learn from the text. The arguments below do not say anything against the details of the causal analysis discussed by the author in the book.

The author makes the following basic points (my words): to believe in any statistical relationship, let’s call it an association or a correlation, there has to be some causal link underlying the two. Determining the exact nature of the causality is the first step before one is able to determine the course of further analysis. Otherwise, the drivers of the associations could be external or spurious, leading to other investigations and analysis. And only human intelligence can unearth these causal links/intentions.

Let’s take these arguments slowly to see the problem with the supremacy accorded to human intelligence and with consigning machines to be dumb now and forever. For a book about “why”, the most critical why was never really asked: why the belief that human intelligence has to be the supreme intelligence, if not the only one, able to understand causality?

Causation is critical to determine - a point well made repeatedly in the book. The author elaborates on many historic and philosophical discussions to conclude why this point is lost on many statisticians and data scientists. Is it?

The author’s criticisms come in two parts (again, not his words):

- causality was historically mistreated
- causality cannot be factored in by machine learning, big data systems that undermine human analysis role.

On the first point, there is a lot of straw-man analysis in the early part of the book. In the classes where this reviewer learned his statistics in the early 1990s, and in textbooks of the 1980s or before, causality reigned supreme. In basic statistics, any correlation obtained without a good underlying hypothesis explaining the why needed to be binned as a spurious, data-mined result. The author’s “do” operator is a new tool, but in a lot of ways the idea always existed in the era when the first step before any statistical analysis was to explain what one was trying to prove through potential causation. Causation was used to strip the data down to a small number of variables that could be tested in a world without enormous computing power. There must have been dozens of statisticians, as the author avers, who fought against causality in data analysis, and path diagrams etc. are useful new tools born of a deeper appreciation of the need for causality, but this reviewer never got the impression in his learning that one could analyse without hypothesis formation and causal arguments before the arrival of Big Data.

One can make this argument differently, beyond or before statistics. Until recent decades, if not years, human intuition was used to try to explain all sorts of dependencies in the world around us. Intuition-based causality was not only used in theological texts from the days of Adam; even scientific theories were required to pass the test of intuitive explanation rather than rest on dumb data-based equations. Counterfactuals, thought experiments and barebones data were used to create theories as strong as natural evolution and relativity (yes, there was surely a lot of data supporting them when they were discovered, but intuitive/inspirational logic played a huge role in their formation). And anything that failed the human-language-explanation test - for example, quantum physics - found a large number of critics over almost a century.

The first main questions, as shown by our experience, are:

a. Can there be results without hypothesis or causality intuited or explicable to humans?

b. Can data generate hypotheses? Better than humans?

c. And what is the cost of a wrong hypothesis formed by humans versus no hypothesis?

The answer to the first question is a resounding yes. In a way the question should be whether causality is always (versus only - a point explored later) expressible in human language. There is definitely a lot in this world whose explanations do not lie within the bounds of what is intelligible in our languages - the famous current examples are quantum physics or the workings of DNA at the most fundamental level. The author gives another example - the workings of some of the latest AI machines, including AlphaGo. With an abundance of data and computing power, we have to be ready for times when there will be a lot that will benefit us and run the world around us, but whose complete details we will not understand in our own language. This is a scary prospect, without a doubt, but also inevitable. We must also remember that there are causal explanations we simply cannot express in our common languages, including in math that is legible to us.

Before we discuss men versus machines, it is perhaps time to understand what causality means! The author rightly balks at defining causality (any reductive explanation takes away something from the true meaning, as per the author). This reviewer would still go ahead with the following working definition to explain some of the issues with the conclusions on machines reached by the author: causality - the way this reviewer sees it explained in the book - is an a priori, human-intuited and human-language-explained possible relationship (a hypothesis) between independent and dependent variables, with each clearly defined.

It is true that human intelligence, with its ability to draw intuitive potential relationships, is good at inferring many causal connections, but this intelligence is also deeply fallible - as our religious and political history from time immemorial bears witness. Our intelligence is neither objective nor prone to agreement at the end of every path diagram and statistical result set. Most important for a book of Why should be the causality underlying causality: there must be something in human intuition or intelligence that makes us somewhat good at detecting causality. If that’s the case, why can’t we build better causality-detecting machines?

The author makes “machine learning” out to be far dumber than it is in 2018. There are no specific restrictions on what machine learning constitutes: machines can easily be modified, and indeed they are, to test causal hypotheses by running controlled experiments (interventions), for instance. Machines could be programmed to analyse counterfactuals, or even to test thousands of different path diagrams and causal paths to detect those that make the most sense. In a world where the number of variables analysed is no longer just a handful, it is inconceivable for human intelligence to draw a causal path the traditional way shown in the book except in extremely simplistic situations.

One critical concept the author misses is that of starting with a wrong hypothesis or potential causality. In some cases, as in textbooks, this is not a worry, since a wrong hypothesis would be proven wrong with data, thus adding to our knowledge. In real-life statistics, data itself is neither complete nor representative in most cases. What one picks and analyses is often a function of what one set out to prove. Numerous tools provided by the statistical sciences, including some explained in the book, only add to the degrees of freedom a human engineer has in conducting analysis. A wrong causal start, in human affairs, could lead to centuries of agony given the circular loops that exist.

The machines, in their dispassionate way, could do better by simultaneously evaluating multiple paths, at least in numerous cases if not all. Plus, human intuition and intelligence are relatively invariant, while machines are improving exponentially.

Historically, humans developed intuitive rationales first; data came much later in our development, only in the last few decades. The supremacy of intuition was valid for millennia, but this is no longer the case. Living in a world where we do not fully fathom - in our language - machines created by us is scary, but this is exactly what has been going on forever: us trying to figure out the machine called the world. As man-made machines gallop further, we will need to look out for our race’s self-interest, but this will need to start with us knowing our limitations.
Profile Image for Andy.
1,373 reviews464 followers
February 22, 2023
This review starts with the audio version and moves on to the printed book.

This topic is very interesting, but audio is a terrible format for this book. The narrator is reading out equations. The whole point of the book is to use diagrams. There is a PDF with the audiobook, but the figures are not meaningful on their own. I have ordered the print version. If it makes more sense in print, I will bump up the rating.

There were important things I just didn’t understand clearly. For example, he seems to be bashing Sir Austin Bradford Hill, saying that the way epidemiologists figured out that smoking causes lung cancer is somehow pathetic compared with his approach. The Hill approach saved millions of lives. What was wrong with it? What is the equivalent accomplishment of his little arrows? He says Hill wasn’t quantitative, but that’s not really true, because the Hill Criteria include magnitude of effect, etc. And of course, the whole smoking/cancer thing long predates Pearl’s work but is an example of what he labels his “revolution”, i.e., that observational data can give us evidence of causality. I have a feeling the emperor is naked here, but I will check on it more before lowering the score, and I won’t need a diagram to figure it out.

--Update February 2019:
I went over the smoking chapter in detail. The emperor is naked.

The idea that one can establish causality without randomized trials is extremely important, but as Pearl himself points out, it predates the causal diagrams. This book does not make a clear case for the causal diagrams being the amazing conceptual revolution that the author claims they are. Perhaps he should diagram that causal pathway.

Pearl leaves out some important details of the tobacco story (like the Readers' Digest article that alerted the American public to the risk of cigarettes) but even just going by what Pearl includes in his version, there was actually a very rapid penetration of the idea that cigarettes caused the lung cancer pandemic. Doll and Hill STARTED their study in 1948. In 1953, EVEN THE TOBACCO COMPANY SCIENTISTS accepted as fact that cigarettes cause lung cancer.

The problem with wider dissemination was that there was a massive campaign of disinformation by the tobacco industry and their minions. The Hill Criteria helped to seal the deal at a broader level despite the propaganda. There is no evidence offered that things would have moved along faster if people had been using causal diagrams. It is very easy in retrospect to say now that of course we could have convinced people about the dangers of smoking if only ... except that as Pearl illustrates, some people were never convinced no matter what.

If causal diagrams have stopped a pandemic and saved millions of lives, then tell us that story.

For a better and beautiful book on epidemiology for a general audience, I would recommend:
Investigating Disease Patterns: The Science of Epidemiology by Paul D. Stolley
Thiago Marzagão
187 reviews · 23 followers
June 6, 2018
This is an engaging, well articulated discussion of causal inference - what it is, what the available tools are (RCTs, IVs, matching, etc), how they have changed over the years, and how they could be improved. The bits that tell the history of causal inference are especially illuminating; I learned a lot of stats in grad school but very little about the struggles and accidents that produced the tools I learned. Pearl helps put much of that into context.

Now, Pearl's intended audience is clearly the machine learning community. Much of what he says will not sound particularly Earth-shattering to people in (or from) the social sciences. "You can't learn causality from data alone, you need a model!" is one of the book's core messages. It's hard to see an economist or political scientist disagreeing with it. You come up with a theory, you think up its observable implications, you test them. Even Pearl's proposal that we use mediation analysis won't sound exactly novel. Social scientists have been doing that, they just don't use that name for it (they call it "testing the theory's microfoundations"). Now, having abandoned political science and lived among the machine learning people for four years now, I can see how Pearl's message is important to his intended audience. And social scientists should read the book too because it intelligently discusses the limitations of tools like RCTs and matching.

In the end what Pearl proposes - that we use our knowledge of how the world works in order to formulate and test hypotheses - may turn out to be (deservedly) influential in the machine learning community, but it won't help fix the core problem with the social sciences, i.e., that social scientists can always twist their hypotheses - not to mention the very questions they ask - to accommodate their pet world views. And when the Democrat/Republican ratio is 6:1, as it is in political science, we can't trust that people will keep each other honest - they won't. Pearl discusses in passing the possibility that some day we may have machine learning algorithms capable of producing their own causal models. Maybe then the social sciences will be worth the money they cost taxpayers.
Marcel Santos
81 reviews · 8 followers
September 14, 2021
I read around 75% of this one. I came across this book after starting to listen to some professionals of evidence-based medicine, which is a fascinating field using advanced scientific methods. Concepts such as Bayesian analysis, among others, represent a challenge to someone like me who deals with areas of knowledge (mostly Law and Economics) which are still far from using them. The prospect of understanding more deeply the constantly repeated phrase “correlation is not causation”, which has extensive application in many different areas (including Law and Economics), made me take courage and venture into it. Another motivation was my experience in noticing that scientific methods, language and concepts born in one field have been increasingly borrowed by others, signaling possible ambitious unifications.

However, I must acknowledge that this is too deep a trip into advanced notions of statistics and mathematics. The authors try their best to make the covered issues readable even to a broader public, but in my opinion they weren’t successful. The book requires the reader to have a solid background in those fields, and even those with such knowledge would need to concentrate thoroughly and study what’s being read (which means reading some passages over and over). I have no doubt that this is a masterpiece on an absolutely relevant issue - it even looks like a solid framework for technological evolution. Unfortunately it is out of reach for those not versed in the exact sciences. It would be unfair if I rated it.

Never mind. I may come back to it in the future if my studies drive me again to the issue.
Dan
261 reviews · 60 followers
April 1, 2021
Pearl explains why the theory and practice of statistics developed almost exclusively around correlation. Moreover, when the big data and AI fields started to employ more and more statistics, the need for causal explanations turned this limitation into a big problem. For example, in order to show causality, statistics needs to add/employ concepts and practices like randomization, intervention, control groups, prior specifications and limitations, hypotheses, prospective studies, adjustments, and so on. Pearl claims that his new theory of causality may bypass all of these limitations and difficulties, if a causal diagram can be provided along with the basic observational/correlation data.
In the end, it can be argued that causality is a metaphysical concept that cannot be discovered in the data, but only functions as a human category that directs the collection, organization, explanation, and so on of the data. It feels to me that Pearl is trying to go beyond this in order to provide a scientific foundation for causality, as if it really exists in the world. His proposal in the last chapter to add free will, and thus a potential ability to make/assess counterfactual statements and to understand causality, to the current AI approaches seems not at all fundamental, practical, or significant to me.
In order to defend the introduction of causal diagrams and their scientific status, Pearl states in this book that “Logic void of representation is metaphysics”. This struck me as a strange statement, since logic understood as representation is metaphysics.
foteini_dl
430 reviews · 119 followers
March 3, 2021
The Book of Why takes us on a journey through the world of causality. Through examples from the social sciences, medicine and pharmacology, and data science, it shows how causal processes can help us understand *better* how the world (and the human mind) works and, in turn, solve many problems or design better policies. Not to mention that it can also serve as a map for future developments in AI and robotics.

The book does have *some* mathematical equations that make a few parts a bit hard to follow, but on the whole it manages to remain accessible to a general audience. So I won't say "read it"; I'll just ask you a question. Well, why not read it?
Nelson Zagalo
Author 9 books · 320 followers
January 25, 2020
After several days of thinking about this book, I decided not to devote any more time to it, much less write a proper review of it. The book suffers from a classic problem among academics: the inability to write for a general audience.

Although the underlying problem is complex and hard to put into words, especially for someone approaching it from the side of mathematics, probability in particular, it would have been better to stick to an academic book, since this one cannot be read without doing outside research, or worse, without real grounding in the field. Just think of the excellence of Pedro Domingos's book, "The Master Algorithm" (2017), to appreciate the difference.

I admit I came to the book through my interest in causality from the standpoint of narrative, since causality is what holds story-worlds together. But Judea Pearl is an engineer, specialized first in circuits and then in AI. He became one of the leading researchers of causality, but within the domain of Bayesian networks, so the book will certainly interest those who are familiar with the area and work in the field. It is of little use, however, to non-specialists.

It makes no sense to assign it stars: it is an academic book in an area I don't master, so the rating I would give would reflect my limited knowledge of the field more than the book itself.
Kelly Jade
8 reviews
December 4, 2018
The book would have been 100 pages shorter if the author spent less time name dropping and talking himself up.

We get it.

Everyone who opposes you is wrong and stupid and you're the greatest and smartest, just look at all your students with all these high level faculty positions.

Interesting ideas but a lot of ideas could have been explained more clearly or completely if the author laid off the commentary and ego stroking.
Minh Nhật
86 reviews · 49 followers
March 18, 2019
Praising this book would be superfluous: it's a pop-sci book written by a major scholar in probability theory and machine learning. If the inspiration strikes, I'll write a grand review.

But is anyone even interested? T^T If not, never mind then!
Daniel Christensen
136 reviews · 16 followers
February 13, 2019
If you are a science or stats geek, or frustrated with the replication crisis across various disciplines, or even a philosophy/cognitive science boffin, this book is highly recommended.

Judea Pearl is a heavy heavy hitter. He was a big deal in Computing and Artificial Intelligence (at the forefront of Bayesian networks, which are central to mobile phone signal technology), before he made the leap to questions of causal inference.

The knock on Pearl has been his writing – it’s so hard to get through that I suspect his work didn’t get serious leverage until better communicators (Greenland, Hernan, Robins etc.) came to the party. I attempted his monolithic ‘Causality: Models, Reasoning and Inference’ but was defeated by it.

This book is the fix. Someone needs to buy Dana Mackenzie a shiny new car, or at least a carton of beer, for his work in helping Pearl get his ideas across in an accessible fashion.

Some of the main complaints from other reviewers are that either: 1) it’s too technical; or 2) it’s not how-to enough. So, yes IT IS A BIT TECHNICAL. And yes, THIS IS NOT A HOW-TO.

1) IT IS A BIT TECHNICAL. I think whether or not it is too technical depends on where you are at – I take this stuff seriously, and I expected a bit of pain, so I think he simplified just enough (I describe my background below, if it helps you orient yourself to the review). The book is not at all mathematical, although it is brutally logical in spots.

2) THIS IS NOT A HOW-TO. It’s a big picture overview (I suggest a few how-to’s below).

The third major criticism of Pearl is about where he sits into other approaches to causality, particularly those in economics (and to a lesser extent machine learning) – I don’t know enough about this yet but I’ll add an appendix to my review as I build sufficient background.

The book is about what Pearl calls the Causal Revolution. It’s about scientists (especially in social science and epidemiology) taking seriously the question of when you can (and cannot) infer causality. The book gives an excellent review of the evolution of ideas about causality in statistics and science, and lays down a serious challenge to the mantras ‘no causality in observational studies’ and ‘correlation does not imply causality’. At the very least, Pearl helps make explicit when these mantras make sense — Pearl makes extensive use of the debate over smoking and lung cancer to illustrate his point. As Hernan and Robins point out in their book, how many of us need evidence from an RCT to confidently deduce that putting your hand on a hot stove causes burning?

(As an analyst/ research fellow working in social science I’m often struck by how we make a statement in our limitations along the lines of ‘this study is observational, we cannot infer causality’ and then make an implicitly causal recommendation like ‘support mothers with mental health issues’, ‘don’t smoke’, ‘eat less fatty food’ or ‘school attendance is good for your grades’ (incidentally, these are all likely sound recommendations and it’s really our tradition of denying causality as a matter of course that’s the issue.))

Pearl is the originator of the Directed Acyclic Graph (the DAG), that is causal graphs, and a formal logic of causality. He is a relentless evangelist for these ideas. He has converted me to his religion (Judea-ism?) but it’s important to recognise that he offers a particular perspective on the issue of causal inference. There are other views out there (particularly in economics) that differ on some issues with Pearl (if/ when I come up with a good summary of their issues with Pearl I will add it as an appendix to the review).

Pearl does a good potted history of statistics, science and causal inference, with a lot of love for Sewall Wright and his guinea pigs. He devotes a chapter to an overview of the Bayes rule and its applications, including the Monty Hall problem, which unfortunately still confuses me (I don’t blame Pearl for this).
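Since the Monty Hall problem keeps confusing people (this reviewer included), a quick simulation can at least confirm the counterintuitive answer. This is a generic sketch of the standard three-door game, not code from the book; with the host always opening a goat door, switching wins about two thirds of the time:

```python
import random

def monty_hall(trials: int, switch: bool, rng: random.Random) -> float:
    """Play the standard three-door game `trials` times; return the win rate."""
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)    # door hiding the car
        pick = rng.randrange(3)   # contestant's initial choice
        # Host opens a door that is neither the contestant's pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

rng = random.Random(42)
print(monty_hall(100_000, switch=True, rng=rng))   # ~2/3
print(monty_hall(100_000, switch=False, rng=rng))  # ~1/3
```

Switching wins exactly when the initial pick was wrong (probability 2/3); the host's door choice is not random, and having a model of his behavior is what dissolves the paradox.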

The book itself makes extensive use of causal diagrams to help build the reader’s intuition. This covers off a more systematic approach to selecting which covariates an analyst should (and shouldn’t) adjust for, and the language of common causes and common effects. He also gives an accessible review of Simpson’s and Berkson’s paradoxes.

Using causal diagrams offers an accessible tool for communicating instrumental variable and Mendelian randomisation analyses.

Pearl thinks about causal inference in mind-bogglingly abstract terms. The weakness is that (until now), it’s been left to others to help communicate his ideas. The strength is the sheer power and imaginativeness of his approach. Pearl offers up several extensions of his work that I hadn’t encountered in the work of others. In particular, he is a strong advocate of the front-door adjustment method (which lets you estimate a causal effect through a fully mediating variable, even when the exposure and outcome share unmeasured confounding).
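For readers who want to see front-door adjustment in action: the formula is P(Y | do(x)) = Σ_m P(m | x) Σ_x′ P(Y | x′, m) P(x′). The sketch below invents a small binary model (all probabilities are illustrative, not from the book) with an unobserved confounder U of X and Y and a mediator M carrying the whole effect, and checks that the formula applied to the observational joint recovers the true interventional probabilities:

```python
from itertools import product

# Hypothetical structural model over binary variables:
#   U -> X and U -> Y (U is an unobserved confounder),
#   X -> M -> Y       (M mediates the entire effect of X on Y).
def p_x1(u): return 0.8 if u else 0.2            # P(X=1 | U=u)
def p_m1(x): return 0.9 if x else 0.1            # P(M=1 | X=x)
def p_y1(m, u): return 0.2 + 0.5 * m + 0.2 * u   # P(Y=1 | M=m, U=u)
def bern(p, v): return p if v else 1 - p

# Observational joint P(x, m, y), with the confounder U marginalized out.
joint = {}
for u, x, m, y in product((0, 1), repeat=4):
    p = 0.5 * bern(p_x1(u), x) * bern(p_m1(x), m) * bern(p_y1(m, u), y)
    joint[x, m, y] = joint.get((x, m, y), 0.0) + p

def pr(pred):
    return sum(p for k, p in joint.items() if pred(*k))

def front_door(x):
    """P(Y=1 | do(X=x)) computed from the observational joint alone."""
    p_x = lambda v: pr(lambda X, M, Y: X == v)
    p_m_given_x = lambda m: pr(lambda X, M, Y: X == x and M == m) / p_x(x)
    p_y_given = lambda xp, m: (pr(lambda X, M, Y: X == xp and M == m and Y == 1)
                               / pr(lambda X, M, Y: X == xp and M == m))
    return sum(p_m_given_x(m) * sum(p_y_given(xp, m) * p_x(xp) for xp in (0, 1))
               for m in (0, 1))

def truth(x):
    """Ground truth: intervene on X in the structural model itself."""
    return sum(0.5 * bern(p_m1(x), m) * p_y1(m, u)
               for u in (0, 1) for m in (0, 1))

print(front_door(1), truth(1))  # both ≈ 0.75
print(front_door(0), truth(0))  # both ≈ 0.35
```

The remarkable part is that `front_door` never sees U; getting the same answer as the intervention is exactly Pearl's point about extracting causal effects from purely observational data when the diagram licenses it.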

Another ‘innovation’ from the book is Pearl’s way of thinking about the problem of ‘transportability’ (I’d always called it generalisation) – how do we apply results from one context, population or setting to another? Again, Pearl uses the causal diagram to communicate his ideas.

As behoves his background in AI and Cognitive Science, the book is also rich with speculation about intelligence and consciousness (human and artificial). I found all this entertaining and thought-provoking. He contrasts his approach to the Big Data approach, but also proposes a marriage between the approaches.

I’ll give the caveat – I didn’t come to this book cold. I’ve worked in research for about 8 years with a bachelor’s degree in psychology and a Masters in Applied Stats. Over the last year I’ve been to several short courses on this issue, and I’ve done a lot of reading on the topic. Despite that, I’d recommend this as a good place to start (possibly in conjunction with working through all the examples and quizzes on DAGitty.net, and Miguel Hernan’s excellent HarvardX course).

It is not a “how to” for applying causal models to a specific analysis. Depending on the reader’s specific needs and interests, there are a few good resources out there, but I quite liked Bill Shipley’s Cause and Correlation in Biology. The DAGitty web package is also excellent, as is Tyler VanderWeele’s book. Or, cross over to the dark side and look into econometrics.


Critiques of Pearl (work-in-progress)

The criticism of Pearl — He’s partisan. In presenting his history of causality, he emphasises his own contribution, and de-emphasises the contribution of others.

This is true. This doesn’t diminish his approach or the utility of his methods, but if you want balance, you’ll need to shop around for other points of view.

A few other reviews note that causality in social sciences is maybe not the new thing that Pearl argues it is. It’s an interesting area – economists have been on the causality bandwagon for years – but the other social and behavioural sciences are replete with examples of poor choice of control variables reflecting a lack of causal reasoning (and the explicit denial of it).

Pearl reviews examples where we would accept causality has been proven without his formal causal logic (e.g. smoking and lung cancer, John Snow’s cholera studies). What he communicates well is that, without a clear causal logic, it quickly got messy.

From where I’m sitting the causal diagrams offer a tool of communicating with other branches of the social sciences, and also offer a useful means for interrogating the assumptions behind economic causal models.

Don Rubin (the king of missing data and potential outcomes) disagrees on the value of causal diagrams. I’ve only seen the argument well articulated from the Pearl camp, and I’m not sure if this is purely a matter of ‘who owns causality’ or if there are worthwhile lessons in here for practicing scientists/ analysts (to be continued...).

There is also a back and forth series of letters in the International Journal of Epidemiology between Pearl on the one hand and Nancy Krieger/ George Davey-Smith on the other hand. The latter camp are arguing for a more pluralistic approach, and point to some instances where using DAGs and DAGitty has produced some implausible models. This intrigues me. I’ll add some notes on this later.


Feb 13 2019 - I wrote a blog post on the topic more generally: https://medium.com/@daniel.christense...
Annie
923 reviews · 311 followers
January 17, 2023
Dull as dishwater, honestly. Whatever interesting content it might contain, I absorbed none of it, because my eyes were going numb reading this. It induced a dissociative fugue state.

A perfectly representative example of the text:

"[Applying this principle to the billiard problem], in order to find the probability of L, given X, we need a quantity that is not available to us from the physics of billiard balls. We need the prior probability of the length L, which is every bit as tough to estimate as our desired quantity, the probability of L, given X. Moreover, this probability will vary significantly from person to person, depending on the given individual's experience with tables of different lengths."

Do you care? Not at all. Neither did I.
October 27, 2022
Finally done after almost a year! The time span of this read was not due to a lack of very interesting points, nor to any inability to engage the reader - the language is informal and intuitive, at least when it does not include too many equations. Rather, it was due to the fact that it deals with a conceptually heavy topic, which is not always so tempting before going to sleep. Nevertheless, it was a pleasure. Pearl has been at the forefront of research into causal methods for decades, being the architect of several revolutionary innovations such as Bayesian networks. His ability to explain complicated concepts in an intuitive way, his description of the historical evolution of the field and the richness of research he refers to are all brilliant. I shall not pretend to remember any details of do-calculus. That would have required a whole course on the matter. But the division of causal levels into association (correlation), intervention (experiments) and - most importantly - imagination (counterfactuals) is such a powerful yet simple idea and will certainly be something I remember. Recommended for anyone who likes to ask the question “Why?” and wants to know how we might answer it.
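The first two rungs of that ladder can be made concrete in a few lines of code. In this toy model (all numbers invented for illustration), a confounder U raises both the chance of treatment X and the chance of outcome Y, so merely *seeing* X = 1 (rung one) suggests a stronger effect than *doing* X = 1 (rung two), where the intervention severs U's influence on X:

```python
# Hypothetical confounded model over binary variables: U -> X, U -> Y, X -> Y.
def p_x1(u): return 0.9 if u else 0.1           # P(X=1 | U=u)
def p_y1(x, u): return 0.1 + 0.4 * x + 0.4 * u  # P(Y=1 | X=x, U=u)

# Rung 1, association: P(Y=1 | X=1), i.e. condition on having observed X=1.
num = sum(0.5 * p_x1(u) * p_y1(1, u) for u in (0, 1))
den = sum(0.5 * p_x1(u) for u in (0, 1))
seeing = num / den

# Rung 2, intervention: P(Y=1 | do(X=1)); setting X cuts the U -> X arrow,
# so U keeps its prior distribution instead of being skewed by observing X=1.
doing = sum(0.5 * p_y1(1, u) for u in (0, 1))

print(seeing, doing)  # ≈ 0.86 vs 0.70: association overstates the effect
```

The gap between the two numbers is exactly the confounding that an RCT, or a do-calculus adjustment, is there to remove.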
59 reviews · 53 followers
July 17, 2018
This book aims to turn the ideas from Pearl's seminal Causality into something that's readable by a fairly wide audience.

It is somewhat successful. Most of the book is pretty readable, but parts of it still read like they were written for mathematicians.

History of science
A fair amount of the book covers the era (most of the 20th century) when statisticians and scientists mostly rejected causality as an appropriate subject for science. They mostly observed correlations, and carefully repeated the mantra "correlation does not imply causation".

Scientists kept wanting to at least hint at causal implications of their research, but statisticians rejected most attempts to make rigorous claims about causes.

The one exception was for randomized controlled trials (RCTs). Statisticians figured out early on that a good RCT can demonstrate that correlation does imply causation. So RCTs became increasingly important over much of the 20th century[1].

That created a weird tension, where the use of RCTs made it clear that scientists valued the concept of causality, but in most other contexts they tried to talk as if causality wasn't real. Not quite as definitely unreal as phlogiston. A bit closer to how behaviorists often tabooed the ideas that we had internal experiences and consciousness, or how linguists once banned debates on the origin of language: the sense that it was dangerous to think science could touch those topics. Or maybe a bit like heaven and hell - concepts which, even if they are useful, seem to be forever beyond the reach of science?

But scientists kept wanting to influence the world, rather than just predict it. So when they couldn't afford to wait for RCTs, they often got impatient and acted as if correlations told them something about causation.

The most conspicuous example is smoking. Scientists saw many hints that smoking caused cancer, but without an RCT[2], their standards and vocabulary made it hard to say more than that smoking is associated with cancer.

This eventually prompted experts to articulate criteria that seemed somewhat useful at establishing causality. But even in ideal circumstances, those criteria weren't convincing enough to produce a consensus. Authoritative claims about smoking and cancer were delayed for years by scientists' discomfort with talking about causality[3].

It took Pearl to describe how to formulate an unambiguous set of causal claims, and then say rigorous things about whether the evidence confirms or discredits the claims.

What went wrong?

The book presents some good hints about why the concept of causality was tabooed from science for much of the 20th century.

It focuses on the role of R.A. Fisher (also known as one of the main advocates of frequentism). Fisher was a zealot whose prestige was somewhat heavily based on his skill at quantifying uncertainty. In contrast, he didn't manage to quantify causality, or even figure out how to talk clearly about it. Pearl hints that this biased him against causal reasoning.

"Path analysis requires scientific thinking, as does every exercise in causal inference. Statistics, as frequently practiced, discourages it, and encourages 'canned' procedures instead."

But blaming a few influential people seems to merely describe the tip of the iceberg. Why did scientists as a group follow Fisher's lead?

I suggest that the iceberg is better explained by what James C. Scott describes as high modernism and the desire for legibility.

I see a similar pattern in the 20th century dominance of frequentism in most fields of science and the rejection of Bayesian approaches. Anything that required priors (whose source often couldn't be rigorously measured) was at odds with the goal of legibility.

The rise and fall of the taboo on causal inference coincide moderately well with the rise and fall of Soviet-style central planning, planned cities, and Taylorist factory management.

I also see some overlap with behaviorism, with its attempt to deny the importance of variables that were hard to measure, and its utopian hopes for how much its techniques could accomplish.

These patterns all seem to be rooted in overconfident extrapolations of simple models of what caused progress. I don't think it's an accident that they all peaked near the middle of the 20th century, and were mostly discredited by the end of the century.

I remember that when I was young, I supported the standard inferences from the "correlation does not imply causation" mantra, and was briefly (and less clearly) tempted by the other manifestations of high modernism. Alas, I don't remember my reasons well enough for them to be of much use now, other than a semi-appropriate respect for the authorities who were promoting those ideas.

An example of why causal reasoning matters

Here's an example that the book provides, dealing with non-randomized studies of a fictitious drug (to illustrate Simpson's Paradox, but also to show the difference between statistics and causal inference). The studies quantify three variables in each study:

* Study 1: drug <- gender -> heart attacks
* Study 2: drug -> blood pressure -> heart attacks

The book asks how we know we should treat the middle variables in those studies differently. The examples come with identical numbers, so that a statistics program which only sees correlations, and can't understand the causal arrows I've drawn here, would analyze both studies using the same methods. The numbers in these studies are chosen so that the aggregate data suggest an opposite conclusion about the drug from what we see if we stratify by gender or blood pressure. Standard statistics won't tell us which way of looking at data is more informative. But if we apply a little extra knowledge, it becomes clear that gender was a confounding variable that should be controlled for (it influenced who decided to take the drug), whereas blood pressure was a mediator that tells us how the drug works, and shouldn't be controlled for.
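The reversal is easy to reproduce with a few invented counts (these particular numbers are illustrative, not the book's): the drug looks better within each gender but worse in the pooled table, and nothing in the numbers themselves says which view to trust; only the causal story does.

```python
# Hypothetical (recovered, total) counts for each arm, stratified by gender.
data = {
    "female": {"drug": (81, 87),   "no_drug": (234, 270)},
    "male":   {"drug": (192, 263), "no_drug": (55, 80)},
}

def rate(recovered, total):
    return recovered / total

# Stratified view: within each gender, the drug has the higher recovery rate.
for gender, arms in data.items():
    print(gender, rate(*arms["drug"]), ">", rate(*arms["no_drug"]))

# Pooled view: aggregated over gender, the drug has the LOWER recovery rate.
pooled = {arm: (sum(data[g][arm][0] for g in data),
                sum(data[g][arm][1] for g in data))
          for arm in ("drug", "no_drug")}
print(rate(*pooled["drug"]), "<", rate(*pooled["no_drug"]))  # 0.78 < ~0.83
```

Because gender here is a confounder (it influences both who takes the drug and recovery), the stratified comparison is the trustworthy one; had the middle variable been a mediator like blood pressure, with the very same counts, pooling would have been correct instead.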

People typically don't find it hard to distinguish between the hypothesis that a drug caused a change in blood pressure and the hypothesis that a drug changed patients' reported gender. We all have a sufficiently sophisticated model of the world to assume the drug isn't changing patients' gender identity (i.e. we know that if that assumption were unexpectedly false, we'd hear about it).

Yet canned programs today are not designed to handle that, and it will be hard to fix programs so that they have the common sense needed to make those distinctions over a wide variety of domains.

Continuing Problems?

Pearl complains about scientists controlling for too many variables. The example described above helps explain why controlling for variables is often harmful when it's not informed by a decent causal model. I have been mildly suspicious of the "controlling for more variables is better" attitude in the past, but this book clarified the problems well enough that I should be able to distinguish sensible from foolish attempts at controlling for variables.

Controlling for confounders seems like an area where science still has a long way to go before it can live up to Pearl's ideals.

There's also some lingering high modernism affecting the status of RCTs relative to other ways of inferring causality.

A sufficiently well-run RCT can at least create the appearance that everything important has been quantified. Sampling errors can be reliably quantified. Then the experimenter can sweep any systemic bias under the rug, and declare that the hypothesis formation step lies outside of science, or maybe deny that hypotheses matter (maybe they're just looking through all the evidence to see what pops out).

It looks to me like the peer review process still focuses too heavily on the easy-to-quantify and easy-to-verify steps in the scientific process (i.e. p-values). When RCTs aren't done, researchers too often focus on risk factors and associations, to equivocate about whether the research enlightens us about causality.

The book points out that an AI will need to reason causally in order to reach human-level intelligence. It seems like that ought to be uncontroversial. I'm unsure whether it actually is uncontroversial.

But Pearl goes further, saying that the lack of causal reasoning in AIs has been "perhaps the biggest roadblock" to human-level intelligence.

I find that somewhat implausible. My intuition is that general-purpose causal inference won't be valuable in AIs until those AIs have world-models which are at least as sophisticated as crows[4], and that when that level is reached, we'll get rapid progress at incorporating causal inference into AI.

It's true that AI research often focuses on data mining (blind empiricism / model-free approaches), at the expense of approaches that could include causal inference. High modernist attitudes may well have hurt AI research in the past, and that may still be slowing AI research a bit. But Pearl exaggerates these effects.

To the extent that Pearl identifies tasks that AI can't yet tackle (e.g. "What kinds of solar systems are likely to harbor Earth-like planets?"), they need not just causal reasoning, but also the ability to integrate knowledge from a wide variety of data sources - and that means learning a much wider variety of concepts in a single system than AI researchers currently have the power to handle.

I expect that mainstream machine learning is mostly on track to handle that variety of concepts any decade now. I expect that until then, AI will only be able to do causal reasoning on toy problems, regardless of how well it understands causality.

Pearl is great at evaluating what constitutes clear thinking about causality. He's somewhat good at teaching us how to think clearly about novel causal problems, and rather unremarkable when he ventures outside the realm of causal inference.


[1] - RCTs (and Fisher's influences in general) don't seem to be popular in physics or geology. I'm curious why Pearl doesn't find this worth noting. I've mentioned before that people seem to care about p-values being less than 0.05 mainly where powerful interest groups might benefit from false conclusions.

[2] - The book claims that an RCT for smoking "would be neither feasible nor ethical". Clarke's first law applies here: it looks like about 8 studies had some sort of randomized interventions which altered smoking rates, including two studies focused solely on smoking interventions, which generated important reductions in smoking in the control group.

The RCTs seem to confirm that smoking causes health problems such as lung cancer and cardiovascular disease, but suggest that smoking shortens lifespan by a good deal less than the correlations would indicate.

[3] - As footnote 2 suggests, there have been some legitimate puzzles about the effects of smoking. Those sources of uncertainty have been obscured by the people who signal support for the "smoking is evil" view, and by smokers and tobacco companies who cling to delusions.

Smokers probably have some unhealthy habits and/or genes that contribute to cancer via causal pathways other than smoking.

The book notes that there is a "smoking gene" (rs16969968, aka Mr Big), but mostly it just means that smoking causes more harm for people with that gene.

Yet the book mostly implies that the anti-smoking crusaders were at least 90% right about the effects of smoking, when I think the reality is more complicated.

Pearl thinks quite rigorously when he's focused exclusively on causal inference, but outside that domain of expertise, he comes across as no more careful than an average scientist.

[4] - Pearl would have us believe that causal reasoning is mostly a recent human invention (in the last 50,000 years). I find Wikipedia's description of non-human causal reasoning to be more credible.
Juan
Author 22 books · 29 followers
August 29, 2020
This was a borrowed book, the kind of book for which I have the utmost respect. Meaning, no reading it on the beach or anywhere close to wet things. Which is why it took longer than expected, although in the end I had to disrespect it just a tiny little bit, since I had a deadline to return it.
Do borrowed books take less time to read than any other kind of book? We might find a correlation between those two variables. Correlation is not causality, however (and I'm not entirely sure there's such a correlation, either). And until relatively recently, there were no good tools to establish that one of the causes shortening the acquisition-to-finished-reading time was the fact that the book was borrowed (from someone else, or from a library).
In this book, Judea Pearl (and coauthor) talk about how statistics (and mathematics in general) evolved from that correlation-only phase to a phase in which it's possible, through graphical tools, to examine causality in a principled way, and also how to challenge the assumption that correlation equals causation.
This book is quite revealing; meaning that when you find news such as this one https://www.sciencenews.org/article/b..., you immediately go past the headline and try to find out what the common causes of the correlated facts are. You might find them or not, but if found, there are tools that will tell you how the different factors influence outcomes.
Along the way, as is usual in popular-science books, we get anecdotes about the curious characters who populated mathematics during the last century; we also discover the inside story of how tobacco was actually found to cause cancer, learning along the way that causation is no laughing matter and can literally lead to life-or-death decisions. All in all, an interesting read, even if you are not interested in the history of science but just want a few tools for analyzing current news.
Profile Image for Jakub.
65 reviews
February 15, 2021
a very insightful and revolutionising view into the causal revolution. it's clear that using the author's remarks, one can finally start finding cause in the world, instead of being lost in meaningless correlations. the book is for someone deeply interested in the future of AI, in the future of causal research in medicine, psychology, and similar science fields. a casual reader will have difficulties following the more complicated concepts as the book often resembles a textbook rather than a pop science penguin writing. although a bold statement, if some people take Thinking, Fast and Slow as the revolutionary book of behaviour, biases, etc., then this book lives up to a similar extent in terms of rediscovering causality and correlations.
Profile Image for Jason Furman.
1,171 reviews770 followers
August 16, 2018
This was a long, strange trip through the statistical analysis of causation. Judea Pearl writes beautifully and in an almost grandiose manner, dubbing himself a Whig historian of the science of causation--how it was forgotten by statistical analysis that put correlation at the pinnacle of analysis, how it was rediscovered later, and in particular the importance of structural models that combine an understanding of the world with the data--but do not just let the data speak for itself. The book combines a history of science with a number of specific examples (e.g., do cigarettes cause cancer, or does an algebra program in Chicago increase math knowledge) along with some of the mathematics of his method. But mostly Pearl's method centers around writing causal diagrams with arrows that allow you to identify blockers, confounders, and the like. The arrows and terminology were not familiar to me from econometrics, but many of the conclusions and techniques were (e.g., RCTs, multiple regression, and instrumental variables).

In some cases Pearl claimed a greater profundity than I was able to follow, for example I could understand the Bayesian interpretation of his argument but he claimed there was a bigger one. In other cases he claimed that his diagrams opened up entirely new paths to solving causal questions and understanding the results of statistical analysis. In all of these cases I confess that I mapped them into my previous understanding rather than expanding, changing, evolving my previous understanding--and am unsure if this represents my limited understanding of his book or his overclaiming about his ideas, many of which were well understood and implemented in econometrics before.

The last chapter on AI, free will, explicability, and correlation vs. causation in big data was quite interesting but a bit of a departure from the rest of the book.

Overall, would recommend this to economists or others who are very interested in statistical analysis, it takes some effort at times (nothing like a textbook, which would be the best way to assess the novelty of some of the ideas), but amply rewards it.
Profile Image for Carl Zimmer.
Author 71 books1,488 followers
July 5, 2018
Cause and effect may seem like the stuff of pure philosophy, but Judea Pearl shows how important causation is to the applications of science, from the technology in our cell phones to the link from smoking to cancer. Pearl, a UCLA computer scientist, presents a personal history of this field using lots of light-hearted thought experiments to illustrate his points. It requires serious concentration, but that concentration is amply rewarded.
48 reviews4 followers
August 10, 2018
Every now and then you read a book that introduces you to a new concept and forces you to reevaluate your world view, leaving you better for it. For me, this was one such book. Highly, highly recommend.
Profile Image for Karel Baloun.
398 reviews34 followers
November 6, 2018
Valuable for your permanent collection, for ongoing reference and inspirational revisiting, with an absolutely ideal annotated bibliography. Artisan craftsmanship that will certainly withstand the test of time.

Invest 2-3 days in simplifying and repairing how you think causally! I’m sure glad I did. Fun and readable, and so practically valuable. The brilliant core tenet: data are dumb, people’s models can be smart. Knowledge is in the model, not just waiting to emerge from the data. Wow.. that contradicts completely what modern world correlation economists, policy analysts and social scientists have taught for several decades!

On pg 64-65, Pearl elegantly distinguishes two models of how talent and luck generate success. If luck applies independently to each generation, the model is mathematically stable, but if it accrues (with talent) over generations, you get a wide, persistent distribution of outcomes. For me, this profoundly simplified economic inequality, and it shows how useful a rigorous framework for thinking can be. Without accruing "luck" you get reversion to the mean, and with it you get dynastic wealth. Yet with either one, talent is passed down generations, aiding success under equal opportunity.
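The contrast between the two models is easy to see in a toy simulation. A minimal sketch (the Gaussian distributions, 20 generations, and 10,000 families are my own illustrative parameters, not Pearl's):

```python
import random

random.seed(0)
GENERATIONS = 20
FAMILIES = 10_000

# Model A: luck is drawn fresh each generation -> outcomes revert to the mean.
# Model B: luck accrues across generations -> a wide, persistent spread.
fresh, accrued = [], []
for _ in range(FAMILIES):
    talent = random.gauss(0, 1)  # inherited, constant down the family line
    luck_draws = [random.gauss(0, 1) for _ in range(GENERATIONS)]
    fresh.append(talent + luck_draws[-1])      # only the latest generation's luck counts
    accrued.append(talent + sum(luck_draws))   # luck compounds over generations

def spread(xs):
    """Standard deviation of outcomes across families."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

print(round(spread(fresh), 2))    # stays near sqrt(2): reversion to the mean
print(round(spread(accrued), 2))  # near sqrt(1 + 20): dynastic persistence
```

The spread under the fresh-luck model is bounded no matter how many generations pass, while under accruing luck it grows with the square root of the number of generations.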

Appropriately, in the engaging historical chapter 2, Pearl often asks Why historical persons thought as they did, whenever he is able to answer himself. These Why’s show how important causation is to understanding anything of importance. It helps the storytelling that the eminent statisticians Karl Pearson & Fisher are (through arrogance and dominance) the scientific bad guys. And it is consistent to see Fisher return in his evil cantankerous role during the tobacco trials and as a professor of Eugenics.

Pages 104-7 provide the simplest and most lucid explanation of estimating the likelihood of having a disease from a positive test result. So useful.
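The calculation on those pages is an application of Bayes' rule. A minimal sketch of the same kind of reasoning (the prevalence, sensitivity, and false-positive rate below are my own illustrative numbers, not the book's):

```python
# Bayes' rule: P(disease | positive test)
# Illustrative numbers, chosen for the example:
prevalence = 0.01    # P(disease) in the population
sensitivity = 0.90   # P(positive | disease)
false_pos = 0.09     # P(positive | no disease)

# Total probability of testing positive (diseased or not):
p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)

# Posterior probability of disease given a positive test:
p_disease_given_pos = sensitivity * prevalence / p_positive
print(round(p_disease_given_pos, 3))  # -> 0.092
```

Even with a fairly accurate test, a positive result leaves the probability of disease under 10%, because the disease is rare; this base-rate effect is exactly what makes the explanation in the book so valuable.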

Fun seeing how fuzzy math and Bayesian networks, cutting edge research when I was in grad school, have evolved into the mainstream. I can almost imagine my alternative life, had I studied this and developed ways to use it. 30 years ago honestly, I couldn’t have imagined it would be as widespread as it is today. Partly because back then we had no big data.

The paradox chapter is fascinating, and especially meaningful because it anchors the earlier theoretical ideas into memory.

The closing chapters are difficult, in terms of figuring out how to apply these sparkling, sharp tools to your own life and work. Well, I suppose that should be a challenge. I wish more work and opinions from outside Pearl's immediate academic lineage were included, since I don't feel we are given a clear view of whether there are any, or of how his work fits into the overall future of science.

The final chapter on AI is only a start, and leaves little on which to build. The author's assumption that a moral AI could resolve all control problems feels shallow to me, however attractive! His assertion that free will is superior in performance to borg-like behavior from simulations like generative adversarial networks feels unproven.
Profile Image for Alex Lee.
894 reviews108 followers
January 25, 2020
This is amazing. Essentially Pearl and Mackenzie provide a way to assess causation from observational data alone.

The key is to provide a model for causation to test the data against. Much of the stats goes over my head, but intuitively we understand how to test for causation; how to get at what matters, what doesn't, what kind of matters and under what conditions we should experiment.

But then again, we don't. Often we control for too much, indirectly influencing our experiments. What we have here is a frontier for twisting around our thinking on causation. Often we also think that causation should be expressed in terms of direct causes, but this is too simple. Counterfactuals, as Pearl/Mackenzie show, are the way we need to approach causation. Only in this make-believe sense can we get at what really matters, because reality can have multiple causes, all of which have different weights and forms of interference.
Profile Image for Kuba Jeziorski.
49 reviews
February 26, 2023
I had never noticed that determining causality could be so problematic. Several interesting topics came up here, especially the clear separation of cause from correlation, and paradoxes I hadn't known before.

The book itself, however, is in my opinion not very interesting. The problem outlined in the first pages (the ladder of causation) strongly drew me in, but after 100 pages (out of 460) my enthusiasm waned, and not much changed through to the end.