My son George’s first language is Japanese.
His first annoying habit, which raised its head very soon after he was granted the gift of speech, was to answer every request / question / casual comment with “doshte?”
“Doshte,” you guessed it, is Japanese for “why?”
This, Judea Pearl argues very persuasively in this book, is (for the time being) the biggest difference between thinking men and thinking machines.
I LOVED this book. Loved it, loved it, loved it.
You can read “The Book of Why” as a popular science book. I started reading it that way and, some thirty pages in, I thought to myself, “very weird, I have not lost this guy yet!” You could not really say that about “A Brief History of Time,” could you? The funny thing is, I eventually made it all the way to page 370 (the last page) and I was still with the author! For me, that’s a first: a popular mathematics book that carries on introducing new (and I mean NEW) material, concepts that were not understood when I went to college, but regardless explains them clearly enough that I could keep learning the whole way through.
To briefly summarize, the author explains that, a hundred or so years ago, meaning in statistics was sacrificed on the altar of rigor: because the giants who defined the field were not personally comfortable putting a definition on the word “why,” they not only repurposed the entire discipline to answer “when” (thereby throwing the baby out with the bathwater), but also rendered it heretical to examine causation at all. In particular, the practice of identifying probability-altering interventions was proscribed by the mathematical mainstream.
To get the ball rolling, the author stakes his claim early in the book and defines the “do” operator (p. 48). For example, if we know for a fact that getting rid of the weeds (an action we call do(X)) results in a better crop Y, we can go ahead and write:
P(Y | do(X)) > P(Y)
This was deemed heresy because there was no clean mathematical meaning for “do” and no set of operations or conclusions that could be derived from it. Cauchy was no longer around, I suppose. The orthodoxy was established that we need only care about association. From the full list of associations, logical people were free to draw their own conclusions regarding causation. If cutting the weeds and a better crop are correlated, nobody is going to accuse you of making stuff up if you conclude that one caused the other.
The benefit of the canonical approach, and it is an enduring benefit we should not disparage, is that, with minimal knowledge of mathematics, you can use a statistical package that slices and dices the “whens” and gives you a slew of pre-packaged answers: “sons of 72-inch-tall fathers will on average be 71 inches tall, but sons of 68.5-inch-tall fathers will on average be 68.5 inches tall.” My George is therefore predicted to be 68.5 inches tall and, more disturbingly, I’m the average height of a male British subject in 1877! Ah, but on the plus side, that probably also means George will be taller than me, because the mean has no doubt shifted up since…
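If you like, the arithmetic behind that prediction fits in a few lines of Python. The 68.5-inch mean and the 72→71 pair are the figures quoted above; the regression slope of roughly 5/7 is my back-calculation from them, not a number taken from the book:

```python
# Galton-style regression to the mean: a son's predicted height is the
# population mean plus a fraction of his father's deviation from it.
MEAN = 68.5                              # mean height in inches (the review's figure)
SLOPE = (71.0 - MEAN) / (72.0 - MEAN)    # back-calculated from the 72" -> 71" example

def predicted_son_height(father_height):
    """Shrink the father's deviation from the mean by the regression slope."""
    return MEAN + SLOPE * (father_height - MEAN)

print(predicted_son_height(72.0))   # ~71.0: tall fathers have (on average) shorter sons
print(predicted_son_height(68.5))   # 68.5: an average father predicts an average son
print(predicted_son_height(65.0))   # ~66.0: short fathers regress upward
```

The slope being below 1 is the whole story: deviations from the mean shrink in the next generation, in both directions.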
Cool, but we can do better. A lot better!
Judea Pearl takes enormous pains to give maximum credit to all his students and disciples, but it was he who single-handedly forced mankind up a construct he calls “the ladder of causation.” Here he invites you along!
First, he takes you one step up from “association” to “intervention.” To do so, you need to start drawing pictures: graphs (causal diagrams, they’re called) that let you point from causes to effects. These charts need not be handed down by a higher being. You can sketch your own, test their conclusions against the data, change your mind and draw them again.
These charts, once you’ve drawn them, naturally force you to recognize three important types of nodes: “mediators” (example: tar in your lungs mediates between your smoking and your getting lung cancer), “confounders” (example: a now-identified “smoking gene,” rs16969968, both makes people likelier to smoke and makes them more susceptible to lung cancer, but clearly does not deposit tar in their lungs) and “colliders” (example: smoking and birth defects can both affect birth weight). The author goes on to explain what the “front-door path” and the “back-door path” from potential cause X to potential result Y are. (In a later chapter he expands this repertoire to “instrumental variables.”)
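For the graphically minded, here is a minimal sketch of those three structures as a toy edge list in Python. The node names are mine, the arrows follow the smoking examples above, and the one-line classification rules are the standard graphical definitions:

```python
# The review's smoking example as a set of directed edges (cause -> effect).
EDGES = {
    ("smoking", "tar"), ("tar", "cancer"),                           # chain
    ("gene", "smoking"), ("gene", "cancer"),                         # fork
    ("smoking", "birth_weight"), ("birth_defect", "birth_weight"),   # collider
}

def is_mediator(node, cause, effect):
    """Mediator: lies on a directed path cause -> node -> effect."""
    return (cause, node) in EDGES and (node, effect) in EDGES

def is_confounder(node, a, b):
    """Confounder: a common cause, node -> a and node -> b."""
    return (node, a) in EDGES and (node, b) in EDGES

def is_collider(node, a, b):
    """Collider: a common effect, a -> node <- b."""
    return (a, node) in EDGES and (b, node) in EDGES

print(is_mediator("tar", "smoking", "cancer"))                  # True
print(is_confounder("gene", "smoking", "cancer"))               # True
print(is_collider("birth_weight", "smoking", "birth_defect"))   # True
```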
Next (some 200+ pages into the book) comes the math, which is the first time you’re asked to actually take the author on faith rather than being invited to discover alongside him. And here’s what the math says:
suppose you’ve drawn your causal diagrams;
suppose you’ve expressed them in mathematical expressions, using the “do” operator;
then there are exactly three “legitimate transformations” you can apply to these equations that correspond to the diagrams in order to convert them into testable (or otherwise!) run-of-the-mill probabilistic statements of the kind a conventional statistician can abide:
1. If W is irrelevant to Y, then
P(Y | do(X), Z, W) = P(Y | do(X), Z)
2. If a set of variables Z blocks back-door paths from X to Y, then
P(Y | do(X), Z) = P(Y | X, Z)
3. If there are no causal paths from X to Y, then
P(Y | do(X)) = P(Y)
Not only that, but there are no other necessary rules. If there’s a way to convert your causal diagrams into classical probability statements, then there’s a way to do it with these three tricks.
These are, in short, the three rules of “do calculus” and they allow you to test your intuition regarding causation. You can put them to two separate uses:
1. You can now design better experiments
2. You can look at already existing data better, resolving a large number of “paradoxes”
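To see Rule 2 earn its keep, here is a toy numerical check in Python. All the conditional probability tables are invented; the point is only that the back-door adjustment, computed purely from observational quantities, reproduces P(Y | do(X)) obtained by actually severing the Z → X arrow, while naive conditioning does not:

```python
# Invented CPTs for a confounded binary model: Z -> X, Z -> Y, X -> Y.
P_z = {0: 0.4, 1: 0.6}                              # P(Z=z)
P_x1_given_z = {0: 0.2, 1: 0.7}                     # P(X=1 | Z=z)
def P_y1(x, z): return 0.1 + 0.5 * x + 0.3 * z      # P(Y=1 | X=x, Z=z)

def joint(x, y, z):
    """Observational joint P(X=x, Y=y, Z=z)."""
    px = P_x1_given_z[z] if x == 1 else 1 - P_x1_given_z[z]
    py = P_y1(x, z) if y == 1 else 1 - P_y1(x, z)
    return P_z[z] * px * py

def p_y1_do_x(x):
    """Ground truth P(Y=1 | do(X=x)): sever Z -> X and set X by fiat."""
    return sum(P_z[z] * P_y1(x, z) for z in (0, 1))

def p_y1_backdoor(x):
    """Rule 2 / back-door adjustment, using only observational quantities."""
    total = 0.0
    for z in (0, 1):
        p_y1_given_xz = joint(x, 1, z) / (joint(x, 0, z) + joint(x, 1, z))
        total += P_z[z] * p_y1_given_xz
    return total

def p_y1_given_x(x):
    """Naive conditioning P(Y=1 | X=x): biased, because Z confounds X and Y."""
    num = sum(joint(x, 1, z) for z in (0, 1))
    den = sum(joint(x, y, z) for y in (0, 1) for z in (0, 1))
    return num / den

print(p_y1_do_x(1), p_y1_backdoor(1))   # identical: the adjustment works
print(p_y1_given_x(1))                  # larger: confounding inflates the association
```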
A “worked example” is provided on page 236, which takes you in six simple steps from
P(c | do(s)) = Sum over t [ P(c | do(s), t) P(t | do(s)) ]
to the testable:
P(c | do(s)) = Sum over s’ [ Sum over t [ P(c | t, s’) P(s’) P(t | s) ] ]
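And here is that final formula checked numerically, on an invented binary model (gene u → smoking s → tar t → cancer c, with u → c and u unobserved). The front-door expression, assembled purely from observational quantities, lands on the same number as the intervention itself:

```python
from itertools import product

# Invented CPTs: u -> s, s -> t, t -> c, u -> c.  u is never observed.
P_u1 = 0.4
def P_s1(u): return 0.8 if u else 0.3
def P_t1(s): return 0.9 if s else 0.1
def P_c1(t, u): return 0.1 + 0.5 * t + 0.3 * u

def pr(v, p1):
    """P(V=v) for a binary variable with P(V=1) = p1."""
    return p1 if v == 1 else 1 - p1

def joint(u, s, t, c):
    return pr(u, P_u1) * pr(s, P_s1(u)) * pr(t, P_t1(s)) * pr(c, P_c1(t, u))

def p_c1_do_s(s):
    """Ground truth P(c=1 | do(s)): sever u -> s, marginalize out u and t."""
    return sum(pr(u, P_u1) * pr(t, P_t1(s)) * P_c1(t, u)
               for u in (0, 1) for t in (0, 1))

def P_obs(**fixed):
    """Observational probability of an event, read off the joint (u hidden)."""
    total = 0.0
    for u, s, t, c in product((0, 1), repeat=4):
        assign = {"u": u, "s": s, "t": t, "c": c}
        if all(assign[k] == v for k, v in fixed.items()):
            total += joint(u, s, t, c)
    return total

def p_c1_frontdoor(s):
    """The testable formula: sum over s', t of P(c=1|t,s') P(s') P(t|s)."""
    total = 0.0
    for t in (0, 1):
        p_t_given_s = P_obs(s=s, t=t) / P_obs(s=s)
        for s2 in (0, 1):
            p_c1_given_ts = P_obs(t=t, s=s2, c=1) / P_obs(t=t, s=s2)
            total += p_c1_given_ts * P_obs(s=s2) * p_t_given_s
    return total

print(p_c1_do_s(1), p_c1_frontdoor(1))   # these agree
```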
The author next works his way through a couple of the paradoxes that the new method cuts to shreds: Berkson’s Paradox (smokers in a 1995 thyroid disease study have a higher survival rate than non-smokers) and Simpson’s Paradox (most departments at Berkeley favor women in admissions, yet women overall have a lower chance of getting into Berkeley than men) both fall under the weight of his new weapon.
(With that said, I still prefer my own explanation of why you should switch doors in the famous TV game: (i) the chance you picked well to start with is 1/3; (ii) if you didn’t pick well, you’re guaranteed to win when you switch! The author goes over the blah blah about how the game show host imparted extra info to one part of the decision tree…)
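For the record, that two-line argument survives brute enumeration of the nine equally likely (prize door, first pick) combinations:

```python
from itertools import product

doors = (0, 1, 2)
stick_wins = switch_wins = 0
for prize, pick in product(doors, doors):   # nine equally likely cases
    if pick == prize:
        stick_wins += 1    # (i) sticking wins only if the first pick was right
    else:
        switch_wins += 1   # (ii) otherwise the host's reveal leaves only the prize
                           # door to switch to, so switching is guaranteed to win

print(stick_wins, switch_wins)   # 3 vs 6, i.e. 1/3 vs 2/3
```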
Now that I’ve established I’m in awe of the author, and while I’m being a smart-alec, let me point out the one issue I have with the book:
Judea Pearl protests too much about his predecessors’ notion that “it’s all in the data.” Yes, his tools help you design better experiments. That fact aside, however (and yes, that’s pretty major and would in itself be enough of a contribution to mankind), this new calculus of causation in the end amounts to a set of new “goggles” we can wear to look at data better. To my taste, then, he complains a bit too much. To a great extent it IS all in the data; it’s just that, thanks to him, we now know better where to look.
(note to the reader: this may be the correct place to tell me I’ve understood nothing)
The astounding thing is that this is only the first step we’re invited to climb alongside the author on the “ladder of causation.” And so it is that you climb one more step, from “intervention” to “counterfactuals.”
This is, finally, the “why” step that lends its name to the book.
Example: when an angry coach tells a player he should have passed the ball to a teammate rather than trying to dribble past the goalkeeper, the player knows why: his teammate would have scored! That is the counterfactual: the state of the world that did not come to be, but against which his actions are judged.
Believe it or not, the author and his associates have invented a second calculus over the past couple of decades, with the explicit purpose of putting some mathematical meat on the bones of this syllogism.
The main problem solved is the one where a fire and a blocked fire escape combine to cause somebody’s death. The combination of the factors guarantees the outcome, but how bad you feel about the blocked fire escape depends on your estimate of what the chances of death would have been had the fire escape not been blocked.
Needless to say, this is a simplified example and the calculus helps you deal with continuous outcomes, not only binary outcomes.
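With made-up numbers, the fire-escape judgment looks like this; both probabilities below are pure inventions for illustration:

```python
# Invented probabilities for the fire / blocked-escape example.
p_death_blocked = 0.9   # estimated P(death | fire, escape blocked)
p_death_clear   = 0.2   # estimated P(death | fire, escape clear): the counterfactual

# The blame attached to the blockage tracks the excess risk it created:
# how much likelier death became relative to the world with the escape clear.
excess_risk = p_death_blocked - p_death_clear
# Chance the victim would have survived "but for" the blockage:
p_survival_but_for = 1 - p_death_clear

print(excess_risk, p_survival_but_for)
```

Lower your estimate of p_death_clear and the blame assigned to the blockage rises accordingly.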
The author defines three quantities: total effects, Net Direct Effects and Net Indirect Effects.
Let us say an extra year of education leads to higher salary through two paths, one because people pay up for better-educated people and one because the stuff you’ve learnt may help you perform better.
Taking the two component effects in turn:
The Net Direct Effect of a year of education is how much more you will be paid if you skip the studying and go to the Bahamas, a friend sits the test in your stead, you come back exactly as skilled and motivated as you left, but nobody finds out, and as a result your employer pays up just because you got the degree.
The Net Indirect Effect is how much more people who never got the degree, but somehow have the same skills you will have post-degree, get paid compared with your pre-degree self.
“The total effect of an extra year of education is equal to the Net Direct Effect of an extra year of education MINUS the Net Indirect Effect of SKIPPING a year of education”
(not plus the Net Indirect Effect of having it)
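The quoted identity can be checked on a toy model in Python (education X → skill M → salary Y plus a direct X → Y path; all numbers invented, with an interaction term included so that the naive guess TE = NDE + NIE visibly fails while the book’s identity holds):

```python
# Invented binary mediation model: education X -> skill M -> salary Y,
# plus a direct X -> Y path.
def p_m1(x):                          # P(M=1 | X=x)
    return 0.8 if x else 0.3

def ey(x, m):                         # E[Y | X=x, M=m], with an X*M interaction
    return 50 + 10 * x + 15 * m + 5 * x * m

def p_m(m, x):
    return p_m1(x) if m == 1 else 1 - p_m1(x)

# Total effect: E[Y | do(X=1)] - E[Y | do(X=0)].
TE = sum(p_m(m, 1) * ey(1, m) - p_m(m, 0) * ey(0, m) for m in (0, 1))

# Net Direct Effect of the extra year: change X, but freeze the skill
# distribution at its no-degree (X=0) level.
NDE = sum(p_m(m, 0) * (ey(1, m) - ey(0, m)) for m in (0, 1))

# Net Indirect Effect of SKIPPING the year: keep the degree (X=1) but let
# the skills fall back from their X=1 to their X=0 distribution.
NIE_skip = sum((p_m(m, 0) - p_m(m, 1)) * ey(1, m) for m in (0, 1))

print(TE, NDE, NIE_skip)
print(abs(TE - (NDE - NIE_skip)) < 1e-9)   # the quoted identity: TE = NDE - NIE_skip
```

With these numbers NDE + NIE_skip does not reproduce TE; only the minus-the-reverse-transition version does, which is exactly the point of the quote.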
This equation is applied to the problem of the smoking gene and cancer and demolishes the excuses of anybody who has the infamous gene: they’d better quit smoking, bottom line, and the rest is talk!
Which lands you safely in chapter 10, the last chapter of the book, the one regarding artificial intelligence: can we teach a computer morals, and should we do so?
It is, comfortably, the best chapter in the book!
Armed with the tools you’ve just mastered, you have no problem following the author’s argument: if we can teach a machine to think like a child and consider the consequences of taking or not taking actions, and if we additionally give it license to test (again, like a child) the consequences of its actions, then we can answer both questions with a “yes.”
For a machine that is equipped to ask "why" is a machine that we can count on to do the right thing and act as our moral compass.