When you know the cause of an event, you can affect its outcome. This accessible introduction to causal inference shows you how to determine causality and estimate effects using statistics and machine learning.
In Causal Inference for Data Science you will learn how to:
Model reality using causal graphs
Estimate causal effects using statistical and machine learning techniques
Determine when to use A/B tests, causal inference, and machine learning
Explain and assess objectives, assumptions, risks, and limitations
Determine if you have enough variables for your analysis
This is a terrible book. It has a few strengths but far more weaknesses. I strongly recommend avoiding it and turning to other sources to study causal inference. The author promotes analytical methods that are almost guaranteed to produce nonsense. He also fails to mention the DoWhy package, which has become one of the most widely used tools for causal inference in Python.
The main challenge in causal inference is that analysts often fail to include important confounders in their analyses, or simply can’t because the necessary data isn’t available. The author briefly mentions the problem of missing or unobserved confounders but doesn’t explore it in depth, which is unfortunate.
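To make the point concrete, here is a minimal sketch of how a single unrecorded confounder distorts an estimate. Every number in it is hypothetical and chosen only for illustration; none of it comes from the book. A hidden binary trait U makes treatment more likely and also improves the outcome, and the true treatment effect is fixed at 0.1.

```python
# Hypothetical numbers chosen only for illustration; none come from the book.
P_U = 0.5                        # share of patients with hidden trait U = 1
P_T_GIVEN_U = {1: 0.8, 0: 0.2}   # U makes treatment far more likely
TRUE_EFFECT = 0.1                # true causal effect of treatment on P(Y = 1)


def p_outcome(t, u):
    """P(Y=1 | T=t, U=u): treatment adds 0.1, the hidden trait adds 0.4."""
    return 0.3 + TRUE_EFFECT * t + 0.4 * u


def p_u(u):
    """P(U=u)."""
    return P_U if u == 1 else 1 - P_U


def p_t_given_u(t, u):
    """P(T=t | U=u)."""
    return P_T_GIVEN_U[u] if t == 1 else 1 - P_T_GIVEN_U[u]


def naive_mean(t):
    """E[Y | T=t]: all you can compute when U is not recorded."""
    p_t = sum(p_t_given_u(t, u) * p_u(u) for u in (0, 1))
    joint = sum(p_outcome(t, u) * p_t_given_u(t, u) * p_u(u) for u in (0, 1))
    return joint / p_t


def adjusted_mean(t):
    """Sum over u of P(Y=1 | T=t, U=u) * P(U=u): requires observing U."""
    return sum(p_outcome(t, u) * p_u(u) for u in (0, 1))


naive_effect = naive_mean(1) - naive_mean(0)
adjusted_effect = adjusted_mean(1) - adjusted_mean(0)
print(f"naive: {naive_effect:.2f}, adjusted: {adjusted_effect:.2f}")
```

In this toy setup the naive comparison reports an effect more than three times the true one, and only the adjusted version, which needs U, recovers 0.1. When U is never recorded, no amount of modeling on the remaining variables closes that gap.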
The first half of the book focuses on how doctors choose treatment methods for patients with kidney stones. Only the size of the stones is used as a confounder. In reality, the key confounders in such cases are financial incentives behind doctors’ decisions. Ignoring these incentives inevitably leads to false conclusions.
The author devotes an entire chapter to propensity score models, which in practice almost always produce false conclusions. It’s worth learning sensitivity analysis, which asks how strong an unobserved confounder would have to be to overturn an estimate, and applying it to the results you get from these models. Doing so will quickly show you that it’s best to discard propensity score methods altogether.
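One simple form of sensitivity analysis is the E-value of VanderWeele and Ding. It is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both the treatment and the outcome to fully explain away an observed risk ratio. The sketch below is my own illustration, not something taken from the book, and the function name `e_value` is hypothetical.

```python
import math


def e_value(risk_ratio: float) -> float:
    """E-value for an observed risk ratio (point estimate only).

    Uses E = RR + sqrt(RR * (RR - 1)) for RR >= 1; a protective
    risk ratio (RR < 1) is inverted first so the formula applies.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# Example inputs are arbitrary, not results from any study.
for rr in (1.2, 2.0, 0.5):
    print(f"RR = {rr}: E-value = {e_value(rr):.2f}")
```

A small E-value means a fairly weak hidden confounder could account for the whole estimate, which is the situation to worry about when the list of measured confounders is short.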
The last two chapters, on potential outcomes and time-related events, struck me as both overly complicated and practically useless.
The book’s strengths include clear explanations of the basic building blocks of causal inference and the T-learner method. However, these positives do not make up for its major flaws. If you are able to run Python scripts, you’re better off reading the DoWhy package documentation instead of this book.