Name: Causation, Prediction, and Search (Lecture Notes in Statistics)
Rating: 3.88 (4 reviews)
ISBN: 9780387979793

g.lkoa

24 reviews2 followers

September 21, 2018

{Provisional pitch} ~ slow-read, re-reading, to-be-read-again | very important. All three authors are recognized as leading names in theoretical and inferential statistics (in addition to being widely referenced, also, for their contribution to machine learning).

It is a great/valuable approach to the estimation of causal effects, a topic whose core problems consist – in, perhaps, the humblest graphical model – of something from X to Y, that’s to say a probability function mapping like (Y | cause(X = x)).
A bit of manipulation of this dummie system can be trivial in so far as it is reasonable to trace back the class X to some value x, measure Y, conceive some ‘cause()’ operator, and hence obtain a distribution.

If you cannot make up this lab or observational experiment, then you get a joint distribution for some value of covariates Z – meaning P(X, Y, Z, Zn, …)... which by itself, of course, is not even slightly enough to get a P(Y| cause(X=x)). Joint distribution, y’kno, doesn't identify a causal effect.

The treatement on which this work on causality&inference relies is such that a causal model can be identified, whenever possibile, for some varbile X that is said to be a cause of a variable Y if and only if Y depends on X for its value in some mathematically _explicit_ sense. Which in turn can be expanded in many ways (there’s a few efforts at catering to non professional or non-mathematicians), but the definition just exactly enacts that X is a cause of Y if Y decides its value in response to that X. Causation is then said to be transitive, irreflexive, and antisymmetric.
This whole so far, as a broad concept, might be understandable in the end or even obvious, but in fact it’s essential to formally enstablish a realm where there’s no need to resorting to counterfactuals.

(Accordingly, say, you want to know the effect of X on Y and you’ve gotten around to find a set of V as control variables, then if (1.) V collides all paths-vector from X to Y with a pointer into X and (2.) if no node in V is a successor of X, then you’re sure all the items in your model are observational conditional probabilities. Meaning that V satisfies a back-door criterion; no counterfactuals required, indeed.)

*

Past the first two-three chapters on axioms, statistical indistiguishability and Causally Sufficient Structures, the authors attempt to scrutinize - and, more importantly, test - the notion that even correlations, while not implying causation, should nevertheless have some causal explaining power as unambiguous as possible (an idea, to be sure, as old as Simon's papers on computational complexity in decision making, i.e., ‘bounded rationality’), which is equal to adressing the following non-trivial task: define some classes of correlations, also multi-variable correlations, in order to add some restraints on them within the domain where a pattern of said correlations is legal.

This logic pretty much boils down to the twin notions of Markov equivalence and distribution equivalence. For instance, given an X = {X, Y, Z}, this model may generate a structure such that X → Y → Z, (its opposite) X ← Y ← Z, X ← Y → Z, and X ← Y ← Z, from which one gathers that they represents only the statement that X and Z are conditionally independent given some Y. The first three structures, in fact, consists of an Y separating X from the third variable, and said variables are therfore independent of each other, conditional on Y. Drawing on Verma and Pearl, 1990, two model structures for some variable X are known to be Markov equivalent (meaning fairly indiscernible) if they can exhibit the same set of conditional-independence assertions for X (and the absence of conditional independency definitely suggests which way the causal links lie). So those model structures must be taken as equivalent. This very process of ‘orienting’ correlation chains can be mathematically difficoult, tough to digest, but indeed the greatest value of this book is showing what a causal discovery can be, in all its complexity, as a computational and an inferential problem.

Other significant sections of the textbook are also devoted to the chief notion of partial identification - widely elaboretad by Manski, even thoug not openly mentioned - and partial correltation, a breath-of-fresh-air that always comes into play, because, no matter how unadulterated&perfect and informative data you have, very oftent it isn't yet enough to track some parameter down to an estimate-point value; what is feasible, on the other hand, is to charge some boundary conditions in order to see whether that parameter is at least theoretically identifiable (depending on the models one is willing to buy into and the gradient of ‘strenght’ of the assumptions one is willing to make).

This is, needless to say, tremendously important in non-natural science and policy making problems. The greatest the number of parameters, the less they can be identified with credible assumptions; however, the authors happen to show that some partial identification limits, which are based on not so hard assumptions, can exist and be formally explicit (there are a wide lot of examples from a corpus of traditional assumption parameters - for instance, linear homogeneous curves in economics or instrumental variables or the like, which relies on relatively hard assumptions and leaves virtually no space to uncertainty).

Reading it mostly for consultation, this was a rather scattered and incoherent appraisal. I confess.

Adaptive Computation and Machine Learning

Causation, Prediction, and Search

Peter Spirtes; Clark Glymour; Richard Scheines, Richard Scheines, Clark N. Glymour

About the author

Peter Spirtes; Clark Glymour; Richard Scheines

Ratings & Reviews

Friends & Following

Community Reviews

Join the discussion

Can't find what you're looking for?