Kindle Notes & Highlights
Read between September 9, 2020 and October 20, 2021
It would be better to have material that helps you spot common mistakes and misunderstandings of p-values and tests in general, as all of us have to understand such things, even if we don't use them.
Why aren’t the tests enough for research? The classical procedures of introductory statistics tend to be inflexible and fragile. By inflexible, I mean that they have very limited ways to adapt to unique research contexts. By fragile, I mean that they fail in unpredictable ways when applied to new contexts.
Even a procedure like ordinary linear regression, which is quite flexible in many ways and able to encode a large diversity of interesting hypotheses, is sometimes fragile. For example, if there is substantial measurement error on predictor variables, then the procedure can fail in spectacular ways.
The point isn’t that statistical tools are specialized. Of course they are. The point is that classical tools are not diverse enough to handle many common research questions.
What researchers need is some unified theory of golem engineering, a set of principles for designing, building, and refining special-purpose statistical procedures. Every major branch of statistical philosophy possesses such a unified theory.
So there are benefits in rethinking statistical inference as a set of strategies, instead of a set of pre-made tools.
Instead, we need some statistical epistemology, an appreciation of how statistical models relate to hypotheses and the natural mechanisms of interest.
How do we get a statistical model from a causal model? One way is to derive the expected frequency distribution of some quantity—a “statistic”—from the causal model.
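One way to make that concrete is to simulate. The sketch below assumes a hypothetical toy causal model, a binary treatment that shifts an outcome by 0.5, and uses the difference in group means as the statistic; repeating the simulation traces out the expected frequency distribution of that statistic implied by the causal model.

    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_once(n=50):
        # Hypothetical toy causal model: a binary treatment T shifts
        # the outcome Y by 0.5 units, plus Gaussian noise.
        T = rng.binomial(1, 0.5, size=n)
        Y = 0.5 * T + rng.normal(0.0, 1.0, size=n)
        # The "statistic" of interest: the difference in group means.
        return Y[T == 1].mean() - Y[T == 0].mean()

    # Repeating the simulation traces out the sampling distribution
    # of the statistic implied by this causal model.
    stats = np.array([simulate_once() for _ in range(10_000)])
    print("mean:", stats.mean().round(3), "sd:", stats.std().round(3))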
(1) Any given statistical model (M) may correspond to more than one process model (P).
(2) Any given hypothesis (H) may correspond to more than one process model (P).
(3) Any given statistical model (M) may correspond to more than one hypothesis (H).
But falsification is always consensual, not logical.
Models can be made into testing procedures—all statistical tests are also models[20]—but they can also be used to design, forecast, and argue.
We want to use our models for several distinct purposes: designing inquiry, extracting information from data, and making predictions. In this book I've chosen to focus on tools to help with each purpose. These tools are:
(1) Bayesian data analysis
(2) Model comparison
(3) Multilevel models
(4) Graphical causal models
Things that can happen more ways are more plausible.
Once we have defined the statistical model, Bayesian data analysis forces a purely logical way of processing the data to produce inference.
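The counting logic behind "things that can happen more ways are more plausible" can be written out directly. The sketch below uses a hypothetical marble-bag example (a bag of four marbles, each blue or white, with three observed draws); it simply counts, for each conjectured bag composition, how many sequences of draws reproduce the observation.

    from itertools import product

    # Hypothetical example: a bag holds 4 marbles, each blue ("B") or
    # white ("W"). We observe three draws with replacement: B, W, B.
    observed = ("B", "W", "B")
    conjectures = [tuple("B" * b + "W" * (4 - b)) for b in range(5)]  # 0..4 blue

    for bag in conjectures:
        # Count every path through the garden of forking data that
        # reproduces the observed sequence exactly.
        ways = sum(1 for path in product(bag, repeat=len(observed)) if path == observed)
        print("".join(bag), ways)

The conjecture that can produce the observation in the most ways is the most plausible; this counting is the purely logical processing that Bayesian updating formalizes.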
The distribution of these measurements is called a sampling distribution.
So the sampling distribution of any measurement is constant, because the measurement is deterministic—there’s nothing “random” about it.
As a result, the field of image reconstruction and processing is dominated by Bayesian algorithms.[24]
And more complex models tend towards more overfitting than simple ones—the smarter the golem, the dumber its predictions.
Cross-validation and information criteria help us in three ways. First, they provide useful expectations of predictive accuracy, rather than merely fit to sample. So they compare models where it matters. Second...
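The overfitting pattern, and cross-validation as an estimate of predictive accuracy rather than fit to sample, can be seen in a small simulation. The data, the linear "truth", and the polynomial degrees below are all made up for illustration; leave-one-out cross-validation stands in for the criteria discussed here.

    import numpy as np

    rng = np.random.default_rng(1)

    # Made-up data: a simple linear truth with noise.
    n = 20
    x = np.linspace(-1, 1, n)
    y = 0.7 * x + rng.normal(0, 0.3, size=n)

    def loo_mse(degree):
        # Leave-one-out cross-validation: refit without each point,
        # then score the prediction for the held-out point.
        errs = []
        for i in range(n):
            keep = np.arange(n) != i
            coefs = np.polyfit(x[keep], y[keep], degree)
            errs.append((np.polyval(coefs, x[i]) - y[i]) ** 2)
        return np.mean(errs)

    for degree in (1, 3, 5, 7):
        coefs = np.polyfit(x, y, degree)
        in_sample = np.mean((np.polyval(coefs, x) - y) ** 2)
        print(degree, round(in_sample, 3), round(loo_mse(degree), 3))

In-sample error keeps falling as the polynomial degree grows, while the cross-validated error eventually rises: the smarter the golem, the dumber its predictions.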
What this means is that any particular parameter can be usefully regarded as a placeholder for a missing model.
This results in a model with multiple levels of uncertainty, each feeding into the next—a multilevel model.
What they do is exploit an amazing trick known as partial pooling that pools information across units in the data in order to produce better estimates for all units.
To adjust estimates for repeat sampling.
To adjust estimates for imbalance in sampling.
To avoid averaging.
But the scope of multilevel modeling is much greater than these examples.
And some commonplace procedures, like the paired t-test, are really multilevel models in disguise.
These benefits don’t come for free, however. Fitting and interpreting multilevel models can be considerably harder than fitting and interpreting a traditional regression model. In practice, many researchers simply trust their black-box software and interpret multilevel regression exactly like single-level regression.
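A minimal sketch of the shrinkage idea behind partial pooling, not a full multilevel model: groups with few observations borrow more strength from the grand mean than groups with many. The group sizes, the within-group standard deviation, and the simple variance-ratio weight are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(7)

    # Made-up grouped data: group means drawn around a common average,
    # with very unequal sample sizes per group.
    true_means = rng.normal(0.0, 1.0, size=8)
    sizes = [3, 3, 5, 10, 10, 30, 60, 100]
    groups = [rng.normal(mu, 2.0, size=n) for mu, n in zip(true_means, sizes)]

    grand_mean = np.mean(np.concatenate(groups))        # complete pooling
    sigma2 = 2.0 ** 2                                   # within-group variance (known here)
    tau2 = np.var([g.mean() for g in groups], ddof=1)   # rough between-group variance

    for g, n in zip(groups, sizes):
        no_pool = g.mean()
        # Partial pooling: small groups are shrunk harder toward the grand mean.
        weight = n / (n + sigma2 / tau2)
        partial = weight * no_pool + (1 - weight) * grand_mean
        print(f"n={n:3d}  no-pool={no_pool:6.2f}  partial-pool={partial:6.2f}")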
Models that are causally incorrect can make better predictions than those that are causally correct.
Successful prediction does not require correct causal identification.
Even when you don't have a precise causal model, but only a heuristic one indicating which variables causally influence others, you can still do useful causal inference.
We require a causal model with which to design both the collection of data and the structure of our statistical models. But the construction of causal models is not a purely statistical endeavor, and statistical analysis can never verify all of our assumptions.
When tossing a causal salad, a model that makes good predictions may still mislead about causation. If we use the model to plan an intervention, it will get everything wrong.
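A small simulation makes the point about prediction versus intervention. It assumes a made-up confounded system in which U causes both X and Y while X has no causal effect on Y at all: a regression of Y on X predicts well, yet setting X by intervention changes nothing.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 10_000

    # Hypothetical confounded system: U causes both X and Y,
    # but X has NO causal effect on Y.
    U = rng.normal(size=n)
    X = U + rng.normal(scale=0.5, size=n)
    Y = U + rng.normal(scale=0.5, size=n)

    # A regression of Y on X predicts well...
    slope, intercept = np.polyfit(X, Y, 1)
    pred = intercept + slope * X
    r2 = 1 - np.var(Y - pred) / np.var(Y)
    print("observational slope:", round(slope, 2), " R^2:", round(r2, 2))

    # ...but an intervention do(X = x) leaves Y unchanged, because the
    # association runs entirely through the confounder U.
    Y_do = U + rng.normal(scale=0.5, size=n)   # Y's actual causes are unchanged
    print("mean Y after do(X=2):", round(Y_do.mean(), 2), " vs before:", round(Y.mean(), 2))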
All statistical modeling has these two frames: the small world of the model itself and the large world we hope to deploy the model in.[38] Navigating between these two worlds remains a central challenge of statistical modeling. The challenge is greater when we forget the distinction.
But for human animals, Bayesian analysis provides a general way to discover relevant information and process it logically.
Bayesian data analysis usually means producing a story for how the data came to be. This story may be descriptive, specifying associations that can be used to predict outcomes, given observations. Or it may be causal, a theory of how some events produce other events.
All data stories are complete, in the sense that they are sufficient for specifying an algorithm for simulating new data.
A Bayesian model begins with one set of plausibilities assigned to each of these possibilities. These are the prior plausibilities. Then it updates them in light of the data, to produce the posterior plausibilities.
This updating process is a kind of learning, called Bayesian updating.
In contrast, Bayesian estimates are valid for any sample size. This does not mean that more data isn’t helpful—it certainly is. Rather, the estimates have a clear and valid interpretation, no matter the sample size. But the price for this power is dependency upon the initial plausibilities, the prior.
This is to say that your Bayesian machine guarantees perfect inference within the small world. No other way of using the available information, beginning with the same state of information, could do better.
No branch of applied mathematics has unfettered access to reality, because math is not discovered, like the proton. Instead it is invented, like the shovel.[46]
Unobserved variables are usually called parameters.
In conventional statistics, a distribution function assigned to an observed variable is usually called a likelihood.
We will be able to do things with our distributions that non-Bayesian models forbid. So I will sometimes avoid the term likelihood and just talk about distributions of variables.
If your goal is to lie with statistics, you’d be a fool to do it with priors, because such a lie would be easily uncovered.
The posterior distribution takes the form of the probability of the parameters, conditional on the data.
This just says that the probability of W, L and p is the product of Pr(W, L|p) and the prior probability Pr(p).
Pr(p | W, L) = Pr(W, L | p) Pr(p) / Pr(W, L)
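A grid approximation makes this formula computable directly. The sketch below assumes W and L are counts of two outcomes (illustrative values W = 6, L = 3), uses a flat prior for Pr(p) and a binomial distribution for Pr(W, L | p), and normalizes over the grid, which plays the role of dividing by Pr(W, L).

    import numpy as np
    from math import comb

    W, L = 6, 3                                  # illustrative counts of two outcomes
    p_grid = np.linspace(0, 1, 1000)             # candidate values of the parameter p
    prior = np.ones_like(p_grid)                 # Pr(p): flat prior plausibilities
    likelihood = comb(W + L, W) * p_grid**W * (1 - p_grid)**L   # Pr(W, L | p)
    unstd = likelihood * prior                   # numerator of Bayes' theorem
    posterior = unstd / unstd.sum()              # normalizing stands in for Pr(W, L)

    print("posterior mean of p:", round(float((p_grid * posterior).sum()), 3))

The posterior here is exactly the prior plausibilities updated in light of the data, computed point by point over the grid.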