Kindle Notes & Highlights
Read between September 9, 2020 and October 20, 2021
It would be better to have material that helps you spot common mistakes and misunderstandings of p-values and tests in general, as all of us have to understand such things, even if we don't use them.
Why aren’t the tests enough for research? The classical procedures of introductory statistics tend to be inflexible and fragile. By inflexible, I mean that they have very limited ways to adapt to unique research contexts. By fragile, I mean that they fail in unpredictable ways when applied to new contexts.
Even a procedure like ordinary linear regression, which is quite flexible in many ways and able to encode a large diversity of interesting hypotheses, is sometimes fragile. For example, if there is substantial measurement error on predictor variables, then the procedure can fail in spectacular ways.
The point isn’t that statistical tools are specialized. Of course they are. The point is that classical tools are not diverse enough to handle many common research questions.
What researchers need is some unified theory of golem engineering, a set of principles for designing, building, and refining special-purpose statistical procedures. Every major branch of statistical philosophy possesses such a unified theory.
So there are benefits in rethinking statistical inference as a set of strategies, instead of a set of pre-made tools.
Instead, we need some statistical epistemology, an appreciation of how statistical models relate to hypotheses and the natural mechanisms of interest.
How do we get a statistical model from a causal model? One way is to derive the expected frequency distribution of some quantity—a “statistic”—from the causal model.
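One way to make that concrete is to simulate. The sketch below assumes a hypothetical toy causal model, a binary treatment that shifts an outcome by 0.5, and uses the difference in group means as the statistic; repeating the simulation traces out the expected frequency distribution of that statistic implied by the causal model.

    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_once(n=50):
        # Hypothetical toy causal model: a binary treatment T shifts
        # the outcome Y by 0.5 units, plus Gaussian noise.
        T = rng.binomial(1, 0.5, size=n)
        Y = 0.5 * T + rng.normal(0.0, 1.0, size=n)
        # The "statistic" of interest: the difference in group means.
        return Y[T == 1].mean() - Y[T == 0].mean()

    # Repeating the simulation traces out the sampling distribution
    # of the statistic implied by this causal model.
    stats = np.array([simulate_once() for _ in range(10_000)])
    print("mean:", stats.mean().round(3), "sd:", stats.std().round(3))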
(1) Any given statistical model (M) may correspond to more than one process model (P).
(2) Any given hypothesis (H) may correspond to more than one process model (P).
(3) Any given statistical model (M) may correspond to more than one hypothesis (H).
But falsification is always consensual, not logical.
Models can be made into testing procedures—all statistical tests are also models[20]—but they can also be used to design, forecast, and argue.
We want to use our models for several distinct purposes: designing inquiry, extracting information from data, and making predictions. In this book I've chosen to focus on tools to help with each purpose. These tools are:
(1) Bayesian data analysis
(2) Model comparison
(3) Multilevel models
(4) Graphical causal models
Things that can happen more ways are more plausible.
Once we have defined the statistical model, Bayesian data analysis forces a purely logical way of processing the data to produce inference.
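The counting logic behind "things that can happen more ways are more plausible" can be written out directly. The sketch below uses a hypothetical marble-bag example (a bag of four marbles, each blue or white, with three observed draws); it simply counts, for each conjectured bag composition, how many sequences of draws reproduce the observation.

    from itertools import product

    # Hypothetical example: a bag holds 4 marbles, each blue ("B") or
    # white ("W"). We observe three draws with replacement: B, W, B.
    observed = ("B", "W", "B")
    conjectures = [tuple("B" * b + "W" * (4 - b)) for b in range(5)]  # 0..4 blue

    for bag in conjectures:
        # Count every path through the garden of forking data that
        # reproduces the observed sequence exactly.
        ways = sum(1 for path in product(bag, repeat=len(observed)) if path == observed)
        print("".join(bag), ways)

The conjecture that can produce the observation in the most ways is the most plausible; this counting is the purely logical processing that Bayesian updating formalizes.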
The distribution of these measurements is called a sampling distribution.
So the sampling distribution of any measurement is constant, because the measurement is deterministic—there’s nothing “random” about it.
As a result, the field of image reconstruction and processing is dominated by Bayesian algorithms.[24]
And more complex models tend towards more overfitting than simple ones—the smarter the golem, the dumber its predictions.
Cross-validation and information criteria help us in three ways. First, they provide useful expectations of predictive accuracy, rather than merely fit to sample. So they compare models where it matters. Second...
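The overfitting pattern, and cross-validation as an estimate of predictive accuracy rather than fit to sample, can be seen in a small simulation. The data, the linear "truth", and the polynomial degrees below are all made up for illustration; leave-one-out cross-validation stands in for the criteria discussed here.

    import numpy as np

    rng = np.random.default_rng(1)

    # Made-up data: a simple linear truth with noise.
    n = 20
    x = np.linspace(-1, 1, n)
    y = 0.7 * x + rng.normal(0, 0.3, size=n)

    def loo_mse(degree):
        # Leave-one-out cross-validation: refit without each point,
        # then score the prediction for the held-out point.
        errs = []
        for i in range(n):
            keep = np.arange(n) != i
            coefs = np.polyfit(x[keep], y[keep], degree)
            errs.append((np.polyval(coefs, x[i]) - y[i]) ** 2)
        return np.mean(errs)

    for degree in (1, 3, 5, 7):
        coefs = np.polyfit(x, y, degree)
        in_sample = np.mean((np.polyval(coefs, x) - y) ** 2)
        print(degree, round(in_sample, 3), round(loo_mse(degree), 3))

In-sample error keeps falling as the polynomial degree grows, while the cross-validated error eventually rises: the smarter the golem, the dumber its predictions.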
What this means is that any particular parameter can be usefully regarded as a placeholder for a missing model.
This results in a model with multiple levels of uncertainty, each feeding into the next—a multilevel model.
What they do is exploit an amazing trick known as partial pooling that pools information across units in the data in order to produce better estimates for all units.
To adjust estimates for repeat sampling.
To adjust estimates for imbalance in sampling.
To avoid averaging.
But the scope of multilevel modeling is much greater than these examples.
And some commonplace procedures, like the paired t-test, are really multilevel models in disguise.
These benefits don’t come for free, however. Fitting and interpreting multilevel models can be considerably harder than fitting and interpreting a traditional regression model. In practice, many researchers simply trust their black-box software and interpret multilevel regression exactly like single-level regression.
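A minimal sketch of the shrinkage idea behind partial pooling, not a full multilevel model: groups with few observations borrow more strength from the grand mean than groups with many. The group sizes, the within-group standard deviation, and the simple variance-ratio weight are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(7)

    # Made-up grouped data: group means drawn around a common average,
    # with very unequal sample sizes per group.
    true_means = rng.normal(0.0, 1.0, size=8)
    sizes = [3, 3, 5, 10, 10, 30, 60, 100]
    groups = [rng.normal(mu, 2.0, size=n) for mu, n in zip(true_means, sizes)]

    grand_mean = np.mean(np.concatenate(groups))        # complete pooling
    sigma2 = 2.0 ** 2                                   # within-group variance (known here)
    tau2 = np.var([g.mean() for g in groups], ddof=1)   # rough between-group variance

    for g, n in zip(groups, sizes):
        no_pool = g.mean()
        # Partial pooling: small groups are shrunk harder toward the grand mean.
        weight = n / (n + sigma2 / tau2)
        partial = weight * no_pool + (1 - weight) * grand_mean
        print(f"n={n:3d}  no-pool={no_pool:6.2f}  partial-pool={partial:6.2f}")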
Models that are causally incorrect can make better predictions than those that are causally correct.
Successful prediction does not require correct causal identification.
Even when you don't have a precise causal model, but only a heuristic one indicating which variables causally influence others, you can still do useful causal inference.
We require a causal model with which to design both the collection of data and the structure of our statistical models. But the construction of causal models is not a purely statistical endeavor, and statistical analysis can never verify all of our assumptions.
When tossing a causal salad, a model that makes good predictions may still mislead about causation. If we use the model to plan an intervention, it will get everything wrong.
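A small simulation makes the point about prediction versus intervention. It assumes a made-up confounded system in which U causes both X and Y while X has no causal effect on Y at all: a regression of Y on X predicts well, yet setting X by intervention changes nothing.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 10_000

    # Hypothetical confounded system: U causes both X and Y,
    # but X has NO causal effect on Y.
    U = rng.normal(size=n)
    X = U + rng.normal(scale=0.5, size=n)
    Y = U + rng.normal(scale=0.5, size=n)

    # A regression of Y on X predicts well...
    slope, intercept = np.polyfit(X, Y, 1)
    pred = intercept + slope * X
    r2 = 1 - np.var(Y - pred) / np.var(Y)
    print("observational slope:", round(slope, 2), " R^2:", round(r2, 2))

    # ...but an intervention do(X = x) leaves Y unchanged, because the
    # association runs entirely through the confounder U.
    Y_do = U + rng.normal(scale=0.5, size=n)   # Y's actual causes are unchanged
    print("mean Y after do(X=2):", round(Y_do.mean(), 2), " vs before:", round(Y.mean(), 2))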
All statistical modeling has these two frames: the small world of the model itself and the large world we hope to deploy the model in.[38] Navigating between these two worlds remains a central challenge of statistical modeling. The challenge is greater when we forget the distinction.
But for human animals, Bayesian analysis provides a general way to discover relevant information and process it logically.
Bayesian data analysis usually means producing a story for how the data came to be. This story may be descriptive, specifying associations that can be used to predict outcomes, given observations. Or it may be causal, a theory of how some events produce other events.
All data stories are complete, in the sense that they are sufficient for specifying an algorithm for simulating new data.
A Bayesian model begins with one set of plausibilities assigned to each of these possibilities. These are the prior plausibilities. Then it updates them in light of the data, to produce the posterior plausibilities.
This updating process is a kind of learning, called Bayesian updating.
In contrast, Bayesian estimates are valid for any sample size. This does not mean that more data isn’t helpful—it certainly is. Rather, the estimates have a clear and valid interpretation, no matter the sample size. But the price for this power is dependency upon the initial plausibilities, the prior.
This is to say that your Bayesian machine guarantees perfect inference within the small world. No other way of using the available information, beginning with the same state of information, could do better.
No branch of applied mathematics has unfettered access to reality, because math is not discovered, like the proton. Instead it is invented, like the shovel.[46]
Unobserved variables are usually called parameters.
In conventional statistics, a distribution function assigned to an observed variable is usually called a likelihood.
We will be able to do things with our distributions that non-Bayesian models forbid. So I will sometimes avoid the term likelihood and just talk about distributions of variables.
If your goal is to lie with statistics, you’d be a fool to do it with priors, because such a lie would be easily uncovered.
The posterior distribution takes the form of the probability of the parameters, conditional on the data.
This just says that the probability of W, L and p is the product of Pr(W, L|p) and the prior probability Pr(p).
Pr(p | W, L) = Pr(W, L | p) Pr(p) / Pr(W, L)
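A grid approximation makes this formula computable directly. The sketch below assumes W and L are counts of two outcomes (illustrative values W = 6, L = 3), uses a flat prior for Pr(p) and a binomial distribution for Pr(W, L | p), and normalizes over the grid, which plays the role of dividing by Pr(W, L).

    import numpy as np
    from math import comb

    W, L = 6, 3                                  # illustrative counts of two outcomes
    p_grid = np.linspace(0, 1, 1000)             # candidate values of the parameter p
    prior = np.ones_like(p_grid)                 # Pr(p): flat prior plausibilities
    likelihood = comb(W + L, W) * p_grid**W * (1 - p_grid)**L   # Pr(W, L | p)
    unstd = likelihood * prior                   # numerator of Bayes' theorem
    posterior = unstd / unstd.sum()              # normalizing stands in for Pr(W, L)

    print("posterior mean of p:", round(float((p_grid * posterior).sum()), 3))

The posterior here is exactly the prior plausibilities updated in light of the data, computed point by point over the grid.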