Statistical Rethinking: A Bayesian Course with Examples in R and Stan (Chapman & Hall/CRC Texts in Statistical Science)
Kindle Notes & Highlights
And this is Bayes’ theorem. It says that the probability of any particular value of p, considering the data, is equal to the product of the relative plausibility of the data, conditional on p, and the prior plausibility of p, divided by this thing Pr(W, L), which I’ll call the average probability of the data. In word form:

Posterior = (Probability of the data × Prior) / (Average probability of the data)
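As a concrete sketch of this word-form equation, here is a grid approximation in R for the book’s globe-tossing example; the particular counts (6 W in 9 tosses) and the grid size are assumed here for illustration.

# grid of candidate values for p
p_grid <- seq(from = 0, to = 1, length.out = 1000)
# prior plausibility of each value of p (flat prior)
prior <- rep(1, 1000)
# probability of the data (6 W in 9 tosses), conditional on each p
likelihood <- dbinom(6, size = 9, prob = p_grid)
# numerator: probability of the data times the prior
unstd_posterior <- likelihood * prior
# denominator: dividing by the sum plays the role of Pr(W, L),
# the average probability of the data, so the posterior sums to one
posterior <- unstd_posterior / sum(unstd_posterior)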
Averaged over what? Averaged over...
Its job is just to standardize the posterior, to ensure it sums ...
The key lesson is that the posterior is proportional to the product of the prior and the probability of the data. Why? Because for each specific value of p, the number of paths through the garden of forking data is the product of the prior number of paths and the new number of paths. Multiplication is just compressed counting. ...
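To see that multiplication really is compressed counting, a small sketch: updating on the data in two batches, reusing the first posterior as the second prior, multiplies the same factors as updating all at once (the 3 + 6 split of the nine tosses is an arbitrary choice for illustration).

p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior <- rep(1, 1000)
# all nine tosses at once: 6 W in 9
post_all <- dbinom(6, size = 9, prob = p_grid) * prior
post_all <- post_all / sum(post_all)
# two batches: 2 W in 3 tosses, then 4 W in 6 tosses
post_1 <- dbinom(2, size = 3, prob = p_grid) * prior
post_2 <- dbinom(4, size = 6, prob = p_grid) * post_1  # old posterior is the new prior
post_2 <- post_2 / sum(post_2)
# both routes count the same paths, so they agree (up to floating point)
all.equal(post_all, post_2)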
So while Bayes’ theorem looks complicated, because the relationship with counting paths is obscured, it just expresses...
at its heart lies a motor that processes data, producing a posterior distribution.
The action of this motor can be thought of as conditioning the prior on the data.
(2) Quadratic approximation
(3) Markov chain Monte Carlo (MCMC)
A Gaussian approximation is called “quadratic approximation” because the logarithm of a Gaussian distribution forms a parabola. And a parabola is a quadratic function. So this approximation essentially represents any log-posterior with a parabola.
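A quick sketch of this fact in R: plot the log of a Gaussian density and overlay the explicit parabola it equals (the mean 0 and standard deviation 1 are arbitrary choices).

# the log of a Gaussian density...
curve(dnorm(x, mean = 0, sd = 1, log = TRUE), from = -3, to = 3)
# ...is the parabola -x^2/2, shifted down by a normalizing constant
curve(-x^2 / 2 - log(sqrt(2 * pi)), from = -3, to = 3, add = TRUE, lty = 2)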
We’ll use quadratic approximation for much of the first half of this book. For many of the most common procedures in applied statistics—linear regression, for example—the approximation works very well. Often, it is even exactly correct, not actually an approximation at all. Computationally, quadratic approximation is very inexpensive, at least compared to grid approximation and MCMC (discussed next). The procedure, which R will happily conduct at your command, contains two steps.
quap. We’re going to be using quap a lot in the first half of this book. It’s a flexible model fitting tool that will allow us to specify a large number of different “regression” models.
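As a minimal sketch of quap in use, here it is fit to the globe-tossing data; the counts (W = 6, L = 3) are assumed for illustration.

library(rethinking)   # quap comes from the rethinking package
globe_qa <- quap(
  alist(
    W ~ dbinom(W + L, p),   # binomial likelihood
    p ~ dunif(0, 1)         # flat prior on p
  ),
  data = list(W = 6, L = 3)
)
# posterior mean and standard deviation of the Gaussian approximation
precis(globe_qa)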
I’ll use the analytical approach here, which uses dbeta. I won’t explain this calculation, but it ensures that we have exactly the right answer.
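A sketch of that analytical check: with a flat prior, the exact posterior is Beta(W + 1, L + 1), which dbeta can draw for comparison against the quadratic approximation (the Gaussian mean 0.67 and sd 0.16 are approximate values assumed from a quap fit like the one above).

W <- 6
L <- 3
# exact analytical posterior under a flat prior
curve(dbeta(x, W + 1, L + 1), from = 0, to = 1)
# quadratic approximation overlaid for comparison
curve(dnorm(x, 0.67, 0.16), lty = 2, add = TRUE)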
of parameters. Grid approximation routinely fails here, because it just takes too long—the Sun will go dark before your computer finishes the grid. Special forms of quadratic approximation might work, if everything is just right. But commonly, something is not just right. Furthermore, multilevel models do not always allow us to write down a single, unified function for the posterior distribution. This means that the function to maximize (when finding the MAP) is not known, but must be computed in pieces. As a result, various counterintuitive model fitting techniques have arisen. The most ...
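As a sketch of why such techniques can work at all, here is the simplest one, the Metropolis algorithm, applied to the globe-tossing posterior (W = 6, L = 3 and the step size 0.1 are assumed for illustration); it never needs the whole posterior function, only its unnormalized value at two points at a time.

n_samples <- 1000
p <- rep(NA, n_samples)
p[1] <- 0.5   # start the chain somewhere
W <- 6
L <- 3
for (i in 2:n_samples) {
  # propose a new value of p near the current one
  p_new <- rnorm(1, p[i - 1], 0.1)
  # reflect proposals that wander outside [0, 1]
  if (p_new < 0) p_new <- abs(p_new)
  if (p_new > 1) p_new <- 2 - p_new
  # unnormalized posterior at the current and proposed values (flat prior)
  q0 <- dbinom(W, W + L, p[i - 1])
  q1 <- dbinom(W, W + L, p_new)
  # move with probability q1/q0 (always, if the proposal is better)
  p[i] <- ifelse(runif(1) < q1 / q0, p_new, p[i - 1])
}
# the values in p are samples from the posterior distribution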