This is the best statistics textbook I've read, and I've read at least parts of ~10 of them. I've also read many tutorials/explanatory articles online, and this competes with the best of them. The text is exceptionally clear and even somewhat addictive, which I was not expecting from a statistics book. I can think of a few reasons for this. First, Kruschke motivates why you should care. For example, one of the canonical examples that he returns to often is coin flipping. Instead of assuming that you care about coin flipping, he explains why -- e.g., coin flipping can be thought of as estimating whether or not a treatment "works" in a clinical trial. Even though this explanation is totally obvious, it was still nice because it made me think that he cared about and respected my reading experience. Second, he is careful to repeat the key points after he gives an example, to close the loop that so many other authors seem to consider beneath them. Finally, Kruschke is actually pretty funny. I scribbled "lol" on not a small number of pages of this book, due to the high hit rate of his dry jokes. The exception to this is the poems. Do yourself a favor and skip the poems.
Even though it is a Bayesian book, for me its most helpful chapter was Chapter 15, which explains the Generalized Linear Model. Other chapters that I found particularly helpful were Chapter 18, which explains shrinkage very nicely, and Chapter 9, which explains hierarchical models very well. One of the downsides of a book like this is how quickly the field is moving. For example, some of the gamma priors that Kruschke recommends are discouraged by other authors, as you can see if you read the Stan manual. But there are a lot of fundamental principles in this book that will probably stand the test of time, so I'm expecting and hoping that investing time in it will pay dividends across the course of my career.
Quotes
"Bayesian model comparison compensates for model complexity by the fact that each model must have a prior distribution over its parameters, and more complex models must dilute their prior distributions over larger parameter spaces than simpler models. Thus, even if a complex model has some particular combination of parameter values that fit the data well, the prior probability of that particular combination must be small because the prior is spread thinly over the broad parameter space." - p. 290
"HMC instead uses a proposal distribution that changes depending on the current position. HMC figures out the direction in which the posterior distribution increases, called its gradient, and warps the proposal distribution toward the gradient." - p. 401
"The key to models of variable selection is that each predictor has both a regression coefficient and an inclusion indicator, which can be thought of as simply another coefficient that takes on the values 0 or 1. When the inclusion indicator is 1, then the regression coefficient has its usual role. When the inclusion indicator is 0, the predictor has no role in the model and the gression coefficient is superfluous.... A simple way to put a prior on the inclusion indicator is to have each indicator come from an independent Bernoulli prior, such as δj ~ dbern(0.5)." p. 537
SR Flashcards
q: define probability density
a: the probability of an outcome occurring in a particular interval divided by the width of that interval
q: define thinning (MCMC)
a: a method in which only every kth value in an MCMC chain is stored in memory
> a method for reducing autocorrelation that does not improve efficiency
> not recommended by Kruschke unless storing the full original chain or analyzing it subsequently would take too much memory
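A minimal sketch of thinning in plain R (the chain and the interval k here are placeholders, not values from the book):

```r
# Keep only every k-th value of an MCMC chain to reduce autocorrelation.
chain   <- rnorm(10000)                           # stand-in for a real MCMC chain
k       <- 10                                     # thinning interval
thinned <- chain[seq(1, length(chain), by = k)]   # every k-th sample
length(thinned)                                   # 1000 values retained
```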
q: define Haldane prior
a: a noncommittal beta prior, beta(θ|ε, ε) with ε close to zero, which puts most of its weight near θ = 0 and θ = 1
> whereas beta(theta|1,1) is a more conventional uniform distribution
> makes sense when you think θ is more likely to be near 0 or 1 than somewhere in the middle
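A rough visual comparison in R (the epsilon value is an arbitrary small number, not one from the book):

```r
# Haldane-style prior beta(eps, eps) piles its mass near theta = 0 and theta = 1,
# unlike the uniform beta(1, 1).
eps   <- 0.01
theta <- seq(0.001, 0.999, length.out = 999)
plot(theta, dbeta(theta, eps, eps), type = "l", ylab = "density")
lines(theta, dbeta(theta, 1, 1), lty = 2)   # uniform prior for comparison
```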
q: define marginal likelihood
a: the operation of taking the average of the likelihood p(D|θ) across all values of θ, weighted by the prior probability of θ
> i.e., p(D) = Σ_θ p(D|θ) p(θ), or the corresponding integral when θ is continuous
> denominator in Bayes' rule; aka the evidence; p. 107 DBDA
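A toy grid approximation for the coin-flipping case (the data, prior, and grid are invented for illustration):

```r
# Marginal likelihood p(D) for z heads in N flips under a beta(2, 2) prior,
# approximated by averaging the likelihood over a grid of theta values,
# weighted by the (grid-normalized) prior.
z <- 7; N <- 10
theta <- seq(0.001, 0.999, length.out = 999)
prior <- dbeta(theta, 2, 2) / sum(dbeta(theta, 2, 2))
likelihood <- dbinom(z, N, theta)
p_D <- sum(likelihood * prior)
p_D
```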
q: define autocorrelation function
a: the autocorrelation across a spectrum of candidate lags
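In R, `acf()` computes and plots this; the AR(1) series below is just a stand-in for a real chain:

```r
# Autocorrelation function of a (simulated) autocorrelated chain at lags 0..50.
set.seed(1)
chain <- as.numeric(arima.sim(model = list(ar = 0.8), n = 5000))
acf(chain, lag.max = 50)
```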
q: define effective sample size (MCMC)
a: the actual length of an MCMC chain divided by a factor that accounts for its autocorrelation: ESS = N / (1 + 2 Σ_k ACF(k))
> more autocorrelation -> less actual independent data from each draw
> e.g., to get stable estimates of the limits of a 95% HDI, Kruschke recommends an ESS of around 10,000
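With the coda R package this is `effectiveSize()`; the chain below is the same kind of stand-in as above, so the numbers are illustrative only:

```r
library(coda)
# An autocorrelated chain of length 10,000 yields far fewer effective samples.
set.seed(1)
chain <- as.numeric(arima.sim(model = list(ar = 0.8), n = 10000))
effectiveSize(mcmc(chain))
```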
q: define shrink factor (MCMC)
a: the ratio of between-chain variance to within-chain variance across independent MCMC chains, which is close to 1 when the chains have converged to the same distribution
> aka Gelman-Rubin statistic
> ?gelman.diag in the coda R package
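A quick sketch with coda's `gelman.diag()` on two fake "chains" (independent normal draws, so the statistic should come out near 1):

```r
library(coda)
set.seed(1)
chain1 <- mcmc(rnorm(5000))
chain2 <- mcmc(rnorm(5000))
gelman.diag(mcmc.list(chain1, chain2))   # potential scale reduction factor ~ 1
```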
q: What is the general effect of shrinkage in hierarchical models? what specifies the degree of shrinkage?
a: to pull low-level parameter estimates towards the modes of the higher-level distribution; the degree of shrinkage is set by the relative strength (precision) of the lower-level data versus the higher-level distribution
> although shrinkage is a consequence of hierarchical model structure, not Bayesian estimation per se
q: What does a Bayes factor quantify?
a: the factor by which the prior odds between two models are multiplied, given the data, to yield the posterior odds; i.e., BF = p(D|M1) / p(D|M2)
q: describe noise distribution
a: the distribution that describes the random variability of the data values around the underlying trend
> i.e., it usually sits at the bottom of one of Kruschke's model diagrams
> can differ; e.g., you could model the noise as normal, log-normal, gamma, etc...
q: In hierarchical models, if you want high precision estimates at the individual level, what do you need? if you want high precision estimates at the group level, what do you need?
a: lots of data within individuals, lots of individuals (without necessarily lots of data per individual, although more is better)
> p 382 DBDA
q: In Bayesian analysis, how is a nominal predictor used to predict values in a linear model?
a: for nominal predictors, you generally estimate a separate beta coefficient for each possible level, quantifying that level's "deflection from the mean"
> typically the baseline is constrained so that the deflections sum to zero across the categories (see the sketch below)
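A small numerical illustration of sum-to-zero deflections (the group means are invented):

```r
# Each category's deflection is its deviation from the grand mean,
# so the deflections sum to zero across categories.
group_means <- c(A = 10, B = 14, C = 18)
grand_mean  <- mean(group_means)
deflections <- group_means - grand_mean
deflections        # A = -4, B = 0, C = 4
sum(deflections)   # 0
```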
q: In a generalized linear model, what happens to the predictor variables first? second?
a: they are combined, e.g. via addition; the combination is then mapped to the predicted variable by an inverse link function
> p 435 DBDA
q: define inverse link function
a: the function that maps the combination of predictor variables onto the central tendency of the predicted data
> sometimes called the "link function" for convenience; called inverse for historical reasons
q: define logit
a: the inverse of the sigmoidal logistic function
> canonical link fxn for the Bernoulli distribution
q: define probit
a: the inverse of the cdf of the standard normal distribution; the cdf is denoted Φ(z), so the probit is denoted Φ⁻¹(p)
> canonical link function for the normal distribution
> probit stands for "probability unit"; Bliss 1934
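Both link functions above are available in base R, for example:

```r
p <- 0.8
qlogis(p)           # logit(p) = log(p / (1 - p)), about 1.39
plogis(qlogis(p))   # logistic (the inverse link) recovers 0.8
qnorm(p)            # probit(p) = Phi^{-1}(p), about 0.84
pnorm(qnorm(p))     # Phi (the inverse link) recovers 0.8
```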
q: If the link function in a GLM is the identity function, then what is the GLM equivalent to?
a: conventional linear regression
> Ch. 15 DBDA
q: If a distribution has higher kurtosis, what does that mean practically?
a: that it has heavier tails
q: A GLM can be written as follows:
μ = f(lin(x), [parameters])
y ~ pdf(μ, [parameters]), where...
1) lin() = ?
2) f = ?
3) pdf = ?
a: 1) a linear function to combine the predictors x
2) the inverse link function
3) the noise distribution (going from predicted central tendency to noisy data)
> p 444 DBDA
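A minimal generative sketch of this structure for logistic regression (all coefficient values are made up):

```r
# lin(): linear combination of predictors; f: inverse link (logistic);
# pdf: Bernoulli noise distribution generating the data.
set.seed(1)
n   <- 100
x1  <- rnorm(n); x2 <- rnorm(n)
lin <- 0.5 + 1.2 * x1 - 0.8 * x2        # lin(x)
mu  <- plogis(lin)                      # f(lin(x)): predicted probability
y   <- rbinom(n, size = 1, prob = mu)   # y ~ pdf(mu)
```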
q: define posterior predictive check
a: simulating data from the model using parameter values drawn from the posterior, then comparing the simulated data to the actual data to see whether the model describes the data well
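A toy example for the coin-flipping model with a conjugate beta(2, 2) prior (the data are invented):

```r
# Draw theta from the posterior, simulate replicated datasets, and compare
# the simulated head counts to the observed count.
set.seed(1)
z <- 7; N <- 10
theta_post <- rbeta(5000, 2 + z, 2 + (N - z))         # posterior draws
y_rep <- rbinom(5000, size = N, prob = theta_post)    # replicated data
hist(y_rep); abline(v = z, lwd = 2)                   # observed value vs. simulations
```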
q: If two variables are highly correlated in multiple linear regression, what will that do to the posterior estimate of those coefficients?
a: it will make them very broad
> if there are three or more correlated predictors, pairwise scatterplots may not show it, but autocorrelation will remain high
q: define multiplicative interaction
a: when the predicted value is a weighted combination of both the individual predictors and the multiplicative product of those predictors
> a type of non-additivity
> DBDA p 525
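A tiny numeric example of the predicted value with an interaction term (all coefficients invented):

```r
# y_hat = b0 + b1*x1 + b2*x2 + b12*(x1*x2)
x1 <- 2; x2 <- 3
b0 <- 1; b1 <- 0.5; b2 <- -0.25; b12 <- 0.4
y_hat <- b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
y_hat   # 1 + 1 - 0.75 + 2.4 = 3.65
```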
q: define double exponential distribution
a: two exponential distributions glued back-to-back on either side of a peak (location parameter)
> e.g., one exponential distribution over the positive side and its mirror image over the negative side of β, symmetric about the center
> aka Laplace distribution
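A hand-rolled density for comparison against the normal (dlaplace below is my own helper, not a base R function):

```r
# Laplace (double exponential) density: exp(-|x - mu| / b) / (2 * b).
dlaplace <- function(x, mu = 0, b = 1) exp(-abs(x - mu) / b) / (2 * b)
x <- seq(-5, 5, length.out = 401)
plot(x, dlaplace(x), type = "l", ylab = "density")   # sharp peak, heavy tails
lines(x, dnorm(x), lty = 2)                          # normal for comparison
```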
q: What is the etymology of ANOVA?
a: the ANOVA model posits that the total variance can be partitioned into within-group variance plus between-group variance, and since "analysis" means separation into constituent parts, the term ANOVA accurately describes the underlying algebra in the traditional methods
q: What is the "homogeneity of variance assumption" in ANOVA?
a: that the standard deviation of the data within each group is the same for all groups
> ANOVA also assumes that the data are normally distributed within groups
q: Can you discern posterior distributions of credible differences between parameters based on their marginal distributions? why or why not?
a: no; the parameters might be correlated, so you need to evaluate differences between jointly credible values, e.g. by taking the differences at each step of an MCMC chain
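A sketch with fake, correlated "posterior draws" to show the step-by-step difference (all numbers invented):

```r
# Compute the difference at each step of the chain, then summarize it.
set.seed(1)
common <- rnorm(5000)
beta1  <- 1.0 + common + rnorm(5000, sd = 0.3)   # correlated draws
beta2  <- 0.8 + common + rnorm(5000, sd = 0.3)
d      <- beta1 - beta2                          # difference per MCMC step
quantile(d, c(0.025, 0.975))                     # equal-tailed 95% interval
mean(d > 0)                                      # proportion of steps with beta1 > beta2
```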