
Statistical Inference

This book builds theoretical statistics from the first principles of probability theory. Starting from the basics of probability, the authors develop the theory of statistical inference using techniques, definitions, and concepts that are statistical and are natural extensions and consequences of previous concepts. Intended for first-year graduate students, this book can be used for students majoring in statistics who have a solid mathematics background. It can also be used in a way that stresses the more practical uses of statistical theory, being more concerned with understanding basic statistical concepts and deriving reasonable statistical procedures for a variety of situations, and less concerned with formal optimality investigations.

535 pages, Hardcover

First published January 1, 2001

87 people are currently reading
1265 people want to read

About the author

George Casella

28 books · 5 followers

Ratings & Reviews



Community Reviews

5 stars: 174 (44%)
4 stars: 135 (34%)
3 stars: 62 (15%)
2 stars: 11 (2%)
1 star: 8 (2%)
Displaying 1 - 30 of 36 reviews
14 reviews
December 17, 2014
Good examples to motivate each concept, but it presupposes a strong understanding of multivariable calculus and FAMILIARITY WITH PROOFS. Either a good class on Real Analysis, or a good class involving mathematical proofs is a must. I used this book for a 2-semester sequence in Probability Theory and Statistical Inference, as part of an MS in Statistics.

Do EVERY example, and work through end of chapter problems as needed for your class, relevant to your own learning goals, or as tickles your fancy. The solutions manual is available courtesy of Google, and I highly recommend using it when you hit a wall. The level of mathematics here is one where you want to correct your intuition as quickly as possible if it's not right.
Profile Image for Enric Garriga Sànchez.
21 reviews
August 16, 2023
VERY well explained. Definitely better than my statistical inference teacher's explanations. Managed to understand the concept of asymptotic variance of an estimator very well thanks to this book, among other things. 10/10. I'm also happy to say that I got an A on the final exam and as my final mark for the course🤙🤙
Profile Image for Yakov Zaytsev.
3 reviews
August 15, 2012
Better than "All of Statistics" for probability refresher e.g. has good explanation of the Monty Hall Problem :-)
Profile Image for Tinwerume.
87 reviews · 12 followers
January 12, 2022
I just did the exercises, and read what was required to do them. I don't really have an opinion on the quality of exposition, because by the time I came to this book I was already at least somewhat familiar with most of the content.

The exercises are quite good. It's pretty comprehensive, and I found it useful for shoring up the gaps in my knowledge of core statistics. I happily recommend it as a problem book, and it's probably not *terrible* for self-study if you're past intro statistics and want to learn some more advanced material.
Profile Image for Armelle Duston.
28 reviews
May 17, 2025
I laughed, I cried, I found meaning in the highest highs (exponential families) and the lowest lows (conditional hypothesis testing in the presence of nuisance parameters). Thanks, Casella and Berger, for the ride of a lifetime.
Profile Image for Daeus.
387 reviews · 3 followers
February 19, 2024
Neither great nor bad writing (I wish it had main takeaways at the end of each chapter), but a solid overview and progression from probability to statistics foundations (great framework to go over the subject matter). Very dense writing with proofs/theory, so I had to skim some whole sections. Granted, it was written moreso for graduate statistics coursework than to read generally.

Notes/quotes
- 'Possible methods of counting: with or without replacement, ordered vs unordered.' Unordered without replacement is so common it has its own notation: n choose r = n! / (r!(n-r)!)
- "probabilities should be determined by the sampling mechanism."
- "In many experiments it is easier to deal with a summary variable than with the original probability structure. For example, in an opinion poll, we might decide to ask 50 people whether they agree or disagree with a certain issue. If we record a "1" for agree and "0" for disagree, the sample space for this experiment has 2^50 elements, each an ordered string of 1s and 0s of length 50. We should be able to reduce this to a reasonable size! It may be that the only quantity of interest is the number of people who agree out of 50 and, if we define a variable X = number of 1s recorded out of 50, we have captured the essence of the problem. Note that the sample space for X is the set of integers {0,1,2,...,50} and is much easier to deal with than the original sample space." ... "A random variable is a function from a sample space S into the real numbers." More examples: 'experiment: toss two dice, random variable X: sum of the numbers. Experiment: toss a coin 25 times, random variable X: number of heads in 25 tosses.' "In defining a random variable, we have also defined a new sample space."
- "Statistical distributions are used to model populations; as such we usually deal with a family of distributions rather than a single distribution. The family is indexed by one or more parameters, which allow us to vary certain characteristics of the distribution while staying with one functional form. For example, we may specify that the normal distribution is a reasonable choice to model a particular population, but we cannot precisely specify the mean. Then we deal with a parametric family, a normal distribution with mean mu."
- "Theorem... If X and Y are independent random variables, then Cov(X,Y) = 0 and [correlation of x&y] = 0. Proof: Since X and Y are independent, from [an earlier Theorem]... we have EXY = (EX)(EY). Thus Cov(X,Y) = EXY - (EX)(EY) ... = 0. And corr xy = Cov(X,Y)/sigmax sigmay =...0." ... "If X and Y are positively correlated (Cov(X,Y)>0), then the variation in X + Y is greater than the sum of the variation in X and Y."
- "A linear combination of independent normal random variables is normally distributed."
- "[CLT:] Starting from virtually no assumptions (other than independence and finite variances), we end up with normality! The point here is that normality comes from sums of 'small' (finite variance), independent disturbances. The assumption of finite variances is essentially necessary for convergence to normality."
- "A sufficient statistic for a parameter theta is a statistic that, in a certain sense, captures all the information about theta contained in the sample. Any additional information in the sample, besides the value of the sufficient statistic, does not contain any more information about theta.... [sufficiency principle in data reduction] that is, if x and y are two sample points such that T(x) = T(y), then the inference about theta should be the same whether X=x or X=y is observed." ... "Thus, the likelihood principle [in data reduction] states that the same conclusion about mu should be drawn for any two sample points satisfying xbar = ybar." I.e., if the sample means are the same, the inference about the mean should be the same (simplified). "Equivariance principle: if Y=g(X) is a change of measurement scale such that the model for Y has the same formal structure as the model for X, then an inference procedure should be both measurement equivariant and formally equivariant." I.e., some data in inches, some in meters: you can convert units (transform the data) and use the same procedure. "the Equivariance Principle is composed of two distinct types of equivariance. One type, measurement equivariance, is intuitively reasonable.... but the other principle, formal invariance, is quite different. It equates any two problems with the same mathematical structure, regardless of the physical reality they are trying to explain. It states that one inference procedure is appropriate even if the physical realities are quite different, an assumption that is sometimes difficult to justify."... "All three principles [sufficiency, likelihood, equivariance] prescribe relationships between inferences at different sample points, restricting the set of allowable inferences and, in this way, simplifying the analysis of the problem."
- "An estimator is a function of the sample, while an estimate is the realized value of the estimator (that is, a number) that we obtain when a sample is actually taken." Estimators: "the method of moments is, perhaps, the oldest method of finding point estimators" - e.g., taking the mean, or calculating the variance. The problem is that sometimes it's negative, e.g., if the variance is higher than the mean. The Satterthwaite approximation can help obtain an estimate that is always positive. Another, and the most commonly used, method: Maximum Likelihood Estimators (MLE). "The MLE is the parameter point for which the observed sample is most likely.... the first problem is that of actually finding the global maximum and verifying that... the second problem is.... how sensitive is the estimate to small changes in the data?" A third method is the Bayes estimator.
- "In the Bayesian approach theta is considered to be a quantity whose variation can be described by a probability distribution (called the prior distribution). This is a subjective distribution, based on the experimenter's belief, and is formulated before the data are seen (hence the name prior distribution)." ... "The Bayes estimator is, again, a linear combination of the prior and sample means. Notice also that as [tau]^2, the prior variance, is allowed to tend to infinity, the Bayes estimator tends toward the sample mean. We can interpret this as saying that, as the prior information becomes more vague, the Bayes estimator tends to give more weight to the sample information."
- The fourth and last estimation method is the "Expectation-Maximization (EM) algorithm," which is too complicated... "based on the idea of replacing one difficult likelihood maximization with a sequence of easier maximizations whose limit is the answer to the original problem. It is particularly suited to the 'missing data' problem."
- "The general topic of evaluating statistical procedures is part of the branch of statistics known as decision theory."... "we first investigate finite-sample measures of the quality of an estimator, beginning with its mean squared error [MSE]." ... "Thus, MSE incorporates two components, one measuring the variability of the estimator (precision), and the other measuring its bias (accuracy)." ... "Mean squared error is a special case of a function called a loss function. The study of the performance, and the optimality, of estimators evaluated through loss functions is a branch of decision theory."
- "[Lehmann-Scheffe theorem] Unbiased estimators based on complete sufficient statistics are unique."
- "After a hypothesis test is done, the conclusions must be reported in some statistically meaningful way. One method of reporting the results of a hypothesis test is to report the size, alpha, of the test used and the decision to reject Ho or accept Ho.... if alpha is small, the decision to reject Ho is fairly convincing, but if alpha is large.. [it] is not very convincing because the test has a large probability of incorrectly making that decision. Another way of reporting the results of a hypothesis test is to report the value of a certain kind of test statistic called a p-value." ... "a p-value reports the results of a test on a more continuous scale, rather than just the dichotomous decision to 'Accept Ho' or 'Reject Ho.'"
- "we have carefully said that the interval [estimator] covers the parameter, not that the parameter is inside the interval.... we wanted to stress that the random quantity is the interval, not the parameter."
- "The use of decision theory in interval estimation problems is not as widespread as in point estimation or hypothesis testing problems."
- Huber (1981) on robustness: "any statistical procedure should possess the following desirable features:
(1) It should have a reasonably good (optimal or near optimal) efficiency at the assumed model.
(2) It should be robust in the sense that small deviations from the model assumptions should impair the performance only slightly....
(3) Somewhat larger deviations from the model should not cause a catastrophe."
- "This insensitivity to extreme observations is sometimes considered an asset of the sample median.... the performance of the median improves in distributions with heavy tails."
- "A basic idea of the ANOVA, that of partitioning variation, is a fundamental idea of experimental statistics. The ANOVA belies its name in that it is not concerned with analyzing variances but rather with analyzing variation in means."
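The CLT note above ("normality comes from sums of small, independent disturbances") is easy to check numerically. A minimal sketch, with my own choice of distribution (uniform disturbances) and sample sizes, not anything from the book:

```python
import random
import statistics

# Standardized sums of i.i.d. uniform(0, 1) disturbances
# (mean 1/2, variance 1/12) should be approximately N(0, 1).
random.seed(0)
n, reps = 50, 20000
z = [(sum(random.random() for _ in range(n)) - n * 0.5) / (n / 12.0) ** 0.5
     for _ in range(reps)]
m, s = statistics.mean(z), statistics.stdev(z)
# m comes out near 0 and s near 1, as the CLT predicts
```

Plotting a histogram of `z` would show the familiar bell shape, even though each individual disturbance is flat.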
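The Bayes-estimator note above (posterior mean as a linear combination of prior and sample means, tending to the sample mean as the prior variance grows) can be sketched in a few lines. This assumes a normal model with known variance and a normal prior; the function name and numbers are illustrative, not from the book:

```python
# Posterior mean for a normal model with known variance sigma2 and a
# normal prior N(mu0, tau2): a weighted average of sample and prior means.
def bayes_posterior_mean(xbar, n, sigma2, mu0, tau2):
    w = tau2 / (tau2 + sigma2 / n)  # weight on the sample mean
    return w * xbar + (1 - w) * mu0

xbar, n, sigma2, mu0 = 4.0, 25, 9.0, 0.0
vague = bayes_posterior_mean(xbar, n, sigma2, mu0, tau2=1e9)   # vague prior
tight = bayes_posterior_mean(xbar, n, sigma2, mu0, tau2=0.01)  # confident prior
# With a vague prior the estimate is essentially the sample mean;
# with a tight prior it is pulled strongly toward the prior mean mu0.
```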
7 reviews
June 13, 2017
I've rarely seen a book as well put together as this one. Clear notation, well-thought-out examples, clear proofs, and a unified treatment of statistics. This is one of the top 10 math books I've read, and statistics was not (past tense) one of my favorite subjects. This is the kind of book that can change your perspective on statistics. The author goes to great lengths to make the book self-contained, and when fundamental but rather difficult-to-prove results from calculus (or other math) are used, the author clearly states the results before using them (differentiating under the integral / etc).
Profile Image for Mihailo.
2 reviews
February 14, 2021
Although it covers probability only from an introductory point of view (it doesn't dive into measure theory at all, but it mentions some related concepts without explicitly stating them in terms of measure), it's an amazing book with a main focus on statistics. All the concepts are clearly explained with a variety of examples and appropriate intuition.
Profile Image for Jake Losh.
211 reviews · 24 followers
December 26, 2012
A great, comprehensive resource for all things statistical (excluding regression analysis; you can't do it all). Clear prose, though the examples could use some work. Not recommended for stats newbies.
Profile Image for Karla Acosta.
13 reviews
May 26, 2024
what a girl is capable of in her dissertation study rush lol

Great book. I almost always understood what it was explaining, but there were some snippets that not even at my most rested and intellectual I would have been able to grasp. However, I did chuckle at times, proof that the authors enjoyed writing this (and the reason why its confusing paragraphs aren't enough to drop this book to 4 stars). A thousand times better in the probability section than Ross (in my opinion).
9 reviews · 8 followers
September 30, 2020
I read the book after I took a statistical inference class, and the content made a lot more sense once I had a general idea about the topics. The explanation in this book is clearer than in other statistical inference books.
Profile Image for Ben.
1 review · 1 follower
December 15, 2022
The "international" version of this book. Great and on-point explanations. After getting used to the notation and conventions, probably my favourite statistics book. Covers most of the basics, preparing for graduate-level stats.
1 review
August 20, 2017
It's a good textbook, but definitely not for those outside stat-related majors.
Profile Image for Endre Moen.
9 reviews
June 16, 2021
hard-core - rumor has it that the student solutions manual is available.
Profile Image for Anthony DiGiovanni.
23 reviews · 6 followers
July 29, 2019
[More of a comment/note than a review, so far]

Haven't finished this yet, but I just have to commend it for addressing the elephant in the room of the concept of likelihood. Maybe I had just missed something in my formal stats classes before, but it had always confused me that we used the term "likelihood" for what seemed to just be the exact same thing as a joint pdf of a sample. And technically, yes, the likelihood and joint pdf are algebraically identical. But the latter is a function of the data, holding the parameter (theta) fixed, whereas the former is a function of the parameter, holding the data fixed. The usefulness of thinking of likelihood comes in when you compute likelihood ratios for different parameter values. Literally no other stats resource I've read on this topic has explained the difference so bluntly. The rest is Bayesian history.
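The distinction this review praises can be made concrete with a tiny sketch (a Bernoulli model of my own choosing; the function name is mine): the same expression is a joint pmf when the data vary with theta fixed, and a likelihood when theta varies with the data fixed.

```python
# Joint pmf of an i.i.d. Bernoulli(theta) sample. Held at a fixed theta it
# is a function of the data; held at a fixed observed sample it is the
# likelihood L(theta | x) -- algebraically identical, read differently.
def joint_pmf(x, theta):
    s = sum(x)
    return theta ** s * (1 - theta) ** (len(x) - s)

data = [1, 1, 0, 1, 0, 1, 1]  # fixed observed sample (5 successes in 7)

# Likelihood ratio comparing two parameter values on the same data:
lr = joint_pmf(data, 0.7) / joint_pmf(data, 0.3)
# lr > 1 says theta = 0.7 makes this sample more likely than theta = 0.3
```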
2 reviews
November 29, 2021
Simultaneously the most important book and the book I hold the most seething contempt for that I read during my academic career.
Profile Image for owlette.
329 reviews · 5 followers
December 14, 2024
I feel pathetic for writing a book review about a statistics textbook on goodreads, but I don't know where else to post this, so here's a picture of me crying that nobody wants to listen to me talk about how good this textbook is.
This is me with Casella and Berger (2002).

This is a killer textbook written with care and dedication (rare!), and written in a way that I can actually understand (doubly rare!). I recommend this for anyone who has gone through a gauntlet of probability and statistics coursework in college and/or graduate school but needs a good textbook at home to brush up on stuff occasionally. I opened this book for the first time to (re)learn linear regression assumptions, and man, just read section 11.3 instead of those stupid Medium articles (or asking an AI chatbot that mines those articles to craft its answers). The attention to detail is evident in the notes about notation: e.g. "The notation S_{xY} stresses the fact that S_{xY} is a random variable that is a function of the random variables Y_1, Y_2, ..., Y_n. S_{xY} also depends on the nonrandom quantities, x_1, x_2, ..., x_n." There are a lot of references to exercises, theorems, and lemmas from other chapters. What seals the deal for me are these lightbulb💡 moments that give clarity to the little things, like "Why is it 'regress y on x' instead of 'regress x on y'?" or that the OLS estimates derived by minimizing the residual sum of squares aren't statistical estimates yet, because you can do the derivation without a statistical model.
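The lightbulb mentioned in this review — that least-squares estimates can be derived with no statistical model at all — can be sketched directly: minimizing the residual sum of squares is pure calculus, yielding the usual closed-form slope and intercept. The toy data here are mine, not from the book:

```python
# Ordinary least squares for y = b0 + b1*x by minimizing the residual
# sum of squares; no statistical assumptions are needed for this step.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = ybar - b1 * xbar    # intercept
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x
b0, b1 = ols(x, y)
# b1 comes out close to 2 and b0 close to 0 for this toy data
```

Only when you add a model (e.g. normal errors, as in section 11.3) do these numbers become statistical estimates with standard errors attached.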
Profile Image for Fleur_de_soie.
26 reviews · 7 followers
July 25, 2015
Read this book because it is the text for our PhD Econometrics I course, and mainly because it is recommended by Professor D, so first come his comments on the book:

"The standard PhD level first text on Math. Stats in all serious stats and econ departments."

Serious, yes, from the point of view of a statistician, so it really gives you a very good structure of the subject.

Personally, I believe the chapters before regression are really well written. The regression part may be too concise; OK, after all, that may be the task of an econometrics text. But I still believe that if the authors could explain the later chapters using matrices it would be more elegant and intuitive.

I did not do the exercises and have a very weak stats background, which makes it rather difficult for me to follow the serious proofs and logic in the later chapters.

I will read it again very soon and do the exercises at the same time. Hope it goes better.

After all, it is a very standard grad stats book for econ students. Harvard and MIT also use this one for their students at that level.
Profile Image for Cristina.
61 reviews
August 28, 2012
I like how theorems and corollaries are presented, but I'm not crazy about the lack of proofs and some of the examples. This isn't a book from which I could very easily teach myself statistics, as many of the examples tend to be the odd case. I don't have a book at this same level for comparison, so I can't judge this book relatively. Standard text used in the first series of graduate-level theoretical statistics.
Profile Image for Omar.
10 reviews
December 29, 2014
Not bad but also not overly special. Self-instruction will be tough, but the solution manual may assist with that. Proofs are sometimes not provided, which can make that harder. If taking it as part of a formal course, make sure to engage the professor or instructor on the more difficult matters.
Profile Image for Zeyuan Hu.
15 reviews · 1 follower
June 5, 2019
Problems are quite challenging. Some parts of the problem setups are easily overlooked yet turn out to be the key to solving them. Explanations are quite clear but require a certain level of mathematical sophistication. Definitely cleared up my doubts about Bayesian inference.
Profile Image for Joseph Yao.
7 reviews
October 13, 2021
A rigorous, must-read book for anyone who wants to do research on statistical inference. Caution: don't attempt to finish all the questions in the problem sets; some of them are way, way harder than you think!
Profile Image for Goo.
187 reviews
September 12, 2020
A fine introduction to the concepts of mathematical statistics.

Doesn't require measure theoretic probability, but it would probably help. Certainly the student must know what a probability measure is.
