Going beyond the conventional mathematics of probability theory, this study views the subject in a wider context. It discusses new results, along with applications of probability theory to a variety of problems. The book contains many exercises and is suitable for use as a textbook on graduate-level courses involving data analysis. Aimed at readers already familiar with applied mathematics at an advanced undergraduate level or higher, it is of interest to scientists concerned with inference from incomplete information.

753 pages, Hardcover

First published April 9, 1999

Edwin Thompson Jaynes was the Wayman Crow Distinguished Professor of Physics at Washington University in St. Louis. He wrote extensively on statistical mechanics and on foundations of probability and statistical inference, initiating in 1957 the MaxEnt interpretation of thermodynamics, as being a particular application of more general Bayesian/information theory techniques (although he argued this was already implicit in the works of Gibbs). Jaynes strongly promoted the interpretation of probability theory as an extension of logic.

In 1963, together with Fred Cummings, he modelled the evolution of a two-level atom in an electromagnetic field, in a fully quantized way. This model is known as the Jaynes–Cummings model.

A particular focus of his work was the construction of logical principles for assigning prior probability distributions; see the principle of maximum entropy, the principle of transformation groups and Laplace's principle of indifference. Other contributions include the mind projection fallacy.

Jaynes' posthumous book, Probability Theory: The Logic of Science (2003) gathers various threads of modern thinking about Bayesian probability and statistical inference, develops the notion of probability theory as extended logic, and contrasts the advantages of Bayesian techniques with the results of other approaches. This book was published posthumously in 2003 (from an incomplete manuscript that was edited by Larry Bretthorst). An unofficial list of errata is hosted by Kevin S. Van Horn.

January 14, 2018

Folks who follow me on Twitter know this is essentially my 2nd bible. (Yes, the first one is The Bible.)

There's really no way to delve into that other than to recapitulate the book, but let me just hammer one point, which I take to be central, home: good old-fashioned Aristotelian two-valued logic is a *special case* of probability theory properly understood. Conversely, probability theory properly understood is a *generalization* of good old-fashioned Aristotelian two-valued logic.
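That reduction can be sketched numerically (my illustration, not the book's notation): at the extreme plausibilities 0 and 1, the product and sum rules of probability reproduce the truth tables of two-valued logic exactly.

```python
from itertools import product

# At the extremes p in {0, 1}, the product and sum rules of probability
# collapse to the truth tables of AND and OR.
def p_and(pa, pb_given_a):
    return pa * pb_given_a          # product rule: p(A,B) = p(A) p(B|A)

def p_or(pa, pb, pab):
    return pa + pb - pab            # sum rule: p(A+B) = p(A) + p(B) - p(A,B)

for a, b in product([0, 1], repeat=2):
    pab = p_and(a, b)               # when A is certain/impossible, p(B|A) = b
    assert pab == (a and b)
    assert p_or(a, b, pab) == (a or b)
print("two-valued logic recovered at p in {0, 1}")
```

Intermediate plausibilities then interpolate between those truth tables, which is the sense in which logic is the limiting special case.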

Jaynes makes no claims to originality here—he fully credits the insight to Richard Cox, although Jaynes sees a consistent thread all the way back to Bayes and especially Laplace—but Jaynes struggled for decades to overcome the loss of this insight in the domination of the frequentist approach to statistics that occurred essentially immediately after Laplace's death, but especially in the rise of the Neyman-Pearson era. In fact, when the smoke clears, Jaynes dispenses with the actual theory of probability in the first two chapters. Literally the remainder of the book is elaboration; extensive exploration of the construction of prior probabilities by means of marginalization, transformation groups, or the principle of maximum entropy, which Jaynes humbly declines to point out constitutes his original contribution; and admittedly extremely sharp criticisms of said Neyman-Pearson orthodox statistics.

As a software developer with a life-long interest in artificial intelligence, and an amateur mathematician who always found his minimal exposure to statistics entirely baffling, I found the book "is like lightning from a clear sky," to appropriate C.S. Lewis' wonderful review of "The Fellowship of the Ring." It's not only possible, but crucial, to base probability theory on logic and information theory, and when you do so, whole new worlds of application reveal themselves. In a very real sense, it isn't that this is the best book on the subject. It's that it's the *only* book on the subject.

May 13, 2016

A "frequentist," according to Jaynes, is someone who believes in random variables. That would be just about anyone who uses probability theory, right? "No," Jaynes would say. It's anyone who uses orthodox probability theory. The alternative, espoused here, is to consider probability as a measurement for propositions about reality. I'm afraid that I'm not going to be able to explain it any better than that, but if you read the first two chapters of this book, you will concede that it's a neat idea. In the remainder of the book, Jaynes argues that it's more than a neat idea.

Jaynes does not take credit for this so-called "Bayesian" formulation of probability theory (that belongs to Laplace, Cox, and Jeffreys), but its history and implications are certainly codified in this, his magnum opus, published posthumously in 2003. If you did not find your name in the previous sentence, then Jaynes has nothing but scorn for you. But don't worry. He saves the bulk of his wrath for physicists who believe in the probabilistic nature of quantum states. (Amongst the targets of Jaynes's scorn are "idiots" who perform statistical tests on isolated hypotheses. Consequently, I wonder what alternative Jaynes prefers to the "stupid" Copenhagen interpretation of quantum mechanics. He doesn't say.)

The impetus for Jaynes's screed is revealed in Chapter 15. In 1973 some frequentists mistakenly claimed to have found an inconsistency in Bayesian probability theory. Rather than simply point out their mistake, Jaynes elects the nuclear option and asserts that "frequentist thought" necessarily leads to insoluble paradoxes. To support this allegation, Jaynes frequently and, I suspect, intentionally confuses the opinions of a specific frequentist with frequentism. It gets tiresome.

Although there is a lot of overlap between frequentist and Bayesian terminology, there are some confusing differences. For example, at the end of Chapter 4 (p. 113), Jaynes says that the standard deviation shrinks "proportional to 1/Sqrt[N]." A frequentist, on the other hand, would say it is the standard *error* that shrinks "proportional to 1/Sqrt[N]." Jaynes never uses the latter term.
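Whatever one calls it, the 1/Sqrt[N] shrinkage is easy to check by simulation. A minimal sketch (mine, not the book's): quadrupling the sample size should roughly halve the spread of the sample mean.

```python
import random
import statistics

# Monte Carlo check that the spread of the sample mean shrinks like 1/sqrt(N):
# quadrupling N should roughly halve the standard deviation of the mean.
random.seed(0)

def std_of_mean(n, trials=2000):
    means = [statistics.fmean(random.gauss(0, 1) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

ratio = std_of_mean(25) / std_of_mean(100)
print(round(ratio, 2))   # close to sqrt(100/25) = 2, up to Monte Carlo noise
```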

Another, possibly more arcane example has to do with noise. I think Jaynes would have approved of my constant reminder to students that "noise" is any aspect of our data that we elect to model probabilistically. However, he would not approve of my description of that noise as the result of a stochastic process. There are no such things, according to Jaynes. Instead, he would specify a prior for noise values. At present, I don't see that it makes any mathematical difference.

Jaynes does assemble a compelling case for considering prior probability distributions whenever a parameter value must be estimated. In certain cases, not considering the appropriate prior can be tantamount to implicitly accepting an implausibly restrictive one, and this can lead to ridiculousness. Moreover, even if a frequentist does use a prior probability distribution in his parameter estimates, by definition, it would have to integrate to 1. On the other hand, there is nothing in the Bayesian framework that prohibits the use of so-called improper priors. These, like 1/x for 0 < x < ∞, do not integrate to one, or indeed to any finite value.
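To make the impropriety concrete (a sketch of mine, not the book's): the normalization integral of the 1/x prior grows without bound as the endpoints are pushed out, so no rescaling can make it sum to one.

```python
import math

# The "prior" p(x) proportional to 1/x on (0, inf) is improper: its integral
# over (eps, X) is log(X/eps), which diverges as either endpoint is extended.
for eps, X in [(1e-2, 1e2), (1e-4, 1e4), (1e-8, 1e8)]:
    print(f"integral of 1/x on ({eps:g}, {X:g}) = {math.log(X / eps):.1f}")
```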

I was also very pleased to see Jaynes expose one bit of mathematical trickery that I've seen repeatedly perpetrated by YouTube number theorists. They perform algebra on infinite sets. Jaynes convincingly demonstrates that anything can be proven when you do that. Taking the limit (e.g. as n goes to infinity) is necessarily the last step in any sensible derivation. (NB: Leaving the limit for last does not guarantee sensibility!)
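Riemann's rearrangement theorem is the classic instance of that trick. The sketch below (my example, not one from the book) reorders the terms of the alternating harmonic series, whose true sum is ln 2 ≈ 0.693, so that its partial sums approach 2 instead.

```python
# Riemann's rearrangement: reordering the terms of a conditionally convergent
# series as if they were a mere set lets you "prove" it sums to anything.
# Greedily reorder 1 - 1/2 + 1/3 - 1/4 + ... to approach a target of 2.
def rearranged_sum(target, n_terms=100_000):
    total, pos, neg = 0.0, 1, 2      # positives 1/(2k-1), negatives -1/(2k)
    for _ in range(n_terms):
        if total < target:
            total += 1.0 / pos
            pos += 2
        else:
            total -= 1.0 / neg
            neg += 2
    return total

print(round(rearranged_sum(2.0), 3))   # approaches 2.0, not ln 2 ~ 0.693
```

The same terms, a different order, a different "sum": exactly why the limit must come last.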

Here are some good sentences:

1) We do not deny the existence of other definitions which do include nondifferentiable functions, any more than we deny the existence of fluorescent purple hair dye in England; in both cases, we simply have no use for them.

2) An honest man can maintain an ideology only as long as he confines himself to problems where its shortcomings are not evident.

3) In any field, the most reliable and instantly recognizable sign of a fanatic is a lack of any sense of humor. (Jaynes's humourless ideologue is Ronald "F Test" Fisher.)

I don't yet know whether or how these ideas will affect my work. Certainly, my undergraduate lectures on probability theory will be tweaked. Students need to know that there is more than one way to define probability. However, they also need to be able to calculate (what I will continue to call) standard error, and they will need to understand scientific papers that adopt frequentist terminology. Will I refrain from describing the number of spots on top of one tossed die as random? Probably not.

From the papers I've written, it's clear that "generalised" likelihood ratios form the basis of my favourite statistical tests. A frequentist typically compares these ratios with values from a chi-square distribution, with a certain number of degrees-of-freedom. A Bayesian, on the other hand, would simply report the ratio, multiplied by that same number. I still don't really understand why it's the same number. Jaynes explains the number with priors (natch). Frequentists must have a very different explanation, but it seems to be beyond the scope of my (graduate level) textbook on the theory of statistics.

*sigh* Why do I love adding math books I'm most likely not smart enough to understand?

July 18, 2016

“Our theme is simply: probability theory as extended logic. The ‘new’ perception amounts to the recognition that the mathematical rules of probability theory are not merely rules for calculating frequencies of ‘random variables’; they are also the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind, and we shall apply them in full generality to that end.” - E. T. Jaynes

As an undergraduate in computer science, I left my statistics course with disdain. The curriculum required the course to be taught with a rigid tradition and formalism that obscured the mathematics. After reading Jaynes’ book, I have it from a professional statistician, at least one in the realm of statistical physics, that there was much more going on in the background that I should be pissed off about.

I was speaking with an acquaintance once, who happens to be a math major, and statistics came up. She ended up blurting out, “well, statistics isn’t real math,” and we both nodded in unison. I wouldn’t understand how wrong I was until I read this book. The usual role of a professional is to insist on complexity, arguing that their discipline cannot be reduced to easily repeated formulae. What Jaynes shows, by looking back to Aristotle, Boole, Laplace, etc., is that some things are simple and ought not to be obfuscated.

While simple, this subject is not easy, and I took full advantage of following Aubrey Clayton's lectures alongside this text. Aimed at graduate students and practitioners of applied mathematics in the other sciences like physics and biology, this book was written to take the wind out of the psychosomatic trap statisticians have created for themselves, and to my understanding it does that. I lucked out by knowing enough combinatorics to make the algebra manageable and enough differential equations/calculus to understand the proofs. But to anyone like myself who may be an undergraduate, I would recommend following along with Clayton, if only to get a second perspective on what you’re reading, because it is not an easy subject.

"Oftentimes in applied statistics, when pseudorandomness is involved, the question that is difficult is not 'Can you find a needle in a haystack?' It is: 'Can you find hay in a haystack?' You keep going to the haystack and finding needles and you realize it's not a trivial problem." ~ Avi Wigderson

I skipped chapters ten, eighteen and nineteen, just due to my lack of time for now. In my personal opinion, teaching the thought patterns for working with data is as important as the mathematics behind them. No one ever explained to me the dichotomy, for instance, that probability, inherently a quantity that deals in future events, is used constantly to describe statistics, which is a study of the past. This contradiction is what Jaynes first tackles in chapter one. Our use of language can obscure the thought patterns required to be a serious scientist.

Basing probability theory on logic and information theory is Jaynes' crucial insight. Suppose I'm Nate Silver and I'm trying to predict the next election, with Donald Trump having a 5% chance to win. If I'm a frequentist, that means I believe in random variables. That is to say: if we were to have a thousand years of elections, Donald would win in fifty of those years. Setting aside how perverted it is to think that we might simulate each year over and over again, somehow producing different results, the assertion doesn't make logical sense.

Jaynes would argue that this probability is a measure of our current state of knowledge, given all the data at hand. When you look at it through this lens, our probability assertion does make logical sense. The unanswered paradoxes of the frequentists can be seen by anyone taking statistics at an undergraduate level.

I’ll leave off with a beautiful passage by Jaynes about his personal hero, Harold Jeffreys.

“In both science and art, every creative person must, at the beginning of his career, do battle with an establishment that, not comprehending the new ideas, is more intent on putting him down than understanding his message.” - E. T. Jaynes

EDIT: I will not edit the above contents, but I will say that in the time since I read this book, about a month ago, I've become much more sympathetic to R.A. Fisher and the frequentist school, by reading the perspective of biologists and other statistical professionals who employ both Bayesian and frequentist methods. Jaynes' view does not seem discredited, but it is a much more complicated and nuanced issue than I first gave it credit for.

May 19, 2015

It's a good book - it approaches probability from the right direction and develops interesting, useful results. However, the author is often wordy and spends a lot of time trying to convince the reader why the Bayesian interpretation of statistics is superior to frequentist interpretations. Why would I be reading a book about Bayesian statistics if I thought it was a waste of time, and why do I need to read about the application of these ideas to determining whether ESP is real or not? Anyway, still a good book, just dinged for (1) sometimes wild tangents and (2) sometimes lengthy derivations whose final formulation does not reveal much interesting insight into the nature of the problem.

October 29, 2018

This book took my brain apart and rebuilt it.

January 28, 2022

For all its flaws, this is absolutely outstanding as a piece of mathematical exposition. Admittedly it's a little verbose in places, and the author sometimes comes across as arrogant in his critique of other academics - those who adhere to frequentist rather than Bayesian interpretations of probability, or those who fall into the trap of the mind projection fallacy, to cite two examples. However, the quality of exposition more than makes up for this stylistic choice. The book deepened my understanding of the connections between probability theory and logic.

July 27, 2021

What a book!

The most difficult book I've ever read, ahead of *Reasons and Persons* by a decent margin. But fascinating and thought-provoking.

After about chapter 2 I gave up on being able to follow the details of the proofs. Jaynes uses combinatorics, real analysis, measure theory, Laplace transforms, Fourier transforms, linear algebra, group theory, thermodynamics, and just about any branch of mathematics that suits his purposes. To understand everything I would need to spend at least a year laboriously going through each derivation on a blackboard. Just to absorb the commentary and "style" of the proofs has already taken me almost three months, at 15-90 minutes per day.

I think I'd like to come back and re-read Jaynes after I've got more background in probability theory and statistics.

But for now... God it feels good to be finished with Jaynes.

***********************************************************

What follows are miscellaneous notes I took while reading.

Further reading:

As well as the Aristotelian syllogism "A->B, A ∴ B", there's also the weak syllogism "A->B,B ∴ A becomes more plausible". We can get this with Bayes: p(A|B)=p(A)p(B|A)/p(B) with p(B|A)=1 means that p(A|B) decreases monotonically as p(B) increases. So p(A|B) ≥ p(A).

Similarly, "A->B, ~B ∴ ~A" has a weak syllogism "A->B,~A ∴ B becomes less plausible". Bayes says p(B|~A)=p(B)p(~A|B)/p(~A). The previous weak syllogism showed that p(A|B) ≥ p(A) when A->B, so we also have that p(~A|B) ≤ p(~A). Thus p(B|~A) ≤ p(B).

The original "strong" syllogism "A->B, A ∴ B" corresponds to the classic Bayes formula: p(B|A) = p(B) p(A|B) / p(A).
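Those inequalities are easy to check with concrete numbers; any p(A) ≤ p(B) consistent with A->B will do (the values below are arbitrary):

```python
# Numerical check of the weak syllogisms under A -> B, i.e. p(B|A) = 1.
# A -> B forces p(A) <= p(B); pick any such pair of marginals.
pA, pB = 0.3, 0.7

# Weak syllogism 1: learning B makes A more plausible.
pA_given_B = pA * 1.0 / pB                       # Bayes with p(B|A) = 1
assert pA_given_B >= pA

# Weak syllogism 2: learning ~A makes B less plausible.
p_notA_given_B = 1 - pA_given_B
pB_given_notA = pB * p_notA_given_B / (1 - pA)
assert pB_given_notA <= pB

print(f"p(A|B) = {pA_given_B:.3f} >= p(A) = {pA}")
print(f"p(B|~A) = {pB_given_notA:.3f} <= p(B) = {pB}")
```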

If you're computing the probability of drawing a red ball from an urn, then you use the hypergeometric distribution with the number of red and non-red draws so far as parameters. If I'm understanding Jaynes correctly, then if you also knew whether the draw *after* your current draw was red or non-red, you would update your probability of the current draw in exactly the same way as if the future draw were a past draw. Remarkable!
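A small enumeration over a hypothetical 5-ball urn (my sketch, not the book's example) confirms the symmetry: conditioning on the second draw updates the first exactly as conditioning on the first updates the second.

```python
from fractions import Fraction
from itertools import permutations

# Urn with N = 5 balls, R = 3 red, drawn without replacement. Exchangeability:
# learning that a FUTURE draw is red shifts the probability that the CURRENT
# draw is red exactly as learning about a past draw would.
balls = "RRRWW"
pairs = list(permutations(range(5), 2))          # (first draw, second draw)

both = sum(1 for i, j in pairs if balls[i] == "R" and balls[j] == "R")
first_red = sum(1 for i, j in pairs if balls[i] == "R")
second_red = sum(1 for i, j in pairs if balls[j] == "R")

print(Fraction(both, second_red))   # p(1st red | 2nd red) = (R-1)/(N-1) = 1/2
print(Fraction(both, first_red))    # p(2nd red | 1st red) = 1/2, identical
```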

Apropos of little:

Jaynes is... kind of a jerk

Should "Bayesian statistics" really be called Laplacian statistics? Or maybe Bernoullian statistics?

Page 113 has the derivation for a normal approximation of a beta distribution.

Page 121 has a useful Stirling approximation of the binomial distribution.

The following page mentions that the chi-square test relies on the exponential approximation of the binomial, which itself is a Maclaurin expansion around the mode. Thus it is not accurate far from the mode, and can be abused to underestimate the likelihood of the null hypothesis.
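The breakdown far from the mode is easy to exhibit (a sketch of mine, not the book's calculation): compare the exact Binomial(100, 1/2) pmf with its normal approximation.

```python
import math

# The normal approximation to Binomial(n, p) is a local expansion around the
# mode; near the mode it is excellent, but deep in the tail it can be wrong
# by orders of magnitude.
n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(k):
    return math.exp(-((k - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

for k in (50, 60, 80, 95):
    print(k, f"exact={binom_pmf(k):.3e}", f"normal={normal_pdf(k):.3e}")
```

At k = 50 the two agree to a fraction of a percent; by k = 95 they disagree by a factor of thousands.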

Perhaps this is the fundamental difference between Bayesians and Frequentists?

Laplace's rule of succession also applies when predicting the next draw from an urn. See p. 155.
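The rule says p(next success) = (s + 1)/(n + 2) after observing s successes in n trials under a uniform prior. Here is a quick numerical check (my construction, not the book's derivation) against a discretized uniform prior over the unknown success fraction:

```python
from fractions import Fraction

# Laplace's rule of succession: with a uniform prior over the unknown success
# fraction, p(next success | s successes in n trials) = (s + 1)/(n + 2).
def rule_of_succession(s, n):
    return Fraction(s + 1, n + 2)

def discrete_check(s, n, N=1000):
    # Uniform prior over fractions r/N, r = 0..N; posterior mean of the
    # fraction is the predictive probability of the next success.
    num = den = Fraction(0)
    for r in range(N + 1):
        like = Fraction(r, N) ** s * Fraction(N - r, N) ** (n - s)
        num += like * Fraction(r, N)
        den += like
    return num / den

print(rule_of_succession(3, 10))      # 4/12 = 1/3
print(float(discrete_check(3, 10)))   # close to 1/3, up to O(1/N) discretization
```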

Huh. I never noticed that.

A reason to prefer absolute-error minimisation over square-error minimisation is that only the former is reparameterization invariant.
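A two-line check makes the invariance concrete (my sketch): the median, which minimises absolute error, commutes with any monotone transform of the quantity, while the mean, which minimises squared error, does not.

```python
import statistics

# The median commutes with monotone reparameterisation; the mean does not.
data = [0.5, 1.0, 2.0, 4.0, 8.0]
f = lambda x: x ** 3                     # an arbitrary monotone transform

assert statistics.median(map(f, data)) == f(statistics.median(data))
print(statistics.mean(map(f, data)), f(statistics.mean(data)))   # these differ
```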

This is quite a nice statement of the difference between Bayesians and frequentists:

Original origin of VNM utility?

Whence the arguments between Jeffreys (Bayesian) and Fisher (frequentist)?

Was Fisher evil?

"In economics, belief in business cycles goes in and out of style cyclically."

"From the Fisherian (sometimes called the piscatorial) camp"

My interest in probability theory was stimulated first by reading the work of Harold Jeffreys (1939) and realizing that his viewpoint makes all the problems of theoretical physics appear in a very different light. But then, in quick succession, discovery of the work of R. T. Cox (1946), Shannon (1948) and Pólya (1954) opened up new worlds of thought, whose exploration has occupied my mind for some 40 years.

In summary, qualitative correspondence with common sense requires that w(x) [...] must range from zero for impossibility up to one for certainty [or] range from ∞ for impossibility down to one for certainty. [...] Given any function w1(x) which is acceptable by the above criteria and represents impossibility by ∞, we can define a new function w2(x) ≡ 1/w1(x), which will be equally acceptable and represents impossibility by zero. Therefore, there will be no loss of generality if we now adopt the choice 0 ≤ w(x) ≤ 1 as a convention; that is, as far as content is concerned, all possibilities consistent with our desiderata are included in this form. (As the reader may check, we could just as well have chosen the opposite convention; and the entire development of the theory from this point on, including all its applications, would go through equally well, with equations of a less familiar form but exactly the same content.)

Aristotelian deductive logic is the limiting form of our rules for plausible reasoning, as the robot becomes more and more certain of its conclusions.

As well as the Aristotelian syllogism "A->B, A ∴ B", there's also the weak syllogism "A->B, B ∴ A becomes more plausible". We can get this from Bayes: p(A|B) = p(A)p(B|A)/p(B), and with p(B|A) = 1 this becomes p(A)/p(B). Since p(B) ≤ 1, we get p(A|B) ≥ p(A), and the boost grows as p(B) shrinks.

Similarly, "A->B, ~B ∴ ~A" has a weak syllogism "A->B,~A ∴ B becomes less plausible". Bayes says p(B|~A)=p(B)p(~A|B)/p(~A). The previous weak syllogism showed that p(A|B) ≥ p(A) when A->B, so we also have that p(~A|B) ≤ p(~A). Thus p(B|~A) ≤ p(B).

The original "strong" syllogism "A->B, A ∴ B" is the limiting case p(B|A) = 1 of the classic Bayes formula p(B|A) = p(B) p(A|B) / p(A).
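
The two weak syllogisms can be checked numerically. The joint distribution below is invented, constrained only by A->B (so that p(B|A) = 1):

```python
# Numeric check of the weak syllogisms. The joint distribution is
# made up, constrained only by A -> B, i.e. p(A, not-B) = 0.
p_A_and_B = 0.2        # p(A, B)
p_notA_and_B = 0.3     # p(~A, B)
p_notA_and_notB = 0.5  # p(~A, ~B)

p_A = p_A_and_B                 # p(A, ~B) = 0, so p(A) = p(A, B)
p_B = p_A_and_B + p_notA_and_B
p_notA = 1.0 - p_A

# Weak syllogism 1: learning B makes A more plausible.
p_A_given_B = p_A_and_B / p_B
assert p_A_given_B >= p_A       # 0.4 >= 0.2

# Weak syllogism 2: learning ~A makes B less plausible.
p_B_given_notA = p_notA_and_B / p_notA
assert p_B_given_notA <= p_B    # 0.375 <= 0.5
```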

These points must represent some ultimate ‘elementary’ propositions ω into which A can be resolved. (A physicist refuses to call them ‘atomic’ propositions, for obvious reasons.)

If you're computing the probability of drawing a red ball from an urn, then you use the hypergeometric distribution with the numbers of red and non-red balls drawn so far as parameters. If I'm understanding Jaynes correctly, then if you also knew whether the draw
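
The urn calculation can be sketched with a stdlib-only hypergeometric pmf; the urn sizes below are purely illustrative:

```python
from math import comb

def hypergeom_pmf(k, N, R, n):
    """Probability of exactly k red balls in n draws, without
    replacement, from an urn of N balls of which R are red."""
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

# Urn with 10 balls, 4 of them red: chance of exactly 2 red in 5 draws.
p = hypergeom_pmf(2, N=10, R=4, n=5)   # 10/21

# Sanity check: the pmf sums to 1 over the support k = 0..min(n, R).
total = sum(hypergeom_pmf(k, 10, 4, 5) for k in range(0, 5))
```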

Our uncertain phrasing here indicates that ‘odds’ is a grammatically slippery word. We are inclined to agree with purists who say that it is, like ‘mathematics’ and ‘physics’, a singular noun in spite of appearances. Yet the urge to follow the vernacular and treat it as plural is sometimes irresistible, and so we shall be knowingly inconsistent and use it both ways, judging what seems euphonious in each case.

Apropos of little:

For an interesting account of the life and work of Gustav Theodor Fechner (1801–87), see Stigler (1986c).

Jaynes is... kind of a jerk

The story of the schoolboy who made a mistake in his sums and concluded that the rules of arithmetic are all wrong, is not fanciful. There is a long history of workers who did seemingly obvious things in probability theory without bothering to derive them by strict application of the basic rules, obtained nonsensical results – and concluded that probability theory as logic was at fault. The greatest, most respected mathematicians and logicians have fallen into this trap momentarily, and some philosophers spend their entire lives mired in it.

As Gauss stressed long ago, any kind of singular mathematics acquires a meaning only as a limiting form of some kind of well-behaved mathematics, and it is ambiguous until we specify exactly what limiting process we propose to use.

Should "Bayesian statistics" really be called Laplacian statistics? Or maybe Bernoullian statistics?

It appears that this result was first found by an amateur mathematician, the Rev. Thomas Bayes (1763). For this reason, the kind of calculations we are doing are called ‘Bayesian’. We shall follow this long-established custom, although it is misleading in several respects. The general result (4.3) is always called ‘Bayes’ theorem’, although Bayes never wrote it; and it is really nothing but the product rule of probability theory which had been recognized by others, such as James Bernoulli and A. de Moivre (1718), long before the work of Bayes. Furthermore, it was not Bayes but Laplace (1774) who first saw the result in generality and showed how to use it in real problems of inference.

Page 113 has the derivation for a normal approximation of a beta distribution.

Page 121 has a useful Stirling approximation of the binomial distribution.

The following page mentions that the chi-square test relies on the exponential approximation of the binomial, which itself is a Maclaurin expansion around the mode. Thus it is not accurate far from the mode, and can be abused to underestimate the likelihood of the null hypothesis.
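
A quick numeric illustration of that tail inaccuracy (n and p below are chosen arbitrarily): the second-order expansion of the binomial log-pmf around the mode gives a Gaussian that is excellent near the mode but badly off in the far tail.

```python
from math import comb, exp, pi, sqrt

n, p = 100, 0.5

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def gauss_approx(k):
    # Gaussian from expanding the log-pmf to second order around
    # the mode: mean n*p, variance n*p*(1-p).
    mu, var = n * p, n * p * (1 - p)
    return exp(-((k - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)

# Near the mode the relative error is a fraction of a percent...
rel_err_mode = abs(gauss_approx(50) - binom_pmf(50)) / binom_pmf(50)
# ...but out in the tail it exceeds 100%.
rel_err_tail = abs(gauss_approx(20) - binom_pmf(20)) / binom_pmf(20)
```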

Perhaps this is the fundamental difference between Bayesians and Frequentists?

A practical difficulty of this was pointed out by Jeffreys (1939); there is not the slightest use in rejecting any hypothesis H0 unless we can do it in favor of some definite alternative H1 which better fits the facts.

Familiar problems of everyday life may be more complicated than scientific problems, where we are often reasoning about carefully controlled situations. The most familiar problems may be so complicated – just because the result depends on so many unknown and uncontrolled factors – that a full Bayesian analysis, although correct in principle, is out of the question in practice.

In recent years there has grown up a considerable literature on Bayesian jurisprudence; for a review with many references, see Vignaux and Robertson (1996).

relativity theory, in showing us the limits of validity of Newtonian mechanics, also confirmed its accuracy within those limits; so it should increase our confidence in Newtonian theory when applied within its proper domain (velocities small compared with that of light). Likewise, the first law of thermodynamics, in showing us the limits of validity of the caloric theory, also confirmed the accuracy of the caloric theory within its proper domain (processes where heat flows but no work is done).

Laplace's rule of succession also applies to predicting the next draw from an urn. See p. 155.
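
A numeric sketch of the rule (the grid integration and the numbers r = 7, n = 10 are my own illustration): under a uniform prior on the success fraction, r successes in n trials give predictive probability (r + 1)/(n + 2) for the next success.

```python
# Laplace's rule of succession, checked by brute-force integration
# of the (unnormalised) Beta posterior on a grid.
def rule_of_succession(r, n, steps=100_000):
    num = den = 0.0
    for i in range(1, steps):
        theta = i / steps
        posterior = theta**r * (1 - theta) ** (n - r)  # uniform prior
        num += theta * posterior   # numerator of the predictive mean
        den += posterior
    return num / den

approx = rule_of_succession(r=7, n=10)
exact = (7 + 1) / (10 + 2)   # = 2/3
```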

A common and useful custom is to use Greek letters to denote continuously variable parameters, Latin letters for discrete indices or data values

Huh. I never noticed that.

A reason to prefer absolute-error minimisation over square-error minimisation is that only the former is reparameterization invariant.
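
A toy demonstration of that invariance (the samples and the reparameterisation g are made up): the median, which minimises expected absolute error, commutes with any monotone map, while the mean, which minimises expected squared error, does not.

```python
import statistics

samples = [0.5, 1.0, 2.0, 4.0, 8.0]  # stand-in posterior samples

def g(x):
    return x ** 2                    # a monotone reparameterisation

# Medians commute with g: transform-then-summarise equals
# summarise-then-transform.
median_then_g = g(statistics.median(samples))
g_then_median = statistics.median([g(x) for x in samples])
assert median_then_g == g_then_median

# Means do not.
mean_then_g = g(statistics.fmean(samples))
g_then_mean = statistics.fmean([g(x) for x in samples])
assert mean_then_g != g_then_mean
```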

This is quite a nice statement of the difference between Bayesians and frequentists:

"'When I toss a coin, the probability for heads is one-half.' What do we mean by this statement? Over the past two centuries, millions of words have been written about this simple question. [...] By and large, the issue is between the following two interpretations:

(A) ‘The available information gives me no reason to expect heads rather than tails, or vice versa – I am completely unable to predict which it will be.’

(B) ‘If I toss the coin a very large number of times, in the long run heads will occur about half the time – in other words, the frequency of heads will approach 1/2.’"

Insurance premiums are always set high enough to guarantee the insurance company a positive expectation of profit over all the contingencies covered in the contract, and every dollar the company earns is a dollar spent by a customer. Then why should anyone ever want to buy insurance?

The point is that the individual customer has a utility function for money that may be strongly curved over ranges of $1000; but the insurance company is so much larger that its utility for money is accurately linear over ranges of millions of dollars.

[...]

We leave it as an exercise for the reader to show from (13.5) that a poor man should buy insurance, but a rich man should not unless his assessment of expected loss is much greater than the insurance company’s.
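
A sketch of that exercise under an assumed log utility U(m) = log m (all dollar figures below are invented; this is not Jaynes' equation (13.5) itself): the insurer charges more than the expected loss, yet the risk-averse customer still gains when his wealth is small enough for the curvature of U to matter.

```python
from math import log

def expected_utility_uninsured(wealth, loss, p_loss):
    # Expected log utility when bearing the risk yourself.
    return p_loss * log(wealth - loss) + (1 - p_loss) * log(wealth)

def utility_insured(wealth, premium):
    # Certain outcome: pay the premium, never suffer the loss.
    return log(wealth - premium)

loss, p_loss = 1000.0, 0.01
premium = 15.0   # above the expected loss of $10, so the insurer's
                 # expectation of profit is positive

# A poor man (wealth comparable to the loss) should buy...
poor = 1100.0
assert utility_insured(poor, premium) > expected_utility_uninsured(poor, loss, p_loss)

# ...but a rich man should not.
rich = 1_000_000.0
assert utility_insured(rich, premium) < expected_utility_uninsured(rich, loss, p_loss)
```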

Is this the origin of VNM utility?

This is not to say that the problem has not been discussed; de Groot (1970) notes the very weak abstract conditions (transitivity of preferences, etc.) sufficient to guarantee existence of a utility function. Long ago, L. J. Savage considered construction of utility functions by introspection. This is described by Chernoff and Moses (1959): suppose there are two possible rewards r1 and r2; then for what reward r3 would you be indifferent between (r3 for sure) or (either r1 or r2 as decided by the flip of a coin)? Presumably, r3 is somewhere between r1 and r2. If one makes enough such intuitive judgments and manages to correct all intransitivities, a crude utility function emerges. Berger (1985, Chap. 2) gives a scenario in which this happens.

If we wish to consider an improper prior, the only correct way of doing it is to approach it as a well-defined limit of a sequence of proper priors.
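
A concrete instance of that limiting procedure (the normal-mean setup is my illustration, not from the text): approach a flat prior on a normal mean as the limit of proper Normal(0, τ²) priors; the posterior mean tends to the flat-prior answer, the sample mean.

```python
# With data x1..xn ~ N(mu, sigma^2) and a proper Normal(0, tau^2)
# prior on mu, the posterior mean is a precision-weighted shrinkage
# of the sample mean toward 0. As tau^2 -> infinity it tends to xbar,
# the improper flat-prior answer.
def posterior_mean(xbar, n, sigma2, tau2):
    precision_data = n / sigma2
    precision_prior = 1.0 / tau2
    return precision_data * xbar / (precision_data + precision_prior)

xbar, n, sigma2 = 3.7, 20, 4.0
# Small tau^2 shrinks hard toward the prior mean 0; huge tau^2
# recovers the sample mean.
shrunk = posterior_mean(xbar, n, sigma2, tau2=1.0)      # < 3.7
nearly_flat = posterior_mean(xbar, n, sigma2, tau2=1e12)  # ~ 3.7
```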

Whence the arguments between Jeffreys (Bayesian) and Fisher (frequentist)?

Firstly, we need to recognize that a large part of their differences arose from the fact that Fisher and Jeffreys were occupied with very different problems. Fisher studied biological problems, where one had no prior information and no guiding theory (this was long before the days of the DNA helix), and the data taking was very much like drawing from Bernoulli’s urn. Jeffreys studied problems of geophysics, where one had a great deal of cogent prior information and a highly developed guiding theory (all of Newtonian mechanics giving the theory of elasticity and seismic wave propagation, plus the principles of physical chemistry and thermodynamics), and the data taking procedure had no resemblance to drawing from an urn. Fisher, in his cookbook (1925, Sect. 1) defines statistics as the study of populations; Jeffreys devotes virtually all of his analysis to problems of inference where there is no population.

Was Fisher evil?

In any field, the most reliable and instantly recognizable sign of a fanatic is a lack of any sense of humor. Colleagues have reported their experiences at meetings, where Fisher could fly into a trembling rage over some harmless remark that others would only smile at. Even his disciples (for example, Kendall, 1963) noted that the character defects which he attributed to others were easily discernible in Fisher himself; as one put it, ‘Whenever he paints a portrait, he paints a self-portrait’.

"In economics, belief in business cycles goes in and out of style cyclically."

"From the Fisherian (sometimes called the piscatorial) camp"

July 19, 2017

Jaynes' tome on Bayesian Statistics and its underpinnings. A really important text for me while I was working on my PhD. I found a lot of really useful guidance here on assigning prior probabilities and using maximum entropy principles. It's also just fun to read. Jaynes has a strong voice and is a bold shit-talker when it comes to the shortcomings of traditional frequentist statistics.

March 7, 2019

What do I need to know to be able to read this book?

June 14, 2022

Excellent book. The first 5 chapters or so are magnificent; the later ones are unpolished, probably because Jaynes died before completing the book. Also I would have liked more details in the derivations. I hope some day a revised, undergrad version of this book will be available.

September 19, 2022

A classic in the field, a bit dated and definitely gets into excessive detail (e.g., on linear regression techniques). Overall a good blend of the philosophy of probability and the practice of science.

November 24, 2016

Now I understand YouTube videos better

January 11, 2022

...what we believe should be independent of what we want. But the converse need not be true, because our desires (wants) can depend on what we know/believe...

Page 424: "There is another aspect in which loss functions are less firmly grounded than are prior probabilities. We consider it an important aspect of 'objectivity' in inference - almost a principle of morality - that we should not allow our opinions to be swayed by our desire; what we believe should be independent of what we want. But the converse need not be true; on introspection, we would probably agree that what we want depends very much on what we know, and we do not feel guilty of any inconsistency or irrationality on that account."

The author comments that this idea reminded him of Quasimodo who, condemned by an accident of Nature to be something intermediate between a man and a gargoyle, wished he had been made a whole man. But after learning about the behavior of men, he wished instead that he had been made a whole gargoyle.

In other words, after gaining knowledge, he changed his wants. However, his wants did not define his knowledge (beliefs).

May 19, 2020

Textbook by a staunch Bayesian describing his approach to probability/statistics from the ground up.

He's a feisty one, and spends a fair bit of his time attacking various viewpoints. It's quite fun to read the first few times, but he definitely repeats himself.

The content is nice though, I feel I finally understand what those messy things like p-values, chi squared tests etc are.

November 30, 2022

I genuinely think it's not an exaggeration to say that every scientist should read this book.

It may be obnoxious at times, but the clarity and utility of the approach is excellent, and will be something I continue to routinely consult.

May 31, 2021

The math is a bit above my head, but I still got a few things out of this. I have a much better grasp of the Bayesian vs Frequentist approach. I really like Jaynes' approach of decoupling information from physical reality, which is neither subjective nor objective. It's prior knowledge all the way down. But I don't have a dog in any of these fights.

June 18, 2009

This is a must read for anyone claiming to be a probabilist.

Review can be found here.

November 13, 2011

This book is going to be huge in the next twenty years. Just stay tuned.

December 18, 2016

Best book on statistics ever.

January 28, 2008

Good book on Bayesian, or "Jaynesian" statistics. If it were finished it would've been truly great.

This comes highly recommended by Paul Snively: http://twitter.com/psnively/status/10...
