753 pages, Hardcover
First published April 9, 1999
My interest in probability theory was stimulated first by reading the work of Harold Jeffreys (1939) and realizing that his viewpoint makes all the problems of theoretical physics appear in a very different light. But then, in quick succession, discovery of the work of R. T. Cox (1946), Shannon (1948) and Pólya (1954) opened up new worlds of thought, whose exploration has occupied my mind for some 40 years.
In summary, qualitative correspondence with common sense requires that w(x) [...] must range from zero for impossibility up to one for certainty [or] range from ∞ for impossibility down to one for certainty. [...] Given any function w1(x) which is acceptable by the above criteria and represents impossibility by ∞, we can define a new function w2(x) ≡ 1/w1(x), which will be equally acceptable and represents impossibility by zero. Therefore, there will be no loss of generality if we now adopt the choice 0 ≤ w(x) ≤ 1 as a convention; that is, as far as content is concerned, all possibilities consistent with our desiderata are included in this form. (As the reader may check, we could just as well have chosen the opposite convention; and the entire development of the theory from this point on, including all its applications, would go through equally well, with equations of a less familiar form but exactly the same content.)
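The equivalence of the two conventions can be made concrete with a tiny sketch (the function name and the numerical values here are ours, purely for illustration):

```python
# Minimal sketch of the two conventions described above.
# On the w1 scale, 1 means certainty and infinity means impossibility;
# w2 = 1/w1 carries the same content on the conventional [0, 1] scale.

def to_unit_scale(w1):
    """Map a plausibility from the [1, inf) scale to the [0, 1] scale."""
    return 1.0 / w1

assert to_unit_scale(1.0) == 1.0                 # certainty maps to certainty
assert to_unit_scale(float("inf")) == 0.0        # impossibility maps to zero
assert to_unit_scale(2.0) > to_unit_scale(4.0)   # relative ordering is preserved
```

Since the map is monotone and invertible, nothing is gained or lost; only the scale changes, exactly as the passage claims.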
Aristotelian deductive logic is the limiting form of our rules for plausible reasoning, as the robot becomes more and more certain of its conclusions.
These points must represent some ultimate ‘elementary’ propositions ω into which A can be resolved. (A physicist refuses to call them ‘atomic’ propositions, for obvious reasons.)
Our uncertain phrasing here indicates that ‘odds’ is a grammatically slippery word. We are inclined to agree with purists who say that it is, like ‘mathematics’ and ‘physics’, a singular noun in spite of appearances. Yet the urge to follow the vernacular and treat it as plural is sometimes irresistible, and so we shall be knowingly inconsistent and use it both ways, judging what seems euphonious in each case.
For an interesting account of the life and work of Gustav Theodor Fechner (1801–87), see Stigler (1986c).
The story of the schoolboy who made a mistake in his sums and concluded that the rules of arithmetic are all wrong, is not fanciful. There is a long history of workers who did seemingly obvious things in probability theory without bothering to derive them by strict application of the basic rules, obtained nonsensical results – and concluded that probability theory as logic was at fault. The greatest, most respected mathematicians and logicians have fallen into this trap momentarily, and some philosophers spend their entire lives mired in it.
As Gauss stressed long ago, any kind of singular mathematics acquires a meaning only as a limiting form of some kind of well-behaved mathematics, and it is ambiguous until we specify exactly what limiting process we propose to use.
It appears that this result was first found by an amateur mathematician, the Rev. Thomas Bayes (1763). For this reason, the kind of calculations we are doing are called ‘Bayesian’. We shall follow this long-established custom, although it is misleading in several respects. The general result (4.3) is always called ‘Bayes’ theorem’, although Bayes never wrote it; and it is really nothing but the product rule of probability theory which had been recognized by others, such as James Bernoulli and A. de Moivre (1718), long before the work of Bayes. Furthermore, it was not Bayes but Laplace (1774) who first saw the result in generality and showed how to use it in real problems of inference.
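The point that the theorem is nothing but the rearranged product rule can be checked numerically; the probabilities below are made-up illustrative values, not anything from the book:

```python
# "Bayes' theorem" as the product rule P(HD) = P(D|H)P(H) = P(H|D)P(D),
# rearranged. All numbers are hypothetical, chosen only for illustration.
p_H = 0.01             # prior probability of hypothesis H
p_D_given_H = 0.95     # probability of the data D if H is true
p_D_given_notH = 0.05  # probability of D if H is false

# The product rule, summed over the two cases, gives P(D):
p_D = p_D_given_H * p_H + p_D_given_notH * (1 - p_H)

# Rearranging the product rule gives the posterior P(H|D):
p_H_given_D = p_D_given_H * p_H / p_D
print(round(p_H_given_D, 3))  # prints 0.161
```

Note how a strong likelihood ratio (0.95 vs. 0.05) still leaves the posterior modest when the prior is small; the product rule enforces both influences automatically.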
A practical difficulty of this was pointed out by Jeffreys (1939); there is not the slightest use in rejecting any hypothesis H0 unless we can do it in favor of some definite alternative H1 which better fits the facts.
Familiar problems of everyday life may be more complicated than scientific problems, where we are often reasoning about carefully controlled situations. The most familiar problems may be so complicated – just because the result depends on so many unknown and uncontrolled factors – that a full Bayesian analysis, although correct in principle, is out of the question in practice.
In recent years there has grown up a considerable literature on Bayesian jurisprudence; for a review with many references, see Vignaux and Robertson (1996).
Relativity theory, in showing us the limits of validity of Newtonian mechanics, also confirmed its accuracy within those limits; so it should increase our confidence in Newtonian theory when applied within its proper domain (velocities small compared with that of light). Likewise, the first law of thermodynamics, in showing us the limits of validity of the caloric theory, also confirmed the accuracy of the caloric theory within its proper domain (processes where heat flows but no work is done).
A common and useful custom is to use Greek letters to denote continuously variable parameters, and Latin letters for discrete indices or data values.
"'When I toss a coin, the probability for heads is one-half.' What do we mean by this statement? Over the past two centuries, millions of words have been written about this simple question. [...] By and large, the issue is between the following two interpretations:
(A) ‘The available information gives me no reason to expect heads rather than tails, or vice versa – I am completely unable to predict which it will be.’
(B) ‘If I toss the coin a very large number of times, in the long run heads will occur about half the time – in other words, the frequency of heads will approach 1/2.’"
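Interpretation (B) is easy to illustrate with a short simulation (this demonstrates, rather than proves, the long-run statement; the seed and sample size are arbitrary choices of ours):

```python
# Simulating interpretation (B): the long-run frequency of heads.
import random

random.seed(0)          # fixed seed so the run is reproducible
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))
freq = heads / n
print(freq)             # close to 0.5, typically within a fraction of a percent
```

Interpretation (A), by contrast, is a statement about the tosser's state of knowledge before any toss, and needs no simulation at all.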
Insurance premiums are always set high enough to guarantee the insurance company a positive expectation of profit over all the contingencies covered in the contract, and every dollar the company earns is a dollar spent by a customer. Then why should anyone ever want to buy insurance?
The point is that the individual customer has a utility function for money that may be strongly curved over ranges of $1000; but the insurance company is so much larger that its utility for money is accurately linear over ranges of millions of dollars.
We leave it as an exercise for the reader to show from (13.5) that a poor man should buy insurance, but a rich man should not unless his assessment of expected loss is much greater than the insurance company’s.
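The qualitative point can be checked with a toy calculation. We use a logarithmic utility u(w) = log(w) merely as a stand-in for any strongly curved utility; this specific choice and all the dollar figures are our illustration, not the book's equation (13.5):

```python
# Toy expected-utility comparison: insure or not? (All numbers hypothetical.)
import math

def expected_utility(wealth, loss, p_loss, premium, insured):
    """Expected log-utility of final wealth, with or without insurance."""
    if insured:
        return math.log(wealth - premium)       # premium paid, loss covered
    return (p_loss * math.log(wealth - loss)
            + (1 - p_loss) * math.log(wealth))

p_loss, loss = 0.01, 10_000
premium = 150   # set above the expected loss of $100, so the insurer profits

poor, rich = 12_000, 1_000_000   # the same loss is ruinous vs. negligible

poor_buys = (expected_utility(poor, loss, p_loss, premium, True)
             > expected_utility(poor, loss, p_loss, premium, False))
rich_buys = (expected_utility(rich, loss, p_loss, premium, True)
             > expected_utility(rich, loss, p_loss, premium, False))
print(poor_buys, rich_buys)   # prints: True False
```

For the poor man the curvature of log-utility over the $2,000–$12,000 range outweighs the insurer's markup; for the rich man utility is effectively linear over the relevant range, so the negative expected dollar value of the contract decides.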
This is not to say that the problem has not been discussed; de Groot (1970) notes the very weak abstract conditions (transitivity of preferences, etc.) sufficient to guarantee existence of a utility function. Long ago, L. J. Savage considered construction of utility functions by introspection. This is described by Chernoff and Moses (1959): suppose there are two possible rewards r1 and r2; then for what reward r3 would you be indifferent between (r3 for sure) or (either r1 or r2 as decided by the flip of a coin)? Presumably, r3 is somewhere between r1 and r2. If one makes enough such intuitive judgments and manages to correct all intransitivities, a crude utility function emerges. Berger (1985, Chap. 2) gives a scenario in which this happens.
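The elicitation procedure Savage described can be sketched as a toy program. Here the subject's introspective judgments are simulated by a hidden "true" utility (a square root, an arbitrary choice of ours) which the procedure then partially recovers:

```python
# Toy sketch of certainty-equivalent elicitation (Savage / Chernoff & Moses).
# The subject's answers are simulated by inverting a hidden utility
# u(r) = sqrt(r); in a real elicitation the answers come from introspection.
import math

def indifference_point(r1, r2):
    """The sure reward r3 judged equal to a 50/50 gamble between r1 and r2."""
    u_mid = (math.sqrt(r1) + math.sqrt(r2)) / 2
    return u_mid ** 2

# Anchor the scale at u(0) = 0 and u(100) = 1, then ask one midpoint question:
points = {0.0: 0.0, 100.0: 1.0}
r_mid = indifference_point(0.0, 100.0)
points[r_mid] = 0.5   # by construction, this gamble has utility 1/2
print(sorted(points.items()))   # r_mid = 25 < 50: the recovered utility is concave
```

Repeating the midpoint question between each adjacent pair of anchors (and correcting any intransitive answers, as the passage notes) fills in the crude utility function on a finer and finer grid.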
If we wish to consider an improper prior, the only correct way of doing it is to approach it as a well-defined limit of a sequence of proper priors.
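A minimal numerical example of this limiting procedure, using the standard conjugate-normal formulas (the data value and variances are illustrative): a flat improper prior on a location parameter is approached as the limit of proper Normal(0, τ²) priors with τ² → ∞.

```python
# Flat prior as the limit of proper Normal(0, tau2) priors.
def posterior_mean(x, sigma2, tau2):
    """Posterior mean of mu given one observation x ~ N(mu, sigma2),
    with proper prior mu ~ N(0, tau2)."""
    return x * tau2 / (tau2 + sigma2)

x, sigma2 = 3.0, 1.0
for tau2 in (1.0, 100.0, 1e6):
    print(tau2, posterior_mean(x, sigma2, tau2))
# As tau2 grows, the posterior mean approaches x = 3.0, the flat-prior answer.
```

Because every member of the sequence is a proper prior, each posterior is well defined, and the limit of the posteriors (not a "posterior from the limit") is the quantity with a meaning, exactly as in Gauss's dictum quoted earlier.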
Firstly, we need to recognize that a large part of their differences arose from the fact that Fisher and Jeffreys were occupied with very different problems. Fisher studied biological problems, where one had no prior information and no guiding theory (this was long before the days of the DNA helix), and the data taking was very much like drawing from Bernoulli’s urn. Jeffreys studied problems of geophysics, where one had a great deal of cogent prior information and a highly developed guiding theory (all of Newtonian mechanics giving the theory of elasticity and seismic wave propagation, plus the principles of physical chemistry and thermodynamics), and the data taking procedure had no resemblance to drawing from an urn. Fisher, in his cookbook (1925, Sect. 1) defines statistics as the study of populations; Jeffreys devotes virtually all of his analysis to problems of inference where there is no population.
In any field, the most reliable and instantly recognizable sign of a fanatic is a lack of any sense of humor. Colleagues have reported their experiences at meetings, where Fisher could fly into a trembling rage over some harmless remark that others would only smile at. Even his disciples (for example, Kendall, 1963) noted that the character defects which he attributed to others were easily discernible in Fisher himself; as one put it, ‘Whenever he paints a portrait, he paints a self-portrait’.