Converse's Reviews > The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century
by David Salsburg
bookshelves: mathematics, non-fiction
This book is an informal history of the development of statistics in the 20th century. The author, who was a statistician in the pharmaceutical industry, had met many of the statisticians whose work he discusses. It is not a mathematical book, in the sense that not a line of algebra is to be found in it.
The author begins with Karl Pearson in Britain at the beginning of the twentieth century, when Pearson took over Francis Galton's biometrical (people-measuring) laboratory in London. (Galton was a cousin of Charles Darwin, a promoter of the use of fingerprints in identification, and a eugenicist.) Pearson believed that the aim of science, in a world where the objects of study show ubiquitous numerical variation in any trait you care to measure, should be the determination of the small number of parameters (numbers like the average and the standard deviation, plus more obscure ones like skewness and kurtosis) that describe the distribution of a trait's variability. Pearson's importance was greatly enhanced by his editorial control over an important scientific journal, Biometrika.
In this journal, Pearson early on published the work of William Gosset, the developer of a most useful statistical test called the t test, with which one can judge whether the difference between the means of two samples is larger than chance alone would plausibly produce. Gosset had to publish under a pen name because his employer, the Guinness brewery, forbade its employees to publish after an unfortunate incident in which a brewer gave away some trade secrets.
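For concreteness, here is a minimal sketch (my own, not from the book) of the pooled two-sample t statistic that Gosset published under the name "Student"; the sample numbers are invented:

```python
# Student's two-sample t statistic, assuming equal variances.
# The data below are made up purely for illustration.
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Pooled-variance t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

batch_a = [5.1, 4.9, 5.3, 5.0, 5.2]   # e.g. yields from one batch
batch_b = [4.6, 4.8, 4.5, 4.9, 4.7]   # yields from another
t = t_statistic(batch_a, batch_b)      # large |t| suggests a real difference
```

A large t value (compared to Student's t distribution with the appropriate degrees of freedom) says the observed difference in means would rarely arise by chance.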
The next great figure, definitely one of the most important statisticians (possibly the most important) and scientists of the 20th century, was R. A. Fisher. Fisher developed most of the statistical techniques, such as the analysis of variance, that I used 80 years later to design, analyze, and interpret experiments. Fisher combined great mathematical insight with the writing of a book, Statistical Methods for Research Workers, that showed us lesser mortals how to go about designing and analyzing experiments in a proper manner without worrying us about how these methods worked. Unfortunately for proper mathematicians, he was less good at explaining in his other publications how he derived his methods, and in some cases I gather he never published any derivation at all. His unpublished papers show he wasn't making it all up, however; as late as the 1970s, those who examined his unpublished work were still gaining new insights from it. Gradually other mathematicians filled in the detailed proofs, or found them on their own.
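As an illustration of Fisher's analysis of variance (my sketch, not the book's), here is a bare-bones one-way F statistic computed on invented crop-yield data for three treatments:

```python
# One-way analysis of variance: compare variation between group means
# to variation within groups.  Data are invented for illustration.
from statistics import mean

def anova_f(groups):
    """F statistic for a one-way ANOVA over a list of sample lists."""
    grand = mean([x for g in groups for x in g])
    n = sum(len(g) for g in groups)
    k = len(groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

plots = [[20, 21, 19], [24, 25, 26], [30, 29, 31]]  # yields by fertilizer
F = anova_f(plots)  # large F suggests the treatments really differ
```

When the between-group variance dwarfs the within-group variance, as here, the F statistic is large and chance is an implausible explanation for the differences among treatments.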
The other candidate for world's greatest statistician would be A. N. Kolmogorov, who (among many other things) put probability on a firm mathematical foundation. As this work is a necessary condition for statistical applications to be valid, it is very important. He was also an important developer of statistical methods in his own right. Kolmogorov was a Soviet citizen of astonishingly wide interests (he spent a good deal of time teaching gifted children at a special Moscow school) and great kindness.
The third most important statistician would probably be Jerzy Neyman, a man of Polish antecedents who spent much of his career in the United States. Neyman, along with Egon Pearson (Karl's son), was the developer of the now ubiquitous notion of hypothesis testing. Neyman also developed the concept of a confidence interval. In hypothesis testing you determine how often an experimental outcome at least as extreme as the one observed would occur due to chance alone, under a default assumption of no real effect called the null hypothesis. If this probability falls below a small but arbitrary value, one rejects the null hypothesis in favor of the alternative hypothesis. An example of a confidence interval is a poll reporting that the president's approval rating is 45%, plus or minus 3%, with 95% confidence. Both of these ideas are controversial as well as very commonly used, in part because their meaning in application (the math is perfectly rigorous) becomes less clear the more you examine them.
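The polling example can be reproduced with the standard normal-approximation confidence interval for a proportion (my sketch; the sample size is invented):

```python
# Normal-approximation 95% confidence interval for a poll proportion,
# in the spirit of the "45% plus or minus 3%" example.  The sample
# size of 1,000 respondents is invented for illustration.
from math import sqrt

def proportion_ci(p_hat, n, z=1.96):
    """CI for a proportion; z = 1.96 gives roughly 95% coverage."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.45, 1000)  # ~45% approval among 1,000 polled
```

With 1,000 respondents the margin of error works out to about three percentage points, which is why that sample size is so common in published polls.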
Besides the greats, the author covers many other people (including John Tukey, who probably should be included in the list of greats), who made important contributions to statistics. He also covers other techniques than the ones I mentioned above, such as experimental design, non-parametric statistics, computer intensive methods such as the bootstrap, and Bayesian statistics.
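To give a flavor of the computer-intensive methods, here is a tiny bootstrap sketch (mine, not the book's, with invented data): resample the data with replacement many times to estimate how much the sample mean would wobble from sample to sample.

```python
# Bootstrap estimate of the standard error of the mean: resample the
# observed data with replacement and look at the spread of the
# resampled means.  The data are invented for illustration.
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible
data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.7, 5.3]
boot_means = [mean(random.choices(data, k=len(data))) for _ in range(2000)]
se_estimate = stdev(boot_means)  # bootstrap standard error of the mean
```

The appeal of the bootstrap is that the same recipe works for statistics (medians, correlations) whose sampling distributions have no tidy formula.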
The author also brings up the philosophical matters that continue to bedevil statistics, such as the meaning of a probability. The most common approach is that of the frequentist school, in which a probability is the fraction of all possible outcomes in which a particular outcome is observed (say, the number of winning lottery tickets divided by the total number of tickets sold). There are difficulties with this approach, as there are with the alternative Bayesian approaches. I think the author is mistaken in his apparent belief that the issue is primarily limited to applications of statistics; I would say that it is a problem with all forms of knowledge that depend to some extent on induction, the drawing of conclusions from necessarily limited numbers of experiences.
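The frequentist reading can be made concrete with a simulation (my own illustration, not from the book): flip a fair coin many times and watch the relative frequency of heads settle near one half.

```python
# Probability as long-run relative frequency: simulate many fair-coin
# flips; the fraction of heads approaches 1/2 as the trials pile up.
import random

random.seed(42)  # fixed seed for reproducibility
flips = 100_000
heads = sum(random.random() < 0.5 for _ in range(flips))
freq = heads / flips  # settles near 0.5 for large numbers of flips
```

The philosophical rub, of course, is that any real sequence of trials is finite, so the "long run" that defines the probability is never actually observed.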
The book includes a bibliography, including a most helpful annotated section of recommended books that are light on the math. There is also a chronology of developments in statistics.
I enjoyed the book. I found some of the discussions of statistical issues unclear, which I attribute to the author's unwillingness to introduce any mathematical notation for fear of scaring off readers. I think a little math at the level of simple algebra, possibly set off from the main text in boxes so as not to disturb the flow of the narrative, would have been helpful in this regard.