An insightful, revealing history of the magical mathematics that transformed our world.
At a summer tea party in Cambridge, England, a guest states that tea poured into milk tastes different from milk poured into tea. Her notion is shouted down by the scientific minds of the group. But one man, Ronald Fisher, proposes to scientifically test the hypothesis. There is no better person to conduct such an experiment, for Fisher is a pioneer in the field of statistics.
The Lady Tasting Tea spotlights not only Fisher's theories but also the revolutionary ideas of dozens of men and women which affect our modern everyday lives. Writing with verve and wit, David Salsburg traces breakthroughs ranging from the rise and fall of Karl Pearson's theories to the methods of quality control that rebuilt postwar Japan's economy, including a pivotal early study on the capacity of a small beer cask at the Guinness brewing factory. Brimming with intriguing tidbits and colorful characters, The Lady Tasting Tea salutes the spirit of those who dared to look at the world in a new way.
First, my old review from reading it back in 2007 before my MS in Statistics. Below that, my newer review from re-reading it in 2013 before my PhD.
~~~2007~~~ I loved the overviews of fascinating philosophical problems surrounding the use of statistics. (For example, hypothesis testing is used pretty much everywhere but even the mathematicians who came up with it had doubts about its validity and usefulness in most situations... What does a 95% confidence interval REALLY mean, in terms of real life, when you come right down to it?)
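One common frequentist reading of the 95% question above is about the procedure, not any single interval: if the experiment were repeated many times, about 95% of the intervals built this way would contain the true value. The sketch below checks that reading by simulation. Everything in it is an illustrative assumption, not taken from the book: the normal data, the known standard deviation, the sample size, and the function names `ci_covers` and `coverage`.

```python
import math
import random
import statistics


def ci_covers(true_mean, sigma, n, rng):
    """Draw one sample of size n and report whether its 95% interval contains true_mean."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    # Known-sigma z-interval: mean +/- 1.96 * sigma / sqrt(n)
    half_width = 1.96 * sigma / math.sqrt(n)
    return mean - half_width <= true_mean <= mean + half_width


def coverage(trials=10_000, true_mean=10.0, sigma=2.0, n=25, seed=0):
    """Fraction of repeated experiments whose interval covers the true mean."""
    rng = random.Random(seed)
    hits = sum(ci_covers(true_mean, sigma, n, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(f"fraction of intervals covering the truth: {coverage():.3f}")
```

The printed fraction should land close to 0.95. Note what the simulation does not say: for any one interval from one dataset, the truth is either inside or not, which is exactly the philosophical discomfort the book describes.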
The author also paints a great picture of how rich and varied the field of statistics is, and what interesting people have contributed to it. As someone who wishes I were a real mathematician, it's a little depressing for me to read about a genius like Kolmogorov who came up with interesting results in every field he touched ever since childhood... But it's also heartening to hear about the many non-"genius" people who, unable to find an existing technique to evaluate their data, ended up creating original, useful statistical tools that became widely used and spawned whole new fields of mathematical research.
I wish the author had used at least a few equations or graphs - it's hard to explain complicated math in natural English - but it's still a great and informative read.
~~~2013~~~ I'm getting a lot more out of it on the 2nd reading, now that I know a lot of the theory he describes in handwavy terms for lay readers.
This book is definitely a history of statistics through anecdotes: the author's stories about the time he heard Savage lecture or was a discussant for Neyman, his colleagues' memories of Fisher and Pearson, etc. It's especially fascinating if you already recognize these names from their papers and are curious about their personalities, what inspired their work and how it fit into their historical context, etc. But you won't learn the technical details here. If you want something from a more formal historian of statistics, Stigler seems to be the guy: The History of Statistics: The Measurement of Uncertainty Before 1900 and Statistics on the Table: The History of Statistical Concepts and Methods. One book I wish someone would write is the history of statistics education and stats textbooks. If Bayes hid his famous theorem, Fisher discouraged unthinking use of alpha=0.05, Neyman disavowed the hypothesis tests he created, etc. ... then how did these things come to be standard methods applied mechanically by scientists everywhere?
* p.4-5: Before Fisher's The Design of Experiments, "experiments were idiosyncratic to each scientist." Although Fisher obviously didn't invent experimentation, he put it on a consistent and rigorous footing. And researchers only published their conclusions with a small supportive subset of the data, not the whole experiment. (Gregor Mendel dropped inconvenient data from his famous pea experiments!) Even today we often see researchers hesitant to share not only their data but their code, despite the reproducible research movement.
* p.7: "any useful experiment has to be one that allows for estimation of those outcomes," i.e. Fisher realized that your parameters will always be estimated imperfectly, but if you design your experiment and choose your estimator carefully, the estimates can be precise enough to be useful.
* p.10: Pearson's The Grammar of Science is apparently still worth reading today. (I also recall that it inspired young Neyman.)
* A brief academic family tree of early statisticians working in the UK: Francis Galton came up with correlation and regression to the mean. He then (taught? managed? influenced?) Karl Pearson, who promoted the idea that distributions are the thing of interest to science, and developed his family of skew distributions. Karl Pearson influenced many in the next generation, including Fisher, Gosset, Neyman, and Pearson's own son Egon. From there on, the field of statistics exploded.
* p.17: Pearson proposed that what's "real" (or of interest to science) is not each measurement we take, but the abstract distribution they come from. But it took Fisher to make clear the distinction between a parameter's true value, and your estimator for that parameter (the formula you feed data into), and your estimate of the parameter (the actual value you got using a particular dataset). He also clarified that even if you collect a ton of data, your estimates of that distribution will still be just estimates---and that Pearson's estimators do not have good properties (the resulting estimates won't tend to be as close to the truth as they could be), but better estimators can be derived.
* p.19: Galton and K. Pearson's Biometrika was originally founded with the goal of measuring distributions of biological measurements about species around the world, showing that those distributions change over time, and thus providing proof of Darwin's theories. It was very expensive to print, containing complicated typesetting for math formulas and being the first journal to include full color photos. Its correspondents sound like Indiana Jones, traversing jungles and deserts and rain forests to measure native tribes and little-known animal species. It sounds like the primarily-mathematical articles (which it is now known for) were used mostly as filler at first.
* p.28-29: I love the story of Gosset, or "Student," working at Guinness and keeping his research a trade secret. Hotelling tried to meet Gosset in the 1930s and "arrangements were made to meet him secretly, with all the aspects of a spy mystery." We also often forget that Gosset did not derive the mathematical t-distribution (that was Fisher) but rather he ran an early and very laborious Monte Carlo experiment, shuffling stacks of cards with numbers on them and recording the averages etc. Gosset was driven by the need to work with small samples. As he told K. Pearson: "If I am the only person that you've come across that works with too small samples, you are very singular." (Paul Velleman gave a talk at JSM 2011 debunking some of the myths around the Gosset story; I hope to find his slides somewhere.)
* p.39: Cramer's book seems to be a key link in the history of stats textbooks: Fisher wrote his Statistical Methods for Research Workers as a practical manual, without proofs. Then Cramer wrote Mathematical Methods of Statistics to fill in gaps and write entire proofs as needed. "Cramer's book was used to teach a generation of new mathematicians and statisticians, and his redaction of Fisher became the standard paradigm." This is not unlike the quote from economist Paul Samuelson: "Let those who will write the nation's laws if I can write its textbooks."
* p.49: Maybe I find the "degrees of freedom" concept confusing because it "was Fisher's discovery and was directly related to his geometric insights and his ability to cast the mathematical problems in terms of multidimensional geometry"---sadly not one of my strongest areas.
* p.51: Fisher gave a 1947 series of talks about science on the BBC. I would love to find recordings but googling does not help, although some transcripts might be in The Listener magazine if I can find a library with access to this in its database.
* p.59: Gumbel's Statistics of Extremes "is a magnificently lucid presentation of a difficult subject, filled with references to the development of the subject. The first chapter ... alone is an excellent introduction to the mathematics of statistical distribution theory. ... Although I first read the book after I had received my Ph.D. in mathematical statistics, I learned a great deal from that first chapter."
* p.66: "Since the statistic is random, it makes no sense to talk about how accurate a single value of it is. ... What is needed is a criterion that depends upon the probability distribution of the statistic" and Fisher was the one who first proposed a few such criteria (Consistency, Unbiasedness, Efficiency).
* p.70-71: "In the late 1960s, I had a programmable desk calculator. ... One afternoon, I programmed the machine, checked the first few steps to make sure I had not made an error in my program, turned off the light in my office, and left for home. Meanwhile, the programmable calculator was adding and subtracting, multiplying and dividing, silently, mumbling away in its electronic innards. Every once in a while it was programmed to print out a result. The printer on the machine was a noisy impact device that made a loud sound like "BRRRAAAK." The nighttime cleaning crew came into the building and one of the men took his broom and wastepaper collector into my office. There in the darkness, he could hear a humming. He could see the blue light of the calculator's one eye waxing and waning as it added and subtracted over and over again. Suddenly, the machine awoke. "BRRAAK," it said, and then, "BRRAAK, BRRAAK, BRRAAK, BRRRRAAAAK!" He told me later that it was a terrifying experience and asked that I leave some sort of sign up the next time, warning that the computer was at work." This delightful story reminds me of a similar anecdote about Robert Groves, former Director of the US Census Bureau.
* p.75: "The reader may recall those terrible moments in high school algebra when the book shifted into word problems. Mr. A and Mr. B were set rowing in still water or against a steady current, or maybe they were mixing water with oil, or bouncing a ball back and forth. Whatever it was, the word problem would propose some numbers and then ask a question, and the poor student had to put those words into a formula and solve for x. The reader may recall going back through the pages of the textbook, desperately seeking a similar problem that was worked out as an example and trying to stuff the new numbers into the formulas that were used in that example. In high school algebra, someone had already worked out the formulas. The teacher knew them or could find them in the teacher's manual for the textbook. Imagine a word problem where nobody knows how to turn it into a formula, where some of the information is redundant and should not be used, where crucial information is often missing, and where there is no similar example worked out earlier in the textbook. This is what happens when one tries to apply statistical models to real-life problems."
* p.84: "The central limit theorem states that this distribution can be approximated by the normal probability distribution regardless of where the initial data came from." Well, not quite! There are very important constraints on the original data that must be met before you can apply a CLT. For example, the mean of iid Cauchy random variables is another Cauchy, not approximately Normal. See some other CLT counterexamples in Bagui, Bhaumik, and Mehra (2013).
* p.95-96: Nice example of how statistics differs from another mathematical approach, chaos theory, which can also be used to describe the world and make predictions(?), but (unlike statistics) has no measure of how well the model fits reality.
* p.98: The word "significant" used to mean "that the computation signified or showed something"---not necessarily something very important! Sadly, a shift in the English language changed the general usage of this word, making it a confusing term for students and users of statistics.
* p.99: Fisher's succinct explanation of significance, from 1929: "An observation is judged significant, if it would rarely have been produced, in the absence of a real cause of the kind we are seeking." And from the same paper: "The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained." In other words, a nonsignificant result doesn't mean there is no effect, just that the effect wasn't measured well enough in this experiment. And a single significant result doesn't mean you've proven the effect exists---you must be able to "design an experiment so that it will rarely fail to give a significant result."
* p.100: Salsburg's summary of Fisher's guidelines: "If the p-value is very small (usually less than .01), he declares that an effect has been shown. If the p-value is large (usually greater than .20), he declares that, if there is an effect, it is so small that no experiment of this size will be able to detect it. If the p-value lies in between, he discusses how the next experiment should be designed to get a better idea of the effect." I love this advice: if it's not significant, then you design a better experiment, not assume that the effect doesn't exist! Sadly this is not the way p-values are used in much of science nowadays.
* p.102: I'd love to read the letters between Neyman and Egon Pearson, but they don't seem to be collected and published as far as I can tell.
* p.108: Again from Fisher: "tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as they are contradicted by the data: but ... they are never capable of establishing them as true"
* p.112: I know of Keynes as an economist, but didn't realize he also studied probability and wrote A Treatise on Probability, which "demolishes [the frequentist definition of probability] as a useful or even meaningful interpretation, showing that it has fundamental inconsistencies that make it impossible to apply the frequentist definition in most cases where probability is invoked."
* p.113-115: Unfortunately, Neyman found frequentism the easiest way to build a mathematically tractable & consistent theory of hypothesis testing, and that's the version that became entrenched in textbooks everywhere, even though "as early as 1935 ... he raised serious doubts" and "Neyman seldom made use of hypothesis tests directly." It seems that hypothesis tests became popularized through Wald's work on decision theory and through Lehmann's textbook Testing Statistical Hypotheses.
* p.115-116: I greatly admire Neyman and am proud to share his first name, but all this about how nice he was is a bit of a hagiography. He was a pretty nice guy but could be a jerk too and had some serious troubles at home, estranged from his wife and distant from his son. His biography Neyman is very good.
* p.118: Nice explanation of how interval estimates help us decide whether the estimate is precise enough (i.e. the resulting policy decisions would be the same whether the truth is near the lower or higher bound) or whether we need more data and better precision (i.e. the right decision would differ based on whether the lower or upper bound is true).
* p.123: "Fisher never got far with his fiducial distributions" and I thought this was an abandoned dead-end after Fisher died, but it turns out people still study fiducial inference, including CMU's own Jessi Cisewski.
* p.142: "Godel once said that the gist of human genius is the longevity of one's youth."
* p.143: I used to wonder why we bother studying measure theory and foundations of probability---it's just proving things that seem obvious, right?---but before Kolmogorov put this all in order, all these "obvious" results were very much ad hoc, instead of being rigorously tied together. Likewise with Lebesgue's work on the foundations of calculus. Although the links seem obvious to us now, it was very different before Lebesgue and Kolmogorov, and I can't imagine what a change it must have been to read their work for the first time (without having already been exposed to the fruits of their labor like we have today).
* p.146-147: Kolmogorov tried tackling the real-life interpretation of probability, in a very different way from his work on measure theoretical foundations, but apparently did not complete this project before his death and sadly nobody has been able to figure out where he was going with it.
* p.148-150: Statistics can be highly political! It seems laughable today to think that Soviet planners would dismiss statistical work because "random variable" translates as "accidental magnitude" and the central planners felt insulted that their work could be considered accidental... But this lack of proper experimentation and evidence-based decisions led to massive starvation and economic weakness. Even apart from such extremes, governments have always tightly controlled the release of national statistics. Soviet statisticians were being threatened during the Cold War, and even today there are reports of Argentina repressing its statisticians for publishing damning inflation estimates.
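The p.84 note above, that averages of iid Cauchy draws stay Cauchy instead of settling toward a Normal, is easy to check with a quick simulation. This is only a minimal sketch: the helper names `cauchy` and `iqr_of_sample_means`, the sample sizes, and the use of the interquartile range as a spread measure are my own illustrative choices, not anything from the book or from Bagui, Bhaumik, and Mehra.

```python
import math
import random
import statistics


def cauchy(rng):
    """One standard Cauchy draw, via the inverse-CDF transform of a uniform."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(draw, n, reps, rng):
    """Interquartile range of `reps` sample means, each computed from `n` draws."""
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        normal_iqr = iqr_of_sample_means(lambda r: r.gauss(0.0, 1.0), n, 500, rng)
        cauchy_iqr = iqr_of_sample_means(cauchy, n, 500, rng)
        print(f"n={n:5d}  normal IQR={normal_iqr:.3f}  Cauchy IQR={cauchy_iqr:.3f}")
```

For the Normal draws the spread of the sample mean should shrink roughly like 1/sqrt(n). For the Cauchy draws it should hover around the same value no matter how large n gets, because the Cauchy has no finite mean or variance and so the usual CLT conditions fail.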
I think this should be required reading for every young statistician. All the other majors seem to have some sort of History of [insert program name here], but I don't remember one from when I was working on my major (in statistics). I felt this book was exactly what it claimed to be--a description of how statistics revolutionized science in the 20th century. Some people seem to think that this book is supposed to describe statistical methods like an introductory textbook. If you want that, you should go read an introductory statistics textbook. It's not like there aren't plenty of those out there. This is a historical/philosophical look at how statistics has influenced science and vice versa.
The book is organized according to topics in statistics including biographical sketches of the people important to the development and application of each of the topics. This can make it a little difficult to keep everything in perspective for how it fits in the timeline. But I don't think there would have been a better way to organize it anyway.
I loved reading this book and found it entertaining, witty, and enlightening.
Statistics ♥️ is one of my favorite branches of mathematics, which is itself my favorite science. *Mathematics is a purely philosophical science. Since everything in the universe is relative and can be seen from more than one point of view, statistics is the most important science, because we cannot put anything in its proper place without estimating the right proportion: the proportion of active ingredient in a medicine, the rates of the sick in wars and epidemics, death rates, engineering measurements and buildings, even decor and the simplest things... The book's title comes from a true story about a lady who explains the difference between tea with milk and milk with tea: "it's all in the details" 😅 The funny thing is that this story was the reason statistics developed in the modern era. The book is about the development of the science and its theories and the circumstances in which they emerged: the scientists' circumstances and political persecution during the world wars, the Soviets' suspicion of the Americans and vice versa, how that affected the treatment of scientists and their living conditions, and the conflicts among the theories and among the scientists themselves... all without complicated ideas, theories, or numbers.
I really wanted to like this book. I love science history books, and while I am not a technical person, I appreciate the "Physics for Poets" level descriptions that are a feature of many science history books. My problem with this book, and ultimately the reason I gave up, is precisely the author's inability to handle the technical details. He says that his wife reminded him not to be too detailed, and ultimately he wasn't detailed enough. He described major changes in statistics, but it was hard to tell what those changes were. And frankly, many of the people described were not that interesting. They were statisticians after all.
The reading is as interesting as that of the book "The Drunkard's Walk"; however, The Lady Tasting Tea is a bit more academic. You learn how and in what circumstances Student's t distribution, Fisher's F, and the nonparametric distributions arose, and you come to know the importance of many mathematical statisticians (men and women) to the development of science over the years, all in a very light and didactic read.
This was so interesting! It was so cool to hear about the actual people behind all of the names my stats training taught me - Pearson, Fisher, Tukey, Box, Cox, and more. It also served to show how young this field of statistics is in some ways, but how classic it is in others.
This book does suffer from the law of misonomy that Salsburg mentions often - “the lady tasting tea” is in about three lines. I’m not sure what the reasoning was there, and it threw me for a bit early on.
The author claims this is for non-technical, non-mathematical people, and he doesn’t include any formulas or proofs for that reason.
That being said, I think statisticians, particularly of my own generation, are the best audience here, as I found the development of my chosen field super interesting.
I love what David Salsburg attempts to do here: explain the basic concepts of statistics by guiding the reader through the history of its development as a discipline. Too often we learn concepts and methods that are popular today without understanding why we use them or how they developed. But however much I appreciate Salsburg's approach, I cannot recommend his book. It is inconsistently paced, lacking in any real explanations of the statistics, and peppered with "when I met [so-and-so famous person]" and "when I invented this statistical term with [so-and-so famous person]" name-dropping.
The first chapters offer a mangled, difficult-to-follow history of the genesis of statistics. Salsburg introduces some basics of statistics, such as regression to the mean and skew distributions, but he wedges them into the narrative as afterthoughts. He literally spends a mere one to two sentences to explain a concept. I understand that this is not a statistics textbook, but a breakthrough new idea has no meaning to me unless I halfway understand what the idea is. Needless to say, there was a dissonance between Salsburg's excitement and my dull incomprehension of what was so exciting.
In later chapters, the book becomes more of a biography per chapter, which was easier for me to take in but not how I would have chosen to organize a book on the development of statistics. My overall impression is that Salsburg made an outline of thoughts he jotted down, rearranged a few of the points, then fleshed out his half-baked outline into a book. The result is a book that isn't explanatory enough for a beginner and isn't detailed enough for an expert.
It's a book about statistics... but it doesn't actually talk about how to do stats. Instead, it's about the evolution of the practice of statistics told by someone who was on the front lines of its evolution. Each chapter is dedicated to a person or development so that we see the field evolve over time. It's really a fantastic meditation on what we can do -- and should do -- with stats and what we can't.
My favorite part relates to the lowly p-value. Where on earth did this thing come from? Salsburg gives us a hint (p.99) -- "The closest [Fisher] came to defining a specific p-value that would be significant in all circumstances occurred in an article printed in the Proceedings of the Society for Psychical Research in 1929." He states in this article: "It is a common practice to judge a result significant, if it is of such a magnitude that it would have been produced by chance not more frequently than once in twenty trials. This is an arbitrary, but convenient, level of significance for the practical investigator, but it does not mean that he allows himself to be deceived once in every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained. He should only claim that a phenomenon is experimentally demonstrable when he knows how to design an experiment so that it will rarely fail to give a significant result."
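Fisher's two ideas in that passage, "once in twenty trials" and an experiment that "will rarely fail to give a significant result," can be seen side by side in a small simulation. The sketch below is only an illustration under assumptions of my own: a two-sided z-test with known standard deviation 1, made-up effect sizes and sample sizes, and the hypothetical function names `p_value_two_sided` and `significant_fraction`. None of it comes from Fisher's article or Salsburg's book.

```python
import math
import random
import statistics


def p_value_two_sided(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: mean = 0, with known sigma."""
    z = statistics.fmean(sample) / (sigma / math.sqrt(len(sample)))
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


def significant_fraction(effect, n, trials=5_000, alpha=0.05, seed=0):
    """Share of simulated experiments whose p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        if p_value_two_sided(sample) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("no real effect, n=10  :", significant_fraction(0.0, 10))
    print("real effect,    n=10  :", significant_fraction(0.5, 10))
    print("real effect,    n=50  :", significant_fraction(0.5, 50))
```

With no real effect, roughly one experiment in twenty should come out "significant" by chance. With a real effect, the small experiment should miss it more often than not, while the larger one should "rarely fail," which is the design goal Fisher describes.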
Statistics, and what a thing statistics is! Everything is statistics: the ads that appear to you on social media, the quantities of materials that go into making the things around us, the research you read, the information scattered in every direction. All of it is statistics, and more and more and more... If you don't realize its importance, you should read this book. Did you know that there is a difference in taste if you put the tea in first and then the milk, rather than the other way around?...
This book is a historical narrative of the science of statistics rather than mathematical content: who discovered a given law, its history, the criticism leveled at it, then the modifications that followed, and so on.
The first three chapters were the best. He started out with some really good stuff that was both biographically interesting and statistically informative. But it seemed like he lost steam. That said, there were still some good chapters and interesting anecdotes, and I generally enjoyed the book. I had to read about two-thirds for a class, and I finished the rest of it after the class was over, so that says something.
5-6 different partially written books combined into a single manuscript that turns out to be a mostly shallow biographical survey of early statisticians. There are some gems in here about the milieu of early statistics, but it doesn't really deliver anything more substantive than an interesting footnote or two.
It was very interesting to read about the people behind the known statistical methods! Also, the author has a nice writing style, it does not feel dry. Sometimes, he even builds up the expectation for the next chapter so I just wanted to know what happened...
A dense book: it cannot be read in one go, and it cannot be read on its own without further explanation of the terms it uses :D The author focuses on ideas and the development of statistics without any mathematical equations or complications, but some chapters cannot be understood without more clarification of the statistical term or theory the chapter revolves around. The translation shows clear effort.
The most popular image of Statistics we have is from Mark Twain’s re-tweet of the quote attributed to Benjamin Disraeli, "There are three kinds of lies: lies, damned lies, and statistics." With the advent of computers and vast amounts of storage, ever more data is available for crunching by scientists. Consequently, we have ever more conclusions based on data, not all of them unbiased. Politicians, environmentalists, businesses and scientists have all been guilty of selectively choosing data to push their agendas under the garb of ‘scientific conclusions based on real statistical data’. However, if we reflect carefully, we see that our well-being itself nowadays is understood only in terms of numbers and indices given to us by the science of Statistics. Without numbers like GDP growth rates, the Consumer Price Index, inflation rates, unemployment rates, currency exchange rates etc., life as we know it today would be a stumble in darkness. So, it is important to understand the role of Statistics in the modern world, what it means to us and how we can productively use it to improve our lives. This book by David Salsburg takes us through the important ideas and developments in Statistics during the past hundred years and more. It shows us the towering figures of this discipline, their contributions, their collaborations with one another as well as their profound disagreements, and how the discipline fundamentally changed the way science itself looked at understanding Nature.
Statistics was one of my subjects at university. I used to understand it as a branch of Mathematics/Science where one collects and analyzes numerical data in large quantities. I understood its purpose to be the extraction of values, called parameters, out of this mass of data so that we can make sense of the reality that this data represents. The preface to this book, by the author himself, showed me how primitive this understanding is. He shows how Statistics moved the philosophical vision of Science away from a deterministic model of the Universe to a probabilistic model. In the nineteenth century, Science viewed the Universe as working with clockwork precision. A small number of Mathematical formulas, like Darwin’s, Newton’s and Boyle’s laws, could be used to describe reality and predict future events. What was needed was a set of such formulas and measurements with precision. But in practice, measurements lacked precision. The more the instruments were refined, the more scientists became aware of greater variations. The differences between what the models predicted and what was observed and measured grew bigger. The picture of the ‘clockwork universe’ lay in shambles. Science started moving towards a new paradigm - the statistical model of reality. Because statistical models of reality are mathematical, we can understand reality through the ideas of randomness, probability and statistics. In the twentieth century, the rise of Quantum Mechanics reinforced this paradigm substantially. I found this view of the evolution of Science in the twentieth century fresh and insightful.
The book is mainly a selective history of statistics. Giants like Ronald Fisher, Karl Pearson, William Gosset, Francis Galton, Jerzy Neyman and W.E. Deming are all extensively covered for their seminal work as well as the struggles they had to wage to get their ideas accepted and, at times, rejected. We see extensive biographical information and, at times, some gossip. The work of many scientists is set in the social context of their times, because their work was carried out in totalitarian and post-colonial societies. For example, in the USSR, during the 1930s, communist orthodoxy was hostile to applied statistics. It affected the work of eminent scientists like Andrei Kolmogorov, who formulated the axioms of probability. Indian statistical giants like P.C. Mahalanobis and C.R. Rao found themselves in more exciting times in a newly independent India in the 1950s, collecting and sorting important demographic data on the Indian population for the benefit of planning by the Nehru administration, which believed in using Science for development. W.E. Deming’s work on quality control was given short shrift in his native USA, but the Japanese embraced it to emerge as the premier high-quality automakers in the 1980s. There is a special chapter in the book covering the contributions of women scientists like Stella Cunliffe, Judith Goldberg and others.
The book details their advancements in various fields, which include more reliable pharmaceuticals, higher quality beer, econometrics, superior quality control in manufacturing, social policy and medical diagnostic tests. There are interesting discussions on whether there is a direct link between recidivism and the length of sentence of a prisoner. The accepted wisdom is that ‘the longer the sentence, the less the recidivism’. The author discusses Stella Cunliffe’s analysis of this question which exploded the myth of this association. The chapter ‘The Man who remade Industry’ has compelling details on the great contributions of W.E. Deming on quality control and how it revolutionized the Japanese automobile industry. However, I shall just touch upon one fundamental insight the author outlines in the chapter, ‘Smoking and Cancer’, which captured my imagination.
The chapter on ‘Smoking and Cancer’ is centered on a philosophical and analytical discussion on what is ‘cause and effect’. Author Salsburg says that Prof. Bertrand Russell effectively showed in the early 1930s that there is no such valid scientific concept as ‘cause and effect’! It is a vague, common notion that does not stand up to pure reason. It contains an inconsistent set of contradictory ideas and is of little or no value in scientific discourse! If it is so, what does it mean for us in society? Did Agent Orange cause those health problems in Vietnam and after? Does smoking cause cancer? The statistics giant, Ronald Fisher, a pipe smoker himself, did not believe smoking caused cancer. He pointed out that studies showed that people who did not inhale the smoke had a higher incidence of lung cancer than those who inhaled. This is inconsistent with the conclusion that smoking causes cancer. Additionally, he mused, suppose that there was something genetic that induced some people to smoke more than others. Suppose this same genetic disposition was also involved in the occurrence of lung cancer. It was well known that many cancers have a familial component. The observed link between smoking and cancer could then be due to that shared genetic disposition rather than to smoking itself. To make his case, Fisher assembled data on identical twins and showed that there was a strong familial tendency for both twins to be either smokers or non-smokers. He challenged others to show that lung cancer was not similarly genetically influenced. Fisher’s objections are motivated by science. Studies of smoking use data from what are called opportunity samples, that is, people who were smoking already. Ideally, one would run a study that randomly assigns half the participants to start smoking two packs or more a day and then observes the outcomes. This would be a randomized controlled experiment, which guards against the bias of self-selection. But this is ethically untenable. Though many studies provide evidence that smoking is harmful, each of them is flawed in some way.
I found this analysis fascinating because we routinely accept so many ‘cause and effect’ claims by environmentalists and other social scientists without much scrutiny. Is the thinning of Arctic sea ice really the cause of polar bears dying of starvation? Did DDT really cause cancer? Did the CFCs from refrigerators really cause the ozone layer to vanish over the Antarctic?
The final chapter, titled ‘The Idol with Feet of Clay’, is a philosophical look at the future. Salsburg says that the progress of science implies that eventually the statistical revolution will also be overthrown in favor of a better one. Science produces a model that fits the available data and uses it to predict the results of new experiments. But no model is fully accurate. So more and more data results in more and more complicated models and their exceptions. At some point, the model no longer serves its purpose and new thinkers emerge to create a revolution. One can see the Einsteinian revolution as one such event. In this sense, Salsburg says that the science of statistics also stands on feet of clay and that the revolution which may overthrow it is perhaps already germinating amongst us.
I found it an enjoyable book to read. I learnt a lot as well.
The full title here is The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. This book by David Salsburg is pretty much what the title suggests: part history of the rise of statistical methods in scientific research and part biography about the people responsible for it. This probably isn't a book for anyone not already versed in inferential statistics and related subjects. It won't, for example, teach you much about statistics, so you'll be pretty lost or at best unimpressed by most of the stories and adulations the book contains. I would have appreciated a bit more exposition and explanation, but for those of us with a background in stats, it keeps things at a sufficiently high level so that we're not forced to pull out our old textbooks just to know what's going on.
And it's pretty interesting stuff. While Salsburg lacks (or at least holds in reserve) the panache and wit necessary to make this a really entertaining read, he does give glimpses into both the absurdity and mundanity of scientific process in this area. I was amused to learn, for example, that many august statistical techniques like analysis of variance were created so that someone could figure out how much artificial cow poop to spread over an acre of farm land. The book also tracks some of the more interesting personalities in the field, relating tales about how William Gosset created a now common and relatively simple procedure known as "Student's t-test" while working for a beer brewery (Guinness, no less) whose strict policies about sharing research forced him to publish under the (perhaps unimaginative) pseudonym "Student." And then there were the cat fights and irrational, career-long grudges that these men and women slung around at each other. Though not quite on the level of, say, Bill Bryson's A Short History of Nearly Everything, this book does a decent job of layering those pedestrian and altogether human eccentricities over the enormity of the scientific accomplishments they created.
So while not exactly light reading and not for the uninitiated, it's a pretty interesting read.
I saw the book as divided into the early chapters where he covers the formative history of modern statistics, focusing on Karl Pearson and Fisher, the middle chapters, in which he gives a series of biographical sketches of important contributors to statistics, and finally the last chapter, in which he discusses the philosophical implications and problems of statistics. I enjoyed the first and last part of the book, but I really wonder whether the short biographical sketches would interest anyone not already familiar with the statisticians involved. The chapter dedicated to Deming was of great interest to me, but that's because I already knew something of him--from a reader's perspective he deserved it. But did the Lady in Black deserve a whole chapter? Overall, the book does help the layperson understand that statistics is in fact a controversial field which skirts some philosophical topics as well. The last chapter in particular will hopefully spur readers into finding out more about the deeper problems and interpretations of probability and statistics, as it has for me. I realize the book is intended for laypersons, but I feel the author could have at least tried to make some of the ideas more concrete. For example, he mentions the forbidding topic of measure theory without the slightest effort at giving some verbal explanation of what it involves. What I had in mind is more explanations like the one he does in fact give of a probability space, using the probability of rain as an example and breaking down the different interpretations of this seemingly simple statement. I wish he had divided the material more clearly by the impact of statistics on science, business, politics and the military.
Very interesting book! The first book on statistics I read in Chinese (translation), and the translation is almost flawless. Totally changed my view on statistics as a whole. Should have read it much earlier. The author gives a very thorough and yet reader-friendly account of the general development of statistics in the 20th century and how its fundamental ideas and philosophy revolutionised nearly every branch of science.
The first half of the book, the more exciting part, centres on a handful of geniuses, including Fisher, Neyman and Kolmogorov, who laid the foundation for the field, as well as their intricate love-hate relationships.
The second half gives an account of the widespread application of statistics in all scientific fields, and here we see the constant struggle between pure theorists and applied mathematicians. Pure mathematics is slightly disparaged for its abstractness, aloofness and detachment from the real world. I hope someone will pick up where the book ends and introduce some of the new ideas that have emerged in the field in recent years. I strongly believe the current big data fuss is just a phase, and the next Fisher or Kolmogorov will definitely bring something more exciting and revolutionary to the scientific world.
The title refers to the story about the English lady who believed she could tell by tasting whether the milk had been added to the tea or the tea added to the milk. We find out here that apparently she could. At least in the small sample of cases recorded, she "identified every single one of the cups correctly." (p. 8)
The question--and this is the question that statisticians are forever trying to answer--is, was the result significant? Or how much faith should we put in such a result? What is the probability that such a result comes to us by chance rather than by causation? Did she simply guess right ten times in a row? Or, more saliently, how many times would she have to guess right before you'd be a believer? Or, more rigorously, how many times out of how many trials would she have to guess right before we can be confident that she isn't just guessing?
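The arithmetic behind these questions is short enough to sketch. The snippet below is a minimal illustration, assuming each call is an independent 50/50 guess; the function names are my own. The second function covers a different setup, the eight-cup design with four cups of each kind that Fisher described in The Design of Experiments, where the taster knows the split in advance. Neither is necessarily what happened at the tea party.

```python
from math import comb


def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Chance of k or more correct calls in n independent guesses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_all_correct_known_split(cups: int, milk_first: int) -> float:
    """Chance of picking exactly the right 'milk first' cups by luck
    when the taster knows how many of the cups were prepared that way."""
    return 1 / comb(cups, milk_first)


print(p_at_least(10, 10))               # ten right out of ten
print(p_at_least(9, 10))                # nine or more right out of ten
print(p_all_correct_known_split(8, 4))  # eight cups, four of each kind
```

Ten straight correct guesses has probability 1/1024, a bit under 0.1%, while a perfect score in the eight-cup design has probability 1/70, about 1.4%. How small that number must be before "we can be confident" is a convention, not something the arithmetic decides.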
Statistics then is a way of understanding and appreciating events without reference to causation. How cigarette smoking causes lung cancer is not exactly known. The fact that cigarette smoking does indeed cause lung cancer is demonstrated by a clear statistical correlation between smoking and the incidence of lung cancer. But is a statistical correlation proof?
Salsburg's very readable book is a narrative about the mathematicians who have tried to answer this and other statistical questions. The emphasis is on the mathematicians themselves, not on their mathematics. Indeed, following a time-honored "rule" in the book publishing business, a rule that insists that you lose "x" number of readers for every mathematical formula that appears on your pages, Salsburg has elected to use a grand total of zero.
I was a little disconcerted about this. To encounter Bayes's theorem or any number of other statistical ideas and see not a single formula or mathematical expression was to me like reading a joke book without any jokes in it. But for those who have heard the jokes and are only interested in the joke tellers and their problems, this is indeed a fascinating book. It is ironic that this "non-mathematical" book is probably best appreciated by those with some experience with statistics. Such readers I suspect will be quite pleased to read about the lives of such greats in statistical theory and methods as Karl Pearson, R. A. Fisher, William Sealy "Student" Gosset, John Tukey, etc. Salsburg focuses on the problems that the individual mathematicians encountered and the solutions they came up with.
Here's an example of how Salsburg does this neat trick of talking about mathematics without using any mathematics. He asks, "What is the central limit theorem?" (p. 84) and answers thusly:
"The averages of large collections of numbers have a statistical distribution. The central limit theorem states that this distribution can be approximated by the normal probability distribution regardless of where the initial data came from. The normal probability distribution is the same as Laplace's error function. It is sometimes called the Gaussian distribution. It has been described loosely in popular works as the bell-shaped curve."
Perhaps this does work for a lot of people, but I think this book would be improved if there were an appendix with a list of ideas, presented in mathematical form. For a new edition, Salsburg might want to do something like that. Then this interesting book would also be a work of reference.
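As a taste of what such an appendix could hold, the central limit theorem quoted above can be checked numerically. This is a rough sketch under my own assumptions: the starting data are drawn from an exponential distribution (strongly skewed, nothing like a bell curve), and the sample sizes and seed are arbitrary choices.

```python
import random
import statistics


def sample_means(n_per_mean=50, n_means=5000, seed=1):
    """Average n_per_mean exponential draws, repeated n_means times."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_per_mean))
        for _ in range(n_means)
    ]


means = sample_means()
center = statistics.fmean(means)
spread = statistics.stdev(means)
# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of averages:   {center:.3f}  (theory: 1.0)")
print(f"spread of averages: {spread:.3f}  (theory: {50 ** -0.5:.3f})")
print(f"share within 1 sd:  {within_one_sd:.3f}  (normal curve: about 0.68)")
```

The averages cluster around the true mean of 1 with a spread close to 1/sqrt(50), and about two thirds of them fall within one standard deviation, which is what the bell-shaped curve predicts even though the raw data are far from bell-shaped.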
My favorite method learned here is on page 236. Salsburg describes how John Tukey believes one should tally. Instead of making vertical lines and crossing every fifth one (which is what I have done for decades) Tukey recommends "a ten-mark tally. You first mark four dots to make the corners of a box. You then connect the dots with four lines, completing the box. Finally, you make two diagonal marks forming a cross within the box."
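For anyone who wants to try it, here is a small sketch of how a running count breaks down into Tukey's marks. The function name and the dictionary layout are my own invention; only the order of the marks (four dots, four sides, two diagonals) comes from the description quoted above.

```python
def tukey_tally(count: int) -> dict:
    """Split a count into completed ten-mark boxes plus the marks
    already drawn in the box currently in progress."""
    if count < 0:
        raise ValueError("count must be non-negative")
    full_boxes, rest = divmod(count, 10)
    return {
        "full_boxes": full_boxes,
        "dots": min(rest, 4),                   # marks 1-4: corner dots
        "sides": min(max(rest - 4, 0), 4),      # marks 5-8: box sides
        "diagonals": max(rest - 8, 0),          # marks 9-10: the cross
    }


print(tukey_tally(27))
```

A count of 27 is two finished boxes plus a third box showing all four dots and three of its sides, so the total can be read off at a glance in tens.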
That statistical ideas are inexorably tied up with the ideas of probability is explored in the final chapter of the book, "The Idol with Feet of Clay." Salsburg observes, along with Thomas Kuhn, that we are forever describing reality with "a model...that appears to fit the data," but as the data accumulates our model "begins to require modifications." (p. 293) Reality in this sense is the postulated "universe" of the statistician, and our experiences and "laws" the result of "samplings" of that universe. Salsburg, citing L. Jonathan Cohen, goes on to recall Henry Kyburg's "lottery paradox" which makes it clear that statistical/probabilistic "proofs" run into logical problems. He then asks if we really understand probability. He recalls the notion of "personal probability" (something I used to call "psychological probability") in which we appreciate the probability of something happening in terms of what effect it might have on us personally. Thus a small chance of getting something exceedingly important to us (such as winning the lottery) might be worth paying more for the ticket than it is objectively worth. Salsburg concludes that we really do not understand probability except in the grossest sense (e.g., "50/50" or "almost certain"). Then he asks, does it matter? His answer suggests quantum mechanics in which we work with probabilities without any pretense of grasping underlying "laws."
Salsburg ends the book with a yearning for a new paradigm without feet of clay. I suspect he has in mind the undeniable and always troubling fact that the best that can ever be said about a sampling is that it has a certain probability of being an accurate reflection of the entire universe. However, my guess is that we will continue to have to be satisfied with "only" probabilistic knowledge; indeed that knowledge itself will always be subject to some degree of doubt. I might even conjecture that all real world knowledge, yearn as we might for certainty, is probabilistic.
--Dennis Littrell, author of “The World Is Not as We Think It Is”
Okay, you have to have an unusual interest in statistics to enjoy this book, and it wouldn't hurt to have taken a course or two in the subject. I learned that nearly all of statistical analysis was developed in the 20th century, with much of the work done by math genius R.A. Fisher in England. Fisher published an obscure handbook of formulas in the thirties, but figuring out just what Fisher had done and making it general knowledge continued into the 60s. The great bulk of modern science depends upon statistical analysis, and I think that late development of the science of statistics has led to many problems of misapplication. If it had all been developed 150 years ago, I think much greater emphasis would now be placed on that aspect of scientific education. The story of the development, including resistance to the concepts, tells a tale of how modern science works and occasionally misfires.
I thoroughly enjoyed "The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century" by David Salsburg. It was very entertaining and educational at the same time. The book recounts the relatively short history of statistics, highlighting many of the influential and colorful figures. I enjoyed learning about how key discoveries, such as Student's t-test, were made and under what circumstances. This book made me realize how much I take for granted the modern data-driven mindset and how relatively new a mindset it is.
This book is good for getting a broad overview of 20th century statistics. I think I learn best when I know the people behind the ideas, so this book is a very good intro to the people of statistics. It makes statistics a lot more interesting than just reading equations. Also, it gives a better idea of the problems that statistics aims to solve. Stigler's book, Statistics on the Table, is a book I read after this one. It's more detailed and has a different writing style... I like both.
I really enjoyed this book. Even though I knew nothing about stats going in, I was able to understand (albeit at a very basic level) the concepts introduced. Throughout I started having little epiphanies about how statistics influences my (and most people's) everyday lives.
I also feel a little more prepared for my stats class this semester in that I have a real sense of why statistics is important, and how it makes data meaningful for decision-making.