Statistics has played a leading role in our scientific understanding of the world for centuries, yet we are all familiar with the way statistical claims can be sensationalised, particularly in the media. In the age of big data, as data science becomes established as a discipline, a basic grasp of statistical literacy is more important than ever.

In *The Art of Statistics*, David Spiegelhalter guides the reader through the essential principles we need in order to derive knowledge from data. Drawing on real-world problems to introduce conceptual issues, he shows us how statistics can help us determine the luckiest passenger on the Titanic, whether serial killer Harold Shipman could have been caught earlier, and whether screening for ovarian cancer is beneficial.

How many trees are there on the planet? Do busier hospitals have higher survival rates? Why do old men have big ears? Spiegelhalter reveals the answers to these and many other questions - questions that can only be addressed using statistical science.

426 pages, Hardcover

First published March 28, 2019

Sir David Spiegelhalter has been Winton Professor of the Public Understanding of Risk at the University of Cambridge since October 2007. His background is in medical statistics, with an emphasis on Bayesian methods: his MRC team developed the BUGS software which has become the primary platform for applying modern Bayesian analysis using simulation technology. He has worked on clinical trials and drug safety and consulted and taught in a number of pharmaceutical companies, and also collaborates on developing methods for health technology assessment applicable to organisations such as NICE. His interest in performance monitoring led to his being asked to lead the statistical team in the Bristol Royal Infirmary Inquiry, and he also gave evidence to the Shipman Inquiry.

October 7, 2019

When I am not writing witty and informative reviews on Goodreads/Amazon, my day job is as a government statistician. So when offered the opportunity to read this book, I thought it would be useful to do so. And I do believe it is helping me in my work: I am thinking more about how best to present my statistics and what analytical techniques I could use. So this book works from that perspective.

This book takes real-world questions and shows you how they've been answered, introducing various statistical techniques as it does so. It does this whilst aiming to avoid "getting embroiled in technical details". The questions picked are quite interesting, like "why do old men have big ears?", "how many trees are there on this planet?" (an estimated 3.04 trillion, if you must know) or what height a son or daughter will be given their parents' heights, and so on, with some of the questions being based on work the author has been involved in during his career. Relating the problems to real life helps the text appeal not only to statisticians (to whom this book is dedicated) but also to non-technical readers "who want to be more informed about the statistics they encounter both in their work and in everyday life."

Some of this is not new stuff, e.g. the early material on presentation of data, such as 3D pie charts not being useful for comparing proportions. But the book does get more involved as you work through it, going deeper into statistical techniques and so becoming harder to understand and requiring more concentration. The author is aware of this, at one point asking if it is "all clear? If it isn't then please be reassured that you have joined generations of baffled students". The conclusion even congratulates you for getting to the end.

Useful stuff in here for me was the chapter on regression (which is what I use more commonly than much of the rest), and the last couple of chapters after the hard stuff were good reading too, showing bad examples and good examples of statistics from journals and the like and explaining why (offering learning points).

Technical stuff is relegated to the technical glossary so this book is readable (which is good for a book about statistics), although still hard in places. For my work it has been useful and I'm glad I read it and have it for future reference.

June 10, 2019

Pretty good, but there are a few chapters where the author basically goes "I'm not explaining this very well, but I know you won't get it so let's just move on". I also wish there were a few more "digital" / web analytics cases, but that's just because it would help me.

Overall, an interesting and useful read.

August 31, 2022

Long ago I worked for a consulting firm and was assigned to look at the help desk tickets of a large government agency. They wanted to see if computer-related productivity losses could be reduced through better training, different procedures, or automated solutions to common problems.

I analyzed three years’ worth of data and saw a clear pattern: the distribution was heavily skewed, with a long tail to the right. In other words, some tickets closed within minutes (like password resets or “is the printer even turned on?”), but many took longer, especially if an admin needed to get involved or a tech had to visit the user, and a few stayed open for weeks or months because there was no solution or the solution was more expensive than the productivity loss it caused.

With this information in hand I got on the client’s calendar. My recommendation was to divide the data into quintiles and focus on the middle three, because the first one didn’t need any help and the last one required custom, one-off solutions. He didn’t know what a quintile was and had no intention of learning. He said my job was simple: just take the average ticket closure time and move it to the left. “The average,” I said, “mean, median, or mode?” He thought I was being a smartass.

I went back to my manager and raised the alarm. No matter how much we improved the middle three quintiles, the overall average closure time was not going to be affected much if we also had to include tickets open for weeks or months. He replied with the Mantra of Mediocre Managers: “just do the best you can.”

We did good work on that contract, lowering the closure times of the second, third, and fourth quintiles by 8-14% and saving the client thousands of productive man-hours per year. The overall average across the entire data set, however, with the fifth quintile included, was reduced by only three minutes. This was cited as a factor when our contract was not renewed. As Kurt Vonnegut would say, “And so it goes….”

*The Art of Statistics* uses real world examples to help the reader understand how to make sense of raw data. The first thing to understand is that, if the devil is in the details, statistics has a lot of devils mucking up the work. Just defining the problem can turn out to be fiendishly difficult. The book includes a case study on mortality rates among children who underwent heart surgery at various hospitals in Britain, and simply deciding which cases to count required difficult, and subjective, decisions. What is the upper age limit to define a child? Which of the many procedures get included when deciding what to count as heart surgery? If the child dies, how do you decide if the surgery was the primary cause, a contributing factor, or a coincidental event, and how could you ever convince the parents that it was not the surgery that killed their child?

In Gina Kolata’s *Flu*, about the 1918 influenza pandemic, she cites a discussion that occurred during the Swine Flu scare of 1976:

Dr. Hans H. Neumann, who was director of preventive medicine at the New Haven Department of Health, explained the problem in a letter to the New York Times. He wrote that if Americans have flu shots in the numbers predicted, as many as 2,300 will have strokes and 7,000 will have heart attacks within two days of being immunized. “Why? Because that is the number statistically expected, flu shots or no flu shots,” he wrote. “Yet can one expect a person who received a flu shot at noon and who that same night had a stroke not to associate somehow the two in his mind? *Post hoc, ergo propter hoc*,” he added. (p. 161)

Another difficulty occurs when asking people their opinions, because how the question is asked can steer responses one way or another. We live in an age of fierce partisan politics, so it is not uncommon to see polls that deliberately attempt to skew answers, but it can also happen completely by accident if the questions are not given proper consideration. This book cites an example of this kind of framing: when asked if people would support or oppose giving 16-17 year olds the vote, the majority approved, but when the question was asked in the form of whether the voting age should be reduced, most disapproved.

There is also an excellent discussion on how we can be led astray by assumptions of accuracy. Take for instance a 95% accurate drug test given to 1,000 athletes, 20 of whom are doping and the other 980 not. All but one of those actually doping will be detected (95% = 19 of 20), but 49 who are not doping will also be flagged (5% of 980 = 49). There will be a total of 68 positive tests (19 + 49), of whom only 19 are actually doping. Therefore, when someone tests positive there is only a 19/68 (28%) chance that they are guilty – the rest are false positives.
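The arithmetic in that doping example is easy to check for yourself. A minimal sketch (the function name and parameters are mine, not from the book; "95% accurate" is read as 95% sensitivity and 95% specificity):

```python
# Sketch of the doping-test arithmetic: how many positive tests are
# true positives? The counts below are the review's own example.
def positive_predictive_value(n_total, n_doping, sensitivity, specificity):
    """Fraction of positive tests that are true positives."""
    true_positives = n_doping * sensitivity
    false_positives = (n_total - n_doping) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

ppv = positive_predictive_value(n_total=1000, n_doping=20,
                                sensitivity=0.95, specificity=0.95)
print(f"{ppv:.0%}")  # 19 true positives out of 68 flagged -> 28%
```

The same calculation underlies the book's screening examples: when a condition is rare, even an accurate test yields mostly false positives.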

The author was part of the team which examined the case of Dr. Harold Shipman, who was found guilty of the deaths of fifteen of his patients in 2000, but may have killed between 215 and 260 by injecting them with lethal drugs and then altering the records to make their deaths appear to have been from natural causes. The team’s task was to see if there could have been a way to detect what Shipman was doing before he killed so many people. As it happened, there was indeed statistical evidence that might have convicted him fifteen years earlier, saving perhaps 175 deaths, but this required an exhaustive review of the circumstances around the deaths of Shipman’s patients, including a comparison with the outcomes of thousands of cases from other doctors, and even looking at the time of day most deaths occurred. The end result of this investigation was the creation of a data collection system on patient mortality that makes it easier to identify statistical anomalies, but even these must be examined with care, since doctors who work primarily with elderly patients will have higher death rates, and social factors like patient income and education can affect outcomes.

The book uses case studies like these to examine statistical reasoning and how it can be useful to non-statisticians when examining data. There is a look at the decision trees involved in deciding who the “luckiest” survivor from the Titanic was, and a truly disturbing look at how Bayesian analysis could have prevented many unnecessary cancer surgeries that resulted from physicians not understanding how to differentiate between true and false positives and negatives. There is also a good discussion on regression analysis, a powerful tool which can easily be misused to project false or misleading trends.

I enjoyed this book, and learned some useful things from it. It is written in a clear, non-technical style that anyone can follow, and the case studies were well chosen to be illuminating and informative. This is a good place to start for anyone who sometimes needs to extract meaning from numbers.

November 14, 2022

Statistics and probability are so counter-intuitive that reading about them is like watching *MythBusters*, only with more maths.

Admittedly, there isn't much maths in this book, but there are plenty of "real-world" examples of how to understand and interpret statistical data – the kind of thing "British scientists have found".

I can't say I understood everything deeply and thoroughly, but I'll share a few of the things that stuck with me most – and if I've got something wrong (or if you have fun examples of your own), do tell!

Bootstrapping

Say we have (what we believe to be) representative data – for instance, how many sexual partners people have had – from a certain number of respondents (e.g. 50). Five had 1, three had 2, one had 30, and so on. We then write those values, with the same frequencies, on "balls" (well, not actual balls – they have software for this), so there are five balls with the number 1, three with the number 2, and so on. THEN we draw a ball from the mix, note down the number on it, and put it back. Then we draw again from the same bag, 50 times in total. And we repeat this whole procedure many times (e.g. 1,000), each time getting a new set of fifty. Finally we can aggregate those results and look at the curve they form, i.e. which value they concentrate around.
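The procedure described above can be sketched in a few lines of Python (the toy data here are invented for illustration; the book's actual survey data are not reproduced):

```python
import random
import statistics

# Bootstrap sketch: resample the 50 responses with replacement, 1,000
# times, and look at where the resampled means concentrate.
random.seed(42)
sample = [1] * 5 + [2] * 3 + [30] + [random.randint(0, 10) for _ in range(41)]

boot_means = [
    statistics.mean(random.choices(sample, k=len(sample)))  # draw, then replace
    for _ in range(1000)
]

# The bootstrap distribution centres near the original sample mean,
# and its spread estimates the uncertainty of that mean.
print(statistics.mean(sample), statistics.mean(boot_means))
```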

Conditional probability

What does a test with 90% reliability mean? It means it will correctly identify 90% of cases. It does NOT mean that if you get a positive test result (say, for a disease), there is a 90% probability that you have it.

The author gives the example of breast cancer, which a mammogram detects with 90% accuracy. If 10 out of 1,000 women have it, 9 of them will be identified correctly and one will get a false negative. BUT 990 women do NOT have it, and of those, 891 will be correctly identified as healthy while 99 (10%) will be incorrectly flagged as having it. So 108 women will get a positive test, but only 9 of them actually have the disease, i.e. 8%.

This one completely blew my mind.

Null hypothesis

I once read a critique in *Nature* of certain social-science studies for not testing the null hypothesis and instead trying straight away to find a relationship between two parameters. "In the social world there will always be some relationship between two parameters," the author said. "You first have to assume the relationship is random, and only then announce your results."

As I understood it, testing the null hypothesis roughly means calculating in how many combinations the data could arrange themselves, and then looking IN THAT CONTEXT at how surprising our observed data are. For example, in a group of 12 men and 8 women we count the left-handers and find that three of the women are left-handed but only two of the men. Without testing the null hypothesis, we might conclude that left-handedness is proportionally more common among women. BUT! Before drawing that conclusion we could take slips of paper (well, not slips – as I mentioned, they have software), write "left" on five of them and "right" on fifteen, hand them out to the participants in some combination, and record how many men and women got each slip. Once we have dealt the slips in every possible combination and recorded the results, we get a curve showing how many "left" slips would fall to men, and how many to women, purely by chance.* NOW we can measure our original result against that context and see how surprising it is.

* For example, when rolling two dice there are more ways to get a sum of 7 (1+6, 2+5, 3+4) than a sum of 12 (only 6+6) – so 7 would end up in the middle of the curve and 12 at the edge. Or something like that.
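The slips-of-paper experiment is what statisticians call a permutation test. A rough simulation, using the review's group sizes (everything else is my own sketch) – the fraction it prints is exactly the kind of p-value the book discusses:

```python
import random

# Permutation test for the example above: 12 men and 8 women, 5
# left-handers in total, of whom 3 are women. How often would chance
# alone deal 3 or more of the 5 "left" slips to the 8 women?
random.seed(0)
labels = ["woman"] * 8 + ["man"] * 12
observed = 3

def shuffled_count():
    """Deal the 5 'left' slips to a random subset of the 20 people."""
    lefties = random.sample(labels, 5)  # 5 people drawn without replacement
    return lefties.count("woman")

trials = 10_000
p_value = sum(shuffled_count() >= observed for _ in range(trials)) / trials
print(round(p_value, 2))  # roughly 0.3: three left-handed women is not surprising
```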

P-values

That is exactly what this parameter measures – assuming the null hypothesis is true, HOW SURPRISING are the data we obtained? A p-value of 0.05 means that, if the null hypothesis were true, data this extreme would come up by chance only 5% of the time. 5% isn't a lot, but it isn't a little either, when you think about it – especially if you're out to win scientific laurels (if such things exist).

What p-values DON'T measure, the author says, is whether the null hypothesis is true. Here we can come back to conditional probability – a test being 90% accurate does not mean its result is 90% likely to be true. This part seemed very tricky to me, but somehow fundamental.

Relative increase

Roughly, this means that before announcing a result you should look at the data and ask: is it actually meaningful? The author cites a study which announced that men with higher education have a 19% higher chance of developing a brain tumour than men without. YIKES! But is it really so bad? Look at the study (conducted over 18 years with 2 million participants): for every 3,000 men without higher education there will be 5 brain tumours, and for every 3,000 with it, 6. Technically correct – a 19% higher risk – but 5 in 3,000 versus 6 in 3,000 doesn't look all that terrifying, or even particularly meaningful.
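The distinction is easy to compute. A tiny sketch using the reviewer's rounded counts (the study's own adjusted figure was 19%; these whole numbers give 20%):

```python
# Relative vs absolute risk, using the rounded counts above:
# 5 tumours per 3,000 men without higher education, 6 per 3,000 with it.
baseline = 5 / 3000
exposed = 6 / 3000

relative_increase = exposed / baseline - 1   # the headline number
absolute_increase = exposed - baseline       # the risk you actually face

print(f"relative: +{relative_increase:.0%}")   # +20%
print(f"absolute: +{absolute_increase:.3%}")   # +0.033%
```

A 20% relative increase sounds alarming; one extra case per 3,000 men does not, which is why the book recommends always reporting absolute risks alongside relative ones.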

The author also talked very enthusiastically about Bayesian statistics, but I definitely can't retell how it works – I did, however, like the example with the billiard balls.

So there you have it – a nice book. In closing I can say that the forest of statistics is still just as dark to me, but having read this I feel fundamentally a bit smarter.

August 17, 2019

This amazing piece can, in some ways, be seen as the equivalent of Angrist & Pischke's "Mastering Metrics" for bread-and-butter statistical problems rather than intuitive econometrics. It covers everything one has to know when it comes to scientific studies that rely on data. All aspects and elements are touched upon, but maths and formulas are relegated to an appendix. Thus the book is well suited for experts with years of experience and college students of all fields, but especially for science writers or people who want to be well equipped when discussing or questioning the newest "study x found that y prevents cancer" headline.

Explains concepts with easy-to-grasp real-world examples, appealing to the reader's intuition. Touches upon all topics, from basic proportions, regression and classification/"big data" up to Bayesian approaches, covering common misconceptions and fallacies on the fly ("how to lie with stats") along the way. Everything in a very coherent and readable way. A truly joyful read!

Could be assigned as a companion text for an undergrad stats course across all disciplines, to show students, sometimes drowning in pure formula memorisation, the beauty of stats and numbers and data. Also suited for AP stats students, and for skilled professionals as revision.

A big plus is the companion code for the open-source software R, which together with Python is going to be the future of (statistical) programming.

The last part of the book explains the so-called "statistical crisis in science" (or "replication crisis"), how it came about and, most importantly, what to do about it. Communication chains are analysed to understand how exaggerated newspaper headlines are created. Crucially, the author provides checklists so readers can judge for themselves whether, or how much, a certain study or headline should be trusted.

September 8, 2019

I didn't like the first 60% of the book. It was too dumbed down, even for me, and there wasn't enough original storytelling to explain concepts to non-maths students. I even gave this feedback to the author. The last third of the book was much better, getting into p-hacking, data quality, and data ethics.

September 28, 2019

Very nice overall: not much algebra, with the focus instead on the reasoning behind it, and interesting examples. Good for non-scientists.

June 1, 2020

Q:

A classic example of how alternative framing can change the emotional impact of a number is an advertisement that appeared on the London Underground in 2011, proclaiming that ‘99% of young Londoners do not commit serious youth violence’. These ads were presumably intended to reassure passengers about their city, but we could reverse its emotional impact with two simple changes. First, the statement means that 1% of young Londoners do commit serious violence. Second, since the population of London is around 9 million, there are around 1 million people aged between 15 and 25, and if we consider these as ‘young’, this means there are 1% of 1 million or a total of 10,000 seriously violent young people in the city. This does not sound at all reassuring, (c)
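The arithmetic in the quote is simple enough to check in a couple of lines (the population and percentage figures are the rough estimates from the passage, not exact statistics):

```python
# Reframing the Underground ad: same statistic, two emotional framings.
young_londoners = 1_000_000      # rough estimate of Londoners aged 15-25
violent_share = 0.01             # the ad's "99% do not" implies 1% do

violent_count = int(young_londoners * violent_share)
print(f"Reassuring framing: {100 * (1 - violent_share):.0f}% are not violent")
print(f"Alarming framing:   {violent_count:,} seriously violent young people")
# -> Reassuring framing: 99% are not violent
# -> Alarming framing:   10,000 seriously violent young people
```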

Q:

But these are generally reported as the ‘average house price’, which is a highly ambiguous term. Is this the average-house price (that is, the median)? Or the average house-price (that is, the mean)? A hyphen can make a big difference. (c)
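The hyphen distinction becomes concrete with a toy example (these prices are invented for illustration; the point is that skewed data pulls the mean far from the median):

```python
# Mean vs median: with skewed data such as house prices, the two "averages" diverge.
prices = [150_000, 180_000, 200_000, 230_000, 2_000_000]  # one mansion skews the mean

mean_price = sum(prices) / len(prices)
sorted_prices = sorted(prices)
median_price = sorted_prices[len(sorted_prices) // 2]  # odd-length list: middle element

print(f"mean:   {mean_price:,.0f}")    # 552,000 - pulled up by the outlier
print(f"median: {median_price:,.0f}")  # 200,000 - the 'typical' house
```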

May 7, 2022

Do statins reduce heart attacks and strokes?

Do speed cameras reduce accidents?

Is prayer effective?

Why do old men have big ears?

Are more boys born than girls?

Does the Higgs boson exist?

Was Richard III buried in a Leicester parking lot?

*The Art of Statistics* is a nicely packaged introductory course in statistical reasoning, in which a Cambridge professor and president of the Royal Statistical Society tries to teach some subtle and important theories without making the reader do too much math.

So this is a book about statistics for the layman, and you can hear the author in every chapter pleading for people (politicians, journalists, scientists, and the general public) to be more informed because this shit matters. But as much as the author hand-holds the reader through his examples, you are going to have to look at some numbers, and even do a little math. But if you care enough to read this book, you should know enough math to get through it.

The first few chapters talk about elementary concepts, and why statistics matter. He starts each chapter with some intriguing, sometimes silly examples of questions you can answer with statistical reasoning.

One of his introductory examples is Harold Shipman, Britain's most prolific serial killer. He was a family doctor who between 1975 and 1998 murdered hundreds of elderly patients before he was caught. Afterwards, investigators wanted to find out if he could have been detected earlier had anyone been paying attention to the death rate among his patients.

Answer: yes, and in fact he probably could have been caught in the first few years of his career, if the sort of forensic analysis of patient deaths that's done *now* had been performed then. But just looking at a chart showing that Dr. Shipman's patients died at a higher rate than other GPs' is obviously not enough - there are all kinds of confounders and other factors that need to be measured to express a degree of certainty that he's losing patients at a frequency that should really be considered alarming, and Spiegelhalter walks us through the numbers and the data visualizations to show us how it's done.

From there, he goes into many other measurements, from coin flips to number of sexual partners to predicting a child's height based on the heights of their parents. Very obvious ideas like "correlation is not causation" are covered in depth, of course, with some examples that aren't obvious at first glance. Regression models, probability theory, classification trees, bootstrapping, confidence intervals, p-values, Bayes' Theorem, the Law of Large Numbers, the Central Limit Theorem — does that sound a little scary? Strap in and read up; if Spiegelhalter had his way this would be basic education at least for anyone who's graduated college, and the world would be a better place and journalists might not write stories with alarming headlines like "Threefold Variation in UK Bowel Cancer Death Rates" or "Going to university makes you more likely to die of a brain tumor." Also politicians might make decisions with some basic numeracy. Well, we can dream, right?

Two of my favorites:

**The Prosecutor's Fallacy**

The probability of innocence given the evidence is not the same as the probability of the evidence given innocence. I.e., "If the accused is innocent, there is only a 1 in a billion chance that their DNA would match the evidence at the crime scene" is wrongly interpreted as "Given the DNA evidence, there is only a 1 in a billion chance that the accused is innocent." Spiegelhalter likens this to "If you're the Pope, you're Catholic" being interpreted as meaning the same thing as "If you're Catholic, you're the Pope."
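A quick Bayes-style count shows why the two conditional probabilities differ so wildly (the population size and match rate below are invented round numbers, not figures from the book):

```python
# Prosecutor's fallacy: P(evidence | innocent) != P(innocent | evidence).
population = 10_000_000          # plausible suspects (invented)
p_match_if_innocent = 1e-9       # "1 in a billion" chance of a coincidental match

guilty = 1                                        # the one true source of the DNA
innocent_matches = (population - guilty) * p_match_if_innocent
p_innocent_given_match = innocent_matches / (innocent_matches + guilty)

print(f"P(innocent | match) = {p_innocent_given_match:.3%}")  # roughly 1%, not 1 in a billion
```

Even with one-in-a-billion evidence, a large enough pool of potential suspects yields an expected handful of coincidental matches, so the chance that a matching person is innocent is many orders of magnitude higher than the match rate itself.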

**Simpson's Paradox**

The direction of association between two variables can reverse when adjusted for a confounding factor. For example, admission rates that show women being admitted at a lower rate than men — obvious sexism! — turn out to mean the opposite when factoring in the actual programs men and women applied for: more women apply to selective programs with a higher overall rate of rejection, but after adjusting for the admission rate of each program, women are overall *more* likely to be accepted than men. This plays out in many other scenarios.
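The reversal is easy to reproduce with made-up numbers (the figures below are invented for illustration, inspired by the famous Berkeley admissions case; they are not data from the book):

```python
# Simpson's paradox: women do better in EACH program, yet worse overall,
# because more women apply to the selective program.
#                             (admitted, applied)
data = {
    "men":   {"easy program": (80, 100), "selective program": (10, 50)},
    "women": {"easy program": (45, 50),  "selective program": (25, 100)},
}

for sex, programs in data.items():
    for program, (adm, app) in programs.items():
        print(f"{sex:5} {program:17}: {adm / app:.0%}")
    total_adm = sum(a for a, _ in programs.values())
    total_app = sum(n for _, n in programs.values())
    print(f"{sex:5} overall          : {total_adm / total_app:.0%}")
```

Men: 80% and 20% per program, 60% overall. Women: 90% and 25% per program, yet only 47% overall.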

There's some discussion of communicating data, and data visualization, and of course there's every data science student's favorite problem, predicting which Titanic passengers should survive and which ones shouldn't.

Bayes' Theorem (and the dispute between the rival schools of statistical inference, frequentist and Bayesian) gets its own chapter. If you think statistics is just hard math with provable right and wrong answers, well, it's more complicated.

Finally, Spiegelhalter talks about the so-called "replication crisis" (in which a large number of scientific papers have been found to have results that cannot be reproduced, leading many to suspect incompetence, fraud, and/or lazy research across many fields), and from there, a discussion of how bias affects statistics, and some proposed principles for ethical data science.

I have done a fair amount of machine learning and data science, so very few ideas in this book were new to me. But I found it very readable, with just enough math to require you to be comfortable with numbers, but not so much that I was straining my brain to remember how to calculate derivatives and integrals. And really, the world would be a better place if everyone knew this much, especially around election time.

January 3, 2020

I really wanted to like this book. But at times it felt like it was trying to cover too much ground, a lot of it not deeply enough. Often, more technical detail would have aided proper understanding of the subject.

It was also quite surprising to see supervised learning being defined as classification, which seems incorrect and also doesn’t explain what supervised learning actually is.

February 29, 2020

When reported accurately, statistical research can enrich storytelling and inform the public about important issues. Unfortunately, there are a great many distorting filters that research has to pass through before it reaches the public, including scientific journals and the media. As statistical data creeps into our lives more and more, there is a growing need for us all to improve our data literacy so we can appropriately assess the findings.

Don’t take statistics at face value.

View statistical information the way you might view your friends: they’re the source of some great stories, but they’re not always the most accurate. Statistical information should be treated with the same skepticism you apply to other kinds of claims, facts and quotes. And, where possible, you should examine the sources of statistics behind the headlines so you can assess how accurately the information has been reported.

----

What’s in it for me? Improve your data literacy and learn to see the agenda behind the numbers.

You might think that with the growing availability of data and user-friendly statistical software that does the mathematical heavy-lifting for you, there’s less need to be trained in statistical methods.

But the ease with which data can now be accessed and analyzed has led to a rise in the use of statistical figures and graphics as a means of furnishing supposedly objective evidence for claims. Today, it’s not just scientists who make use of statistics as evidence, but also political campaigns, advertisements, and the media. As statistics are separated from their scientific basis, their role is changing to persuade rather than to inform.

And the people generating such statistical claims are not necessarily trained in statistical methods. An increasingly diverse range of sources produce and distribute statistics with very little oversight to ensure their reliability. Even when data is produced by scientists undertaking research, errors and distortions of statistical claims can occur at any point in the cycle – from flaws in the research to misrepresentations by the media and the public.

So, in today’s world, data literacy has become invaluable in order to accurately evaluate the credibility of the myriad news stories, social media posts, and arguments that use statistics as evidence. These blinks will give you all the tools you need to better assess the statistics you encounter on a daily basis.

In these blinks, you'll learn:

how statistics can be used to catch serial killers;

whether drinking alcohol is good for your health or not; and

which remarkable creature can respond to human emotions even after it has died.

----

Statistics can help us answer questions about the world.

Have you ever wondered what statisticians actually do?

To many, statistics is an esoteric branch of mathematics, only slightly more interesting than the others because it makes use of pictures.

But today, the mathematical side of statistics is considered only one component of the discipline. Statistics deals with the entire lifecycle of data, whose five stages can be summarized by the acronym PPDAC: Problem, Plan, Data, Analysis, Conclusion.

Let’s illustrate how this process works by considering a real-life case that the author was once involved in: the case of the serial killer Harold Shipman.

With 215 definite victims and 45 probable ones, Harold Shipman was the United Kingdom’s most prolific serial killer. Before his arrest in 1998, he used his position of authority as a doctor to murder many of his elderly patients. His modus operandi was to inject his patients with a lethal dose of morphine and then alter their medical records to make their deaths look natural.

The author was on the task force set up by a public inquiry to determine whether Shipman’s murders could have been detected earlier. This constitutes the first stage of the investigative cycle – the problem.

The next stage – the plan – was to collect information regarding the deaths of Shipman’s patients and compare this with information regarding other patient deaths in the area to see if there were any suspicious incongruities in the data.

The third stage of the cycle – data – involves the actual process of collecting data. In this case, that meant examining hundreds of physical death certificates from 1977 onwards.

In the fourth stage, the data was analyzed, entered into software, and compared using graphs. The analysis brought to light two things: First, Shipman’s practice recorded a much higher number of deaths than average for his area. Second, whereas patient deaths for other general practices were dispersed throughout the day, Shipman’s victims tended to die between 1:00 p.m. and 5:00 p.m. – precisely when Shipman undertook his home visits.

The final stage is the conclusion. The author’s report concluded that if someone had been monitoring the data, Shipman’s activities could have been discovered as early as 1984 – 15 years earlier – which could have saved up to 175 lives.

So, what do statisticians do? They look at patterns in data to solve real-world problems.

---

What to read next:

We’ve seen how statistical claims can be distorted in their passage from research to the public ear. Usually, these distortions of the data are unintentional and arise from a misunderstanding of statistical methods. Sometimes, however, these distortions are quite deliberate.

The blinks to How to Lie with Statistics, by author Darrell Huff, deal with this darker side of statistics. They introduce the techniques that media and advertisements use to alter how data is perceived and interpreted. They also go deeper into some familiar themes, such as the difficulty of truly random sampling, the error of inferring cause from correlation, and the misuse of averages. To avoid getting fooled, head on over to our blinks on How to Lie with Statistics.

Ref: blinkist.com

June 26, 2019

I never really got statistics when I did Maths when I was younger. The most esoteric parts of pure maths were a breeze, but statistics never clicked, in large part because nobody was able to explain to me what some of the core concepts actually mean. Chief villain in the piece is standard deviation, something I considered to be the height of charlatanism. Fast forward 20 years, and I am working in a role that actually needs to know statistics, and I'm regretting my youthful intransigence.

This book has, to a large part, undone the damage. This book is NOT a practical guide on how to do statistics. It IS a guide, something that shows you what statistics is good for, what it is not, the good and bad ways to practice it, and what each concept means. I can go and read any number of articles about how to do statistics, how to apply a particular technique, but all of them presuppose I know when I should and in what circumstances. That's where this book closes the gap. I suspect I'll need to return to this many times.

But this book goes beyond just helping specialists do statistics. It also helps people interpret statistics. It gives you a good grounding in the various principles of statistics, without getting bogged down in calculation. It also includes a significant section critiquing how statistics are communicated to the public, and I think this would be of interest to anyone.

All in all, this is a very good book. I can't recommend it enough. If you have any interest in statistics, this should be on your shelf.

October 22, 2019

This book was just okay - I can't help but feel that if Spiegelhalter had done just one of the things he wanted to accomplish in this book, it would have been great, but he tried to make this book all things to all people and it ended up being too shallow on both fronts.

I'm beating around the bush a bit but essentially Spiegelhalter wanted to 1) teach the audience about statistics and how they can make life better and 2) present some cool scenarios where statistics can get us an approximate answer to something - like how likely someone would have been to survive the Titanic, if ovarian cancer screening is good, whether busier hospitals have higher survival rates, and so on.

I found that Spiegelhalter had sections that were conversational and easy to read, and then I got whiplash going into other sections that were incredibly dense and required intense engagement from the reader. Ultimately it made it difficult to determine the context in which to read this - was it a casual commute read, or something for which I should have pen and paper ready to take notes?

If you're looking for a good foundational stats book, I would recommend picking up Charles Wheelan's Naked Statistics rather than this one.

June 4, 2020

As a data scientist, I enjoyed the non-technical aspects of this book more than the technical (though the review was welcome). Statistical training should include more courses and resources like this that remind us there is more to the practical use of statistics than just the mathematics. Publication, ethics, review, interpretation and communication all play a vital role in how studies benefit society at large. These concepts are more useful and accessible to the general population than, say, the formula for determining the p-value of a test.

August 18, 2021

Great reference read. More entertaining than you might expect, with lots of interesting applications of statistics, e.g. in predicting Harold Shipman's murders, discussion of the (lack of) use of stats in courts of law, probability and politics, etc. I think having the basis from my degree helped, so I wasn't overwhelmed, but it also taught me some new things and was more interesting than revision. Particularly enjoyed the section where he describes how "Scandinavian countries are an epidemiologist's dream" ... bodes well for my masters.

January 6, 2019

I read a lot of pop-maths books and enjoy them (Hannah Fry, Du Sautoy, Simon Singh, and previous books by Spiegelhalter). This one is a bit more chewy. Where Sex by Numbers uses statistics to tell you things, this book is much closer to a textbook on how statistics should be done and what can be learned from it.

I have learned a great deal from this and his discussions of Harold Shipman and of 95% accuracy tests giving far more false positives than accurate responses (inter alia) have been really eye-opening. The technicality of p and t tests has got a bit beyond me and one or two graphs could be clearer (though my preview copy is not coloured and so perhaps this is unfair).
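The false-positive point the reviewer mentions follows from a simple base-rate calculation (the prevalence and accuracy figures below are invented round numbers, not taken from the book):

```python
# Why a "95% accurate" test for a rare condition yields mostly false positives.
population = 100_000
prevalence = 0.01        # 1% actually have the condition
sensitivity = 0.95       # P(test positive | condition)
specificity = 0.95       # P(test negative | no condition)

true_pos = population * prevalence * sensitivity               # 950 people
false_pos = population * (1 - prevalence) * (1 - specificity)  # 4,950 people
ppv = true_pos / (true_pos + false_pos)

print(f"positives that are real: {ppv:.0%}")  # only ~16% of positives are true
```

With a 1% prevalence, the 5% of healthy people who test positive vastly outnumber the sick people who do, so a positive result is far more likely to be false than true.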

Certainly one comes away from the book knowing why statistics and significance testing are becoming ever more central in subjects such as Psychology, where a replication crisis is at work (and even at A-Level, stats is becoming more prevalent), and appreciating his clear desire that the journalists reporting cases (he often cites examples of poor reporting) would understand the data they use and not confuse themselves and their readers.

Huge amounts to learn, but perhaps too technical in places for most of us.

August 27, 2021

Though I am well versed in statistics as a financial professional, I often seek out books such as this one purporting to simplify the conceptual understanding of basic principles. Often my reading this type of material is to provide a basis for me to explain concepts to my clients and others in an understandable way.

This book attempts to do away with much of the math and formulas of the discipline but to me the ideas became more of a struggle to understand without the "convenience" of math shorthand. Formulas are listed in the appendix but without the framework of explanation to make them understandable.

I found the format confusing, often having to decipher the meaning of a graph, the details of a footnote, and the textual explanation, often on a single page. This definitely interferes with the flow of the material. Parts of the book are excellent, especially when taking the media and other folks to task for misusing statistical information, and I found the explanations of simple, generic questions helpful, but I think the average reader will find this a bit of a slog.

July 30, 2019

The book deals with the spirit of applying statistics. It has very apt examples and a clear style of writing. Reading this book can help a great deal before the reader jumps into the mechanics of machine learning using various models. Concepts like the six principles of p-values, types of uncertainty, bootstrapping as sampling with replacement, and bagging as a bootstrap method that uses multiple decision trees and a consensus prediction are explained very well.
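Bootstrapping as "sampling with replacement" can be sketched in a few lines (the data below is a small invented sample, purely for illustration):

```python
# Bootstrapping: resample the data WITH replacement many times to gauge
# the uncertainty of a statistic - here, the sample mean.
import random

random.seed(0)
sample = [4.1, 5.6, 3.8, 6.2, 5.0, 4.7, 5.9, 4.4]

boot_means = []
for _ in range(10_000):
    resample = random.choices(sample, k=len(sample))  # sampling with replacement
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
lo, hi = boot_means[250], boot_means[9750]  # rough 95% interval from percentiles
print(f"bootstrap 95% interval for the mean: ({lo:.2f}, {hi:.2f})")
```

Bagging extends the same idea: fit a decision tree to each resample and average (or vote over) their predictions.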

August 15, 2020

I don’t really know when I started Work/Reading but I really like it. This book was great and I am glad the author included this last chapter! Some parts may seem painful but overall it was very informative.

July 22, 2022

The Art of Statistics: How to Learn from Data

I have always been a fan of mathematics and data, algorithms and programming. This book was a good combination of the first 3.

I didn’t find it as compelling as The Art of Computer Programming, Volume 1: Fundamental Algorithms, but I still got a lot out of it!

Would recommend for people who love data like me!

3.7/5

March 20, 2021

Ever since starting my degree in data science, I have become more and more interested in how data is used to give information, and how much it pervades so many aspects of our lives. With this background, reading this book was a beautiful summary of concepts and practices that I've learned, seen others do, or done myself, and this book has helped me to appreciate statistics and how we learn from data so much more.

However, even without any technical knowledge, I think that the author has done a great job of keeping all the explanations as understandable as possible, making it accessible to anyone showing interest in the topic. It's a great introduction to the wonderful world of data, and I am now even more excited to keep learning more about it.

June 7, 2020

This was exceptional! If you have ever wanted to learn more about the ubiquitous statistics that are a part of our lives, but worried you were going to end up reading a mathematically laden statistics text, this is the book I'd recommend!

Yes, there are some charts, graphs, and a few equations (mostly in the glossary). However, Spiegelhalter does a great job getting into the basics and provides much help in deciphering the "how did they come up with that?" that we all experience when reading an article or book that quotes studies that don't seem to make logical sense.

He starts with data and how data can be used to draw conclusions. Then builds on measurement, what causes what, and modeling. These are the basics of much published information we read, hear, or get bombarded with on a daily basis. Is it accurate? Is it reliable? Are the conclusions unbiased? We all have thought these things.

From here, he continues to regression, estimates, and probability. He is able to do this all without bogging down in the actual summation and stochastics that normally accompany any discussion of statistics. It is refreshing, and it shows the divide between statistics for statisticians and statistics for average people.

For me, the true gem of the book is the end. Here he gives us ammunition to decipher claims or "discoveries" that may not be fully accurate. He discusses a list of items to ensure quality and ethical honesty in the data, compilation, dissemination, and use of any study. Further, he discusses stories where things go terribly wrong, not from malice, but from ignorance.

I have seen other reviews by actual statisticians and actuaries, and they also think this is a fabulous book. I hope it goes mainstream and becomes required reading for journalists and editors before their first articles are written, edited, or approved.

January 9, 2023

The 10 rules for effective statistical practice:

1. Focus on why the analysis is being done rather than the particular technique to use.

2. Signals always come with noise. Probability models are useful as an abstraction.

3. Plan ahead.

4. Worry about data quality.

5. Understand why the analysis is being done; don't just plug in numbers.

6. Keep it simple. The main communication should be as basic as possible.

7. Provide an assessment of variability.

8. Check your assumptions.

9. When possible, replicate.

10. Make your analysis reproducible.

The audiobook is a good refresher on statistics. You do need to download the PDF to understand some of the content. If you're trying to learn statistics, it would be difficult to do with this audiobook.

June 5, 2020

I can't remember where I read reviews saying this was very good, but they were right. Here Spiegelhalter ("Spiegelhalter, Spiegelhalter an der Wand, wer ist die Schönste im ganzen Land?" — roughly, "Mirror-holder, mirror-holder on the wall, who is the fairest of them all?") attempts to explain the uses and abuses of statistics and probability. It's well put together, well explained, well illustrated. The pace is good, the examples well chosen. I can't really complain, and I'm only giving it a harsh four stars because... well... it's a book about statistics. A very good one, an informative one, but dry in truth, with quirky bits in necessary moderation.

February 15, 2019

The clearest and best introduction to statistics, written by one of the greatest living statisticians. This book does not dumb down the content; it presents the latest thinking about data in a clear and accessible way. I would recommend it to anyone who is genuinely interested in learning about data and trying to separate fact from fiction, but it is also a perfect introductory text for an undergraduate statistics course for those who are afraid of statistics. It is a pleasure to read.

A really solid, math-free overview of statistics that is heavy on real-world examples. If you want to ask smarter questions about the numbers and stats you see, this is a great book!

April 14, 2021

Probably the closest thing to a textbook without explicitly being a textbook.

I love how infamous Carmen Reinhart and Kenneth Rogoff have become for being too stupid to use Microsoft Excel.

January 31, 2021

Not to reveal my age, but I haven’t been in a math class since the late ‘90s, and as luck would have it, it was a stats class. However, with all the talk about following the science on mask wearing, eating in restaurants, keeping schools open, etc. I thought it was time to take myself to task. Hey, if every other armchair statistician was mouthing off on Facebook, why not take a dive into the stats myself?

And so, I started with Spiegelhalter's book "The Art of Statistics: How to Learn from Data" to see what I knew about the subject. Bonus points for its colorful cover.

Let’s start with the big surprise, there is no single unifying theory of statistical inference (p. 305). Wait, what? Yes, turns out there are three competing approaches: Fisher, Neyman-Pearson, and Bayesian. I won’t bore you with the details, and let’s be realistic, I’d have to make some flashcards and get to memorizing before I could explain much about those differences, but let’s just say there are different approaches to drawing conclusions from data.

I’m also happy to report that your average person, myself included, doesn’t understand probability and chance. Friendly reminder, luck has nothing to do with numbers.

What appeals to me most about the discipline is the idea of transforming our life experiences into data. That we can draw inferences about general principles from specific examples is fascinating to me, as is the idea of overfitting. When the algorithms get too complex, we start fitting the noise rather than the signal. The goal is to find the signal in the noise, not to make the noise louder.

To be honest, I skipped over some of the more complex theoretical sections. Since I’m not running any research studies or statistical analyses any time soon, I think the world is safe from my armchair interpretations.

If you’re looking to revisit the subject or learn more about it for the first time, this book is a great entry point. It’s easy to read and filled with engaging real-life examples.

As for my thoughts on what’s safe for the public during the pandemic and when, I’ll defer to the public health experts and epidemiologists. Because despite my reading and life experiences, I, like most other citizens, do not have the stats skills to look at the bigger picture. Certainly, I encourage you to read broadly from a variety of academic and popular sources, but let’s not kid ourselves, interpreting data is hard, and is best left to those who understand how to do it at an advanced level, especially when it comes to matters of life and death.

October 7, 2022

It was pretty good, but it had the same problem for me as many popular math books: it was too easy and too hard at the same time. In the beginning there is an extended discussion of all the difficulties of meaningful statistical modeling: how do you define the population you are studying, how do you pick a representative sample, how do you know if your sample is big enough, and so on. The author also discusses basic ideas such as mean, mode, median, normal distributions, and standard deviations, along with basic rules for the computation of probabilities, and goes on to cover the basics of Bayesian analysis. All of this, and all of the discussion of misrepresented statistical results, p-hacking, and the reproducibility crisis, was old news to me.

On the other hand, there were some ideas that were new to me, such as bootstrapping and Poisson distributions. This book was not the best format for introducing new concepts. I'd like to learn more about these things, but that will require a book that is more of a text, or a course or some other material on the internet that lets me see more examples and graphs and work through a few problems. Unless you are naturally brilliant, it's too hard to learn new math without doing some exercises. In this book, as soon as the author moved into territory that I didn't already know well, I began to have trouble following his analysis and conclusions.

I expect most people will have the same too easy/too hard problem that I had, though people at different levels of knowledge will draw the line in different places. I definitely wouldn't recommend this book for a complete beginner, who would find it too hard, or to someone who already has taken a couple of college level classes in probability and statistics, who will find it too easy. For those of us in the middle, it is a half and half experience, with neither the easy part or the hard part being fully satisfying.

March 15, 2021

This book caught my eye as I studied statistics. It falls in between a popular science book and a statistical textbook on the technical spectrum, which is great for statisticians and interested readers alike.

What I liked is that it gave a gentle overview of fundamentals, which always serves to bring the important principles to the front of your mind again. For non-statistics readers, it would provide a greater understanding of statistics, without having to know all the maths - but it doesn’t gloss over important concepts either.

The author used real life cases and research questions to apply what he was explaining and the use of charts and tables was relevant and easy to grasp at first glance.

The discussions also covered the history, development, and application of the concepts and theories, which was really interesting. The book ended with a discussion of how analyses can 'go wrong' and how they can be done 'better', which I thought was a great inclusion, given how frequently rigour, interpretation, and reproducibility are discussed.
