The Art of Statistics: Learning from Data
Read between April 13 - April 19, 2019
6%
R code and data for reproducing most of the analyses and Figures are available from https://github.com/dspiegel29/ArtofStatistics.
6%
The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning. — Nate Silver, The Signal and the Noise
7%
eventually came up with an estimated total of 3.04 trillion (that is 3,040,000,000,000) trees on the planet. This sounds a lot, except they reckoned there used to be twice this number.fn33
7%
the official definition of ‘unemployment’ in the UK was changed at least thirty-one times between 1979 and 1996.
12%
We need to distinguish what is actually dangerous from what sounds frightening.
12%
But using multiple ‘1 in …’ statements is not recommended, as many people find them difficult to compare. For example, when asked the question, ‘Which is the bigger risk, 1 in 100, 1 in 10 or 1 in 1,000?’, around a quarter of people answered incorrectly: the problem is that the bigger number is associated with the smaller risk, and so some mental dexterity is required to keep things clear.
13%
Although extremely common in the research literature, odds ratios are a rather unintuitive way to summarize differences in risk. If the events are fairly rare then the odds ratios will be numerically close to the relative risks, as in the case of bacon sandwiches, but for common events the odds ratio can be very different from the relative risk,
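The gap between the two measures is easy to check directly. Here is a minimal sketch in Python (the book's companion code is in R; the risk figures below are illustrative, not the book's):

```python
def relative_risk(p_exposed, p_unexposed):
    # Ratio of the two probabilities.
    return p_exposed / p_unexposed

def odds_ratio(p_exposed, p_unexposed):
    # Ratio of the two odds, where odds = p / (1 - p).
    odds = lambda p: p / (1 - p)
    return odds(p_exposed) / odds(p_unexposed)

# Rare event: risks of 1.2% vs 1.0% — the two measures almost coincide.
rr_rare = relative_risk(0.012, 0.010)   # 1.20
or_rare = odds_ratio(0.012, 0.010)      # ≈ 1.20

# Common event: risks of 60% vs 50% — the odds ratio drifts well away.
rr_common = relative_risk(0.60, 0.50)   # 1.20
or_common = odds_ratio(0.60, 0.50)      # 1.50
```

Both comparisons have the same relative risk of 1.20, but only for the rare event does the odds ratio stay close to it.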
17%
Alberto Cairo has identified four common features of a good data visualization:
1. It contains reliable information.
2. The design has been chosen so that relevant patterns become noticeable.
3. It is presented in an attractive manner, but appearance should not get in the way of honesty, clarity and depth.
4. When appropriate, it is organized in a way that enables some exploration.
20%
For example, in 2017 budget airline Ryanair announced that 92% of their passengers were satisfied with their flight experience. It turned out that their satisfaction survey only permitted the answers, ‘Excellent, very good, good, fair, OK’.fn2
20%
George Gallup, who essentially invented the idea of the opinion poll in the 1930s, came up with a fine analogy for the value of random sampling. He said that if you have cooked a large pan of soup, you do not need to eat it all to find out if it needs more seasoning. You can just taste a spoonful, provided you have given it a good stir.
25%
The list of principles for RCTs is not new: they were nearly all introduced in 1948 in what is generally considered the first proper clinical trial. This was of streptomycin, a drug prescribed for tuberculosis.
25%
There have even been studies to determine the effectiveness of prayer. For example, the Study of the Therapeutic Effects of Intercessory Prayer (STEP) randomly allocated over 1,800 cardiac bypass patients into three groups: patients in Groups 1 and 2 were prayed for and not prayed for, respectively, but did not know which was the case, while Group 3 knew they were being prayed for. The only apparent effect was a small increase in complications in the group that knew they were being prayed for: one of the researchers commented, ‘It may have made them uncertain, wondering, “Am I so sick they had ...more
26%
any correlation between ice-cream sales and drownings is due to both being influenced by the weather.
26%
Simpson’s paradox, which occurs when the apparent direction of an association is reversed by adjusting for a confounding factor, requiring a complete change in the apparent lesson from the data.
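The reversal is easiest to see with numbers. A short Python sketch using the classic kidney-stone data (Charig et al., 1986, a standard textbook illustration; it is not the book's own example):

```python
# (successes, patients) for treatments A and B, split by stone size.
data = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, n):
    return successes / n

# Within each stratum, treatment A has the higher success rate...
for stratum, arms in data.items():
    a, b = rate(*arms["A"]), rate(*arms["B"])
    print(f"{stratum}: A {a:.1%} vs B {b:.1%}")

# ...yet pooled over strata the direction reverses and B looks better,
# because A was given mostly the harder (large-stone) cases.
total_a = rate(81 + 192, 87 + 263)   # 78.0%
total_b = rate(234 + 55, 270 + 80)   # 82.6%
```

Stone size is the confounder: adjusting for it completely changes the apparent lesson from the data.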
28%
Causation, in the statistical sense, means that when we intervene, the chances of different outcomes are systematically changed.
29%
The US Federal Reserve define a model as a ‘representation of some aspect of the world which is based on simplifying assumptions’: essentially some phenomenon will be represented mathematically, generally embedded in computer software, in order to produce a simplified ‘pretend’ version of reality.
31%
The British statistician George Box has become famous for his brief but invaluable aphorism: ‘All models are wrong, some are useful.’
34%
If we are trying to detect survivors, the percentage of true survivors that are correctly predicted is known as the sensitivity of the algorithm, while the percentage of true non-survivors that are correctly predicted is known as the specificity. These terms arise from medical diagnostic testing.
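These two definitions reduce to a small calculation over a confusion matrix. A sketch in Python (the toy predictions below are made up for illustration):

```python
def sensitivity_specificity(preds, truths):
    """Sensitivity: fraction of true survivors correctly predicted.
    Specificity: fraction of true non-survivors correctly predicted."""
    tp = sum(p and t for p, t in zip(preds, truths))          # true positives
    fn = sum(not p and t for p, t in zip(preds, truths))      # missed survivors
    tn = sum(not p and not t for p, t in zip(preds, truths))  # true negatives
    fp = sum(p and not t for p, t in zip(preds, truths))      # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: 1 = survived, 0 = did not.
truths = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds  = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(preds, truths)  # 0.75 and 0.667
```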
37%
This is known as over-fitting, and is one of the most vital topics in algorithm construction. By making an algorithm too complex, we essentially start fitting the noise rather than the signal.
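A caricature of the idea in Python (the data points are invented: y is roughly x plus noise, so "the signal" is simply y = x):

```python
# Training and test data: y ≈ x + noise (hypothetical figures).
train = [(1, 1.2), (2, 1.8), (3, 3.3), (4, 3.9)]
test  = [(1, 0.9), (2, 2.2), (3, 2.8), (4, 4.1)]

# "Complex" model: memorise every training point exactly — fits the noise.
memorised = dict(train)
# "Simple" model: y = x — fits the signal.
simple = lambda x: x

def mse(model, data):
    # Mean squared prediction error.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train_err        = mse(lambda x: memorised[x], train)  # 0.0: looks perfect
test_err_complex = mse(lambda x: memorised[x], test)   # worse on new data...
test_err_simple  = mse(simple, test)                   # ...than the simple model
```

The memorising model is flawless on the data it was built from, yet loses to the simple model on fresh data: the textbook signature of over-fitting.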
43%
In 2012, 97 Members of Parliament in London were asked: ‘If you spin a coin twice, what is the probability of getting two heads?’ The majority, 60 out of 97, could not give the correct answer.
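The correct answer can be read off by enumerating the equally likely outcomes, as in this small Python check:

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two spins of a fair coin.
outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT
p_two_heads = Fraction(sum(o == ("H", "H") for o in outcomes), len(outcomes))
# p_two_heads == Fraction(1, 4): one favourable outcome out of four
```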
49%
two types of uncertainty: what is known as aleatory uncertainty before I flip the coin – the ‘chance’ of an unpredictable event – and epistemic uncertainty after I flip the coin – an expression of our personal ignorance about an event that is fixed but unknown. The same difference exists between a lottery ticket (where the outcome depends on chance) and a scratch card (where the outcome is already decided, but you don’t know what it is).
53%
A P-value is the probability of getting a result at least as extreme as we did, if the null hypothesis (and all other modelling assumptions) were really true.
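For a coin-flipping null hypothesis this probability can be computed exactly. A sketch in Python (the 8-heads-in-10 data are hypothetical):

```python
from math import comb

def p_value_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of a result at least
    as extreme as k successes, were the null hypothesis true."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Observed: 8 heads in 10 flips. Null hypothesis: the coin is fair.
p = p_value_at_least(8, 10)   # 56/1024 ≈ 0.055 — just above the usual 0.05 line
```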
56%
One way around this problem is to demand a very low P-value at which significance is declared, and the simplest method, known as the Bonferroni correction, is to use a threshold of 0.05/n, where n is the number of tests done. So, for example, the tests at each site of the salmon’s brain could be carried out demanding a P-value of 0.05/8,000 = 0.00000625, or 1 in 160,000. This technique has become standard practice when searching the human genome for sites with association with diseases: since there are roughly 1,000,000 sites for genes, a P-value smaller than 0.05/1,000,000 = 1 in 20 million is ...more
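The correction itself is a one-line division, checked here in Python against the two figures quoted above:

```python
def bonferroni_threshold(alpha, n_tests):
    # Declare significance only when p < alpha / n_tests, which keeps the
    # chance of ANY false positive across the n tests at most alpha.
    return alpha / n_tests

bonferroni_threshold(0.05, 8_000)       # 6.25e-06, i.e. 1 in 160,000
bonferroni_threshold(0.05, 1_000_000)   # 5e-08,    i.e. 1 in 20 million
```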
56%
For new pharmaceuticals to be approved by the US Food and Drug Administration, it has become standard that two independent clinical trials must have been carried out, each showing clinical benefit that is significant at P < 0.05. This means that the overall chance of approving a drug that in truth has no benefit at all is 0.05 × 0.05 = 0.0025, or 1 in 400.
58%
Sequential Probability Ratio Test (SPRT), which is a statistic that monitors accumulating evidence about deviations, and can at any time be compared with simple thresholds – as soon as one of these thresholds is crossed, then an alert is triggered and the production line is investigated.fn8 Such techniques led to more efficient industrial processes, and were later adapted for use in so-called sequential clinical trials in which accumulated results are repeatedly monitored to see if a threshold that indicates a beneficial treatment has been crossed.
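A schematic sketch of Wald's SPRT in Python — not an industrial implementation; the defect model, error rates and observation sequence below are all hypothetical:

```python
import math

def sprt(observations, pdf0, pdf1, alpha=0.05, beta=0.05):
    """Accumulate the log-likelihood ratio of H1 vs H0 and stop as soon
    as it crosses either of Wald's two simple thresholds."""
    upper = math.log((1 - beta) / alpha)   # crossed -> alert: deviation
    lower = math.log(beta / (1 - alpha))   # crossed -> process in control
    llr = 0.0
    for i, x in enumerate(observations, 1):
        llr += math.log(pdf1(x) / pdf0(x))
        if llr >= upper:
            return "alert: investigate", i
        if llr <= lower:
            return "in control", i
    return "keep monitoring", len(observations)

# Hypothetical defect model: in-control rate 5%, out-of-control rate 20%.
p0 = lambda x: 0.20 if x else 0.80   # wait — this is H1's density; see below
p1 = p0
p0 = lambda x: 0.05 if x else 0.95   # H0: in control
p1 = lambda x: 0.20 if x else 0.80   # H1: deviation
verdict, n = sprt([1, 1, 1, 0, 1, 1], p0, p1)   # alerts after 3 defects
```

The key property is the "at any time" monitoring: the test can stop after three items here, rather than waiting for a fixed sample size.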
61%
A P-value is a measure of the incompatibility between the observed data and a null hypothesis: formally it is the probability of observing such an extreme result, were the null hypothesis true.
67%
The Reproducibility Project was a major collaboration in which 100 psychological studies were repeated with larger sample sizes, and so had higher power to detect a true effect if it existed. The project revealed that whereas 97% of the original studies had statistically significant results, only 36% of the replications did.
68%
As Ronald Fisher famously put it, ‘To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.’3
69%
these were revealed at the end of the paper, which has become a classic deliberate demonstration of the practice of what is now known as HARKing – inventing the Hypotheses After the Results are Known.
73%
Ten Questions to Ask When Confronted by a Claim Based on Statistical Evidence
HOW TRUSTWORTHY ARE THE NUMBERS?
1. How rigorously has the study been done? For example, check for ‘internal validity’, appropriate design and wording of questions, pre-registration of the protocol, taking a representative sample, using randomization, and making a fair comparison with a control group.
2. What is the statistical uncertainty / confidence in the findings? Check margins of error, confidence intervals, statistical significance, sample size, multiple comparisons, systematic bias.
3. Is the summary appropriate? ...more
83%
A primary recommendation of the American Statistical Association is to ‘Teach statistics as an investigative process of problem-solving and decision-making’. See https://www.amstat.org/asa/education/Guidelines-for-Assessment-and-Instruction-in-Statistics-Education-Reports.aspx. The PPDAC cycle was developed in R. J. MacKay and R. W. Oldford, ‘Scientific Method, Statistical Method and the Speed of Light’, Statistical Science 15 (2000), 254–78. It is strongly promoted in the New Zealand schools system, which provides a highly developed education in statistics. See C. J. Wild and M. Pfannkuch, ...more
83%
World Health Organization. Q&A on the carcinogenicity of the consumption of red meat and processed meat is at http://www.who.int/features/qa/cancer-red-meat/en/. ‘Bacon, Ham and Sausages Have the Same Cancer Risk as Cigarettes Warn Experts’, Daily Record, 23 October 2015.
84%
Reported on More or Less, 5 October 2018; https://www.bbc.co.uk/programmes/p06n2lmp. The classic demonstration of priming occurs in the UK comedy series Yes, Prime Minister, when top civil servant Sir Humphrey Appleby shows how suitable leading questions can result in any answer desired. This example is now used in teaching research methods. https://researchmethodsdataanalysis.blogspot.com/2014/01/leading-questions-yes-prime-minister.html
84%
For a fascinating discussion of the risk of modelling, see A. Aggarwal et al., ‘Model Risk – Daring to Open Up the Black Box’, British Actuarial Journal 21:2 (2016),
87%
Open Science Framework: https://osf.io/.
96%
I now regret using the term ‘excess deaths’, since newspapers later interpreted this as meaning ‘avoidable deaths’. But around half of hospitals will have more deaths than expected just by chance alone, and only a few of these deaths might be judged avoidable.
97%
Rosling was once arguing on TV with a Danish journalist who was parroting the sort of misconceptions about the world that Hans spent his life trying to counter. Hans simply replied, ‘These facts are not up for discussion. I am right, and you are wrong’ – which, for statistics, is unusually straight speaking.
97%
Not to be confused with sortilege, which is a form of divination in which apparently chance phenomena are used to determine divine will or future fortune – this is also known as cleromancy. Examples exist in many cultures, including fortune-telling using tea-leaves or chicken entrails, biblical casting of lots to determine the will of God, and divination using the I Ching.