The Art of Statistics: How to Learn from Data
Read between October 8 - October 8, 2022
5%
So to turn experience into data, we have to start with rigorous definitions.
6%
PPDAC
7%
The PPDAC cycle provides a convenient framework: Problem – Plan – Data – Analysis – Conclusion and communication.
8%
basic presentation of statistics is important.
8%
The audience’s emotional response to the table may also be influenced by the choice of which columns to display. Table 1.1 shows the results in terms of both survivors and deaths, but in the US mortality rates from child heart surgery are reported, while the UK provides survival rates. This is known as negative or positive framing, and its overall effect on how we feel is intuitive and well-documented: ‘5% mortality’ sounds worse than ‘95% survival’.
8%
Reporting the actual number of deaths as well as the percentage can also increase the impression of risk, as this total might then be imagined as a crowd of real people.
9%
Note the two tricks used to manipulate the impact of this statistic: convert from a positive to a negative frame, and then turn a percentage into actual numbers of people.
9%
But the oldest trick of misleading graphics is to start the axis at say 95%, which will make the hospitals look extremely different, even if the variation is in fact only what is attributable to chance alone.
11%
Technically, the odds for an event is the ratio of the chance of the event happening to the chance of it not happening.
11%
we might report either a 2% increase in absolute risk, or a relative risk of 0.87/0.85 = 1.02, that is a 2% relative increase in risk. The odds in the two groups are given by 0.87/0.13 = 6.7 and 0.85/0.15 = 5.7, and so the odds ratio is therefore 6.7/5.7 = 1.18: exactly the same as for bacon sandwiches, but based on very different absolute risks.
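The arithmetic in this highlight can be checked with a minimal Python sketch. The names `odds`, `p_a` and `p_b` are hypothetical labels for illustration; only the risks 0.87 and 0.85 come from the quoted text.

```python
def odds(p):
    """Odds of an event: chance of it happening divided by chance of it not happening."""
    return p / (1 - p)

# The two risks quoted in the highlight (group labels are assumptions).
p_a, p_b = 0.87, 0.85

absolute_diff = p_a - p_b            # difference in absolute risk
relative_risk = p_a / p_b            # ratio of the two risks
odds_ratio = odds(p_a) / odds(p_b)   # ratio of the two odds

print(f"absolute difference: {absolute_diff:.2f}")
print(f"relative risk:       {relative_risk:.2f}")
print(f"odds ratio:          {odds_ratio:.2f}")
```

With risks this close to 1, the odds ratio (about 1.18) is much further from 1 than the relative risk (about 1.02), which is the contrast the highlight draws.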
13%
logarithmic scale, where the space between 100 and 1,000 is the same as the space between 1,000 and 10,000.
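One way to see why those gaps are equal: on a base-10 logarithmic scale, position is proportional to log10 of the value, so each tenfold step adds the same distance. A small sketch, with variable names chosen only for illustration:

```python
import math

# Distance on a log10 axis between 100 and 1,000, and between 1,000 and 10,000.
gap_low = math.log10(1_000) - math.log10(100)
gap_high = math.log10(10_000) - math.log10(1_000)

print(gap_low, gap_high)  # both gaps are one unit on the log10 axis
```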
13%
Mean-averages can be highly misleading when the raw data do not form a symmetric pattern around a central value but instead are skewed towards one side
13%
In this case it might help to distinguish between ‘average income’ (mean) and ‘the income of the average person’ (median).
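A small sketch of the mean/median distinction on skewed data. The incomes below are made up purely for illustration and do not come from the book.

```python
from statistics import mean, median

# Hypothetical incomes (invented for this example): mostly modest, one very large.
incomes = [18_000, 22_000, 25_000, 27_000, 30_000, 33_000, 400_000]

average_income = mean(incomes)               # pulled upward by the single large value
income_of_average_person = median(incomes)   # the middle value when sorted

print(f"mean:   {average_income:,.0f}")
print(f"median: {income_of_average_person:,.0f}")
```

In this invented sample the mean is nearly three times the median, even though six of the seven people earn well below the mean.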
17%
When exploring data, a primary aim is to find factors that explain the overall variation.
19%
But often the question goes beyond simple description of data: we want to learn something bigger than just the observations in front of us, whether it is to make predictions (how many will come next year?), or say something more basic (why are the numbers increasing?).
19%
But induction works the other way, in taking particular instances and trying to work out general conclusions.
19%
The crucial distinction is that deduction is logically certain, whereas induction is generally uncertain.
20%
He said that if you have cooked a large pan of soup, you do not need to eat it all to find out if it needs more seasoning. You can just taste a spoonful, provided you have given it a good stir.
22%
Z-score, which simply measures how many standard deviations a data-point is from the mean.
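As a minimal sketch, a Z-score can be computed as below. The helper name `z_score` and the data are hypothetical, and the sketch assumes the population standard deviation (`pstdev`); a sample standard deviation would give slightly different values.

```python
from statistics import mean, pstdev

def z_score(x, data):
    """How many (population) standard deviations x lies from the mean of data."""
    return (x - mean(data)) / pstdev(data)

# Invented data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(z_score(9, data))  # 9 is two standard deviations above the mean
print(z_score(5, data))  # the mean itself has a Z-score of zero
```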
22%
So a population can be thought of as a physical group of individuals, but also as providing the probability distribution for a random observation.
24%
This is a huge study based on a registry of the complete eligible population – not a sample – so we can confidently conclude that slightly more brain tumours really were found in more-educated people.
24%
‘correlation does not imply causation’.
24%
There is even a word for the tendency to construct reasons for a connection between what are actually unrelated events – apophenia – with the most extreme case being when simple misfortune or bad luck is blamed on others’ ill-will or even witchcraft.
24%
So we can never say that X caused Y in a specific case, only that X increases the proportion of times that Y happens.
25%
The absolute reduction in the risk of a heart attack was 11.8 − 8.7 = 3.1%.
26%
Event: Heart attack
Percentage in 10,267 people allocated placebo: 11.8
Percentage in 10,269 people allocated statin: 8.7
% (relative) risk reduction in those allocated statins: 27%
Andrew Ferreira
(8.7/11.8)-1
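The note's calculation can be sketched in Python using the rounded percentages quoted in the highlight. The variable names are hypothetical. From these rounded inputs the relative reduction comes out at about 26%; the quoted 27% presumably reflects unrounded trial figures, which is an assumption and not something the highlight states.

```python
# Rounded percentages quoted in the highlight.
placebo_pct = 11.8   # % with a heart attack among those allocated placebo
statin_pct = 8.7     # % with a heart attack among those allocated statin

# Absolute risk reduction, in percentage points.
absolute_reduction = placebo_pct - statin_pct

# Relative risk reduction: the absolute reduction as a share of the placebo risk.
# This equals 1 - statin_pct / placebo_pct, the negative of the note's expression.
relative_reduction = absolute_reduction / placebo_pct

print(f"absolute reduction: {absolute_reduction:.1f} percentage points")
print(f"relative reduction: {relative_reduction:.1%}")
```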
27%
When the data does not arise from an experiment, it is said to be observational.
27%
cross-sectional study: their Analysis showed a clear positive correlation, and their Conclusions were that ear-length was associated with age.
29%
Causation, in the statistical sense, means that when we intervene, the chances of different outcomes are systematically changed.
29%
Observational data may have background factors influencing the apparent observed relationships between an exposure and an outcome, which may be either observed confounders or lurking factors.
31%
residual error – although it is important to remember that in statistical modelling, ‘error’ does not refer to a mistake, but the inevitable inability of a model to exactly represent what we observe.
31%
This section contains a simple lesson: just because we act, and something changes, it doesn’t mean we were responsible for the result. Humans seem to find this simple truth difficult to grasp – we are always keen to construct an explanatory narrative, and even keener if we are at its centre.
33%
The British statistician George Box has become famous for his brief but invaluable aphorism: ‘All models are wrong, some are useful.’
34%
The basic desire to find the signal in the noise is just as relevant when we just want a method that will help in a particular decision faced in our daily lives.