More on this book
Community
Kindle Notes & Highlights
Read between
October 8 - October 8, 2022
So to turn experience into data, we have to start with rigorous definitions.
PPDAC
The PPDAC cycle provides a convenient framework: Problem – Plan – Data – Analysis – Conclusion and communication.
basic presentation of statistics is important.
The audience’s emotional response to the table may also be influenced by the choice of which columns to display. Table 1.1 shows the results in terms of both survivors and deaths, but in the US mortality rates from child heart surgery are reported, while the UK provides survival rates. This is known as negative or positive framing, and its overall effect on how we feel is intuitive and well-documented: ‘5% mortality’ sounds worse than ‘95% survival’.
Reporting the actual number of deaths as well as the percentage can also increase the impression of risk, as this total might then be imagined as a crowd of real people.
Note the two tricks used to manipulate the impact of this statistic: convert from a positive to a negative frame, and then turn a percentage into actual numbers of people.
But the oldest trick of misleading graphics is to start the axis at say 95%, which will make the hospitals look extremely different, even if the variation is in fact only what is attributable to chance alone.
Technically, the odds for an event is the ratio of the chance of the event happening to the chance of it not happening.
we might report either a 2% increase in absolute risk, or a relative risk of 0.87/0.85 = 1.02, that is a 2% relative increase in risk. The odds in the two groups are given by 0.87/0.13 = 6.7 and 0.85/0.15 = 5.7, and so the odds ratio is therefore 6.7/5.7 = 1.18: exactly the same as for bacon sandwiches, but based on very different absolute risks.
logarithmic scale, where the space between 100 and 1,000 is the same as the space between 1,000 and 10,000.
Mean-averages can be highly misleading when the raw data do not form a symmetric pattern around a central value but instead are skewed towards one side
In this case it might help to distinguish between ‘average income’ (mean) and ‘the income of the average person’ (median).
When exploring data, a primary aim is to find factors that explain the overall variation.
But often the question goes beyond simple description of data: we want to learn something bigger than just the observations in front of us, whether it is to make predictions (how many will come next year?), or say something more basic (why are the numbers increasing?).
But induction works the other way, in taking particular instances and trying to work out general conclusions.
The crucial distinction is that deduction is logically certain, whereas induction is generally uncertain.
He said that if you have cooked a large pan of soup, you do not need to eat it all to find out if it needs more seasoning. You can just taste a spoonful, provided you have given it a good stir.
Z-score, which simply measures how many standard deviations a data-point is from the mean.
So a population can be thought of as a physical group of individuals, but also as providing the probability distribution for a random observation.
This is a huge study based on a registry of the complete eligible population – not a sample – so we can confidently conclude that slightly more brain tumours really were found in more-educated people.
‘correlation does not imply causation’.
There is even a word for the tendency to construct reasons for a connection between what are actually unrelated events – apophenia – with the most extreme case being when simple misfortune or bad luck is blamed on others’ ill-will or even witchcraft.
So we can never say that X caused Y in a specific case, only that X increases the proportion of times that Y happens.
The absolute reduction in the risk of a heart attack was 11.8 − 8.7 = 3.1%.
When the data does not arise from an experiment, it is said to be observational.
cross-sectional study: their Analysis showed a clear positive correlation, and their Conclusions were that ear-length was associated with age.7
Causation, in the statistical sense, means that when we intervene, the chances of different outcomes are systematically changed.
Observational data may have background factors influencing the apparent observed relationships between an exposure and an outcome, which may be either observed confounders or lurking factors.
residual error – although it is important to remember that in statistical modelling, ‘error’ does not refer to a mistake, but the inevitable inability of a model to exactly represent what we observe.
This section contains a simple lesson: just because we act, and something changes, it doesn’t mean we were responsible for the result. Humans seem to find this simple truth difficult to grasp – we are always keen to construct an explanatory narrative, and even keener if we are at its centre.
The British statistician George Box has become famous for his brief but invaluable aphorism: ‘All models are wrong, some are useful.’
The basic desire to find the signal in the noise is just as relevant when we just want a method that will help in a particular decision faced in our daily lives.

