Kindle Notes & Highlights
Read between July 6 and October 23, 2020
And so it is presumably better to follow Galton and use the median as a group-guess. This turns out to be 1,775 beans. The true value was … 1,616. Just one person guessed this precisely, 45% of people guessed below 1,616, and 55% guessed above, so there was little systematic tendency for the guesses to be either on the high or low side – we say the true value lay at the 45th percentile of the empirical data distribution.
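A minimal sketch of the two calculations in this highlight: the median as the group-guess, and the true value's empirical percentile (the percentage of guesses below it). The guesses are hypothetical and chosen for illustration only; they are not the actual jar data. One extreme guess drags the mean upward, while the median barely notices it, which is the reason for following Galton.

```python
import statistics

def group_guess(guesses):
    """Median of the individual guesses, used as the group-guess."""
    return statistics.median(guesses)

def percentile_of(value, data):
    """Percentage of observations strictly below `value` (its empirical percentile)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical guesses for illustration only; not the real competition data.
guesses = [1200, 1500, 1600, 1700, 1800, 2100, 5000]
true_value = 1616

print(statistics.mean(guesses))             # pulled upward by the single guess of 5000
print(group_guess(guesses))                 # 1700, however extreme that one guess is
print(percentile_of(true_value, guesses))   # 3 of 7 guesses lie below, about 42.9
```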
Four common features of a good data visualization:
1. It contains reliable information.
2. The design has been chosen so that relevant patterns become noticeable.
3. It is presented in an attractive manner, but appearance should not get in the way of honesty, clarity and depth.
4. When appropriate, it is organized in a way that enables some exploration.
Simpson’s paradox, which occurs when the apparent direction of an association is reversed by adjusting for a confounding factor, requiring a complete change in the apparent lesson from the data.
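A small sketch of the reversal, using made-up counts (not from the book). Treatment A does better within both the mild and the severe group, yet looks worse overall, because A was mostly given to severe cases, which succeed less often whatever the treatment. Severity is the confounding factor: adjusting for it (comparing within strata) flips the apparent lesson.

```python
from fractions import Fraction

# Hypothetical counts: (successes, patients) for each treatment and case severity.
counts = {
    "A": {"mild": (9, 10), "severe": (30, 90)},
    "B": {"mild": (80, 100), "severe": (2, 10)},
}

def stratified_rates(treatment):
    """Success rate within each severity group, i.e. adjusted for severity."""
    return {s: Fraction(succ, tot) for s, (succ, tot) in counts[treatment].items()}

def pooled_rate(treatment):
    """Success rate ignoring severity altogether (no adjustment)."""
    succ = sum(c[0] for c in counts[treatment].values())
    tot = sum(c[1] for c in counts[treatment].values())
    return Fraction(succ, tot)

for t in counts:
    within = {s: round(float(r), 2) for s, r in stratified_rates(t).items()}
    print(t, within, round(float(pooled_rate(t)), 2))
```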
‘All models are wrong, some are useful.’
after a run of heads, and the temptation is to believe that tails is somehow now ‘due’ so that the proportion gets balanced out – this is known as the ‘gambler’s fallacy’ and is a psychological bias that (from personal experience) is rather difficult to overcome. But the coin has no memory – the key insight is that the coin cannot compensate for past imbalances, but simply overwhelms them by more and more new, independent flips.
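A quick simulation of a fair coin, as a sketch of "the coin has no memory". It estimates the chance of tails immediately after three heads in a row (the run length of 3 and the number of flips are arbitrary choices), and checks the overall proportion of heads. The first stays near 0.5, so tails is never "due"; the proportion still settles toward 0.5 because past imbalances are swamped by new flips, not corrected.

```python
import random

def simulate_flips(n, seed=0):
    """n independent fair-coin flips; True means heads."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n)]

def prob_tails_after_heads_run(flips, run_length=3):
    """Estimate P(tails | the previous `run_length` flips were all heads)."""
    tails = total = 0
    for i in range(run_length, len(flips)):
        if all(flips[i - run_length:i]):
            total += 1
            tails += not flips[i]
    return tails / total

flips = simulate_flips(200_000)
heads = sum(flips)
print(prob_tails_after_heads_run(flips))  # close to 0.5: no 'catching up'
print(heads / len(flips))                 # proportion of heads close to 0.5
print(heads - (len(flips) - heads))       # raw excess of heads is not forced back to 0
```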
The null hypothesis is what we are willing to assume is the case until proven otherwise. It is relentlessly negative, denying all progress and change. But this does not mean that we actually believe the null hypothesis is literally true:
A Type I error is made when we reject a null hypothesis when it is in fact true, and a Type II error is made when we do not reject a null hypothesis when in fact the alternative hypothesis holds.
A Type I legal error is to falsely convict an innocent person, and a Type II error is to find someone ‘not guilty’ when in fact they did commit the crime.
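The two error types can be made concrete with a simulation, sketched here under assumptions not in the notes: the null hypothesis is "the coin is fair", each experiment uses 100 flips, and the test rejects when the normal-approximation z-statistic exceeds 1.96 in size. When the null is true, every rejection is a Type I error (the rate should land near, though not exactly at, 5%, because head counts are discrete). When the coin actually lands heads 60% of the time, every failure to reject is a Type II error.

```python
import random

def coin_test_rejects(heads, n, z_crit=1.96):
    """Two-sided normal-approximation test of H0: P(heads) = 0.5."""
    z = (heads - n * 0.5) / (n * 0.25) ** 0.5
    return abs(z) > z_crit

def rejection_rate(p_true, n=100, trials=5000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < p_true for _ in range(n))
        rejections += coin_test_rejects(heads, n)
    return rejections / trials

type_i = rejection_rate(0.5)       # H0 true: each rejection is a Type I error
type_ii = 1 - rejection_rate(0.6)  # alternative true: each non-rejection is a Type II error
print(type_i, type_ii)
```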
HARKing – inventing the Hypotheses After the Results are Known.
confounder: a variable which is associated with both a response and a predictor, and which may explain some of their apparent relationship. For example, the height and weight of children are strongly correlated, but much of this association is explained by the age of the child.
PPDAC: a proposed structure for the ‘data cycle’, comprising Problem, Plan, Data collection, Analysis (exploratory or confirmatory) and Conclusions and communication.
prosecutor’s fallacy: when a small probability of the evidence, given innocence, is mistakenly interpreted as the probability of innocence, given the evidence.
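Bayes' theorem shows how far apart the two probabilities can be. This sketch assumes, hypothetically, a 1-in-a-million chance that an innocent person matches the evidence, 10 million people who could have committed the crime (exactly one of them guilty), and a guilty person who always matches. The fallacy reads the evidence as a 1-in-a-million chance of innocence; the correct calculation says a matching person is still probably innocent, since about 10 innocent people would be expected to match alongside the one guilty one.

```python
def prob_innocent_given_match(match_prob, n_possible_suspects):
    """P(innocent | evidence) by Bayes' theorem, assuming exactly one guilty
    person among the possible suspects and that the guilty person always matches."""
    prior_guilty = 1 / n_possible_suspects
    prior_innocent = 1 - prior_guilty
    p_evidence = 1.0 * prior_guilty + match_prob * prior_innocent
    return match_prob * prior_innocent / p_evidence

# Hypothetical figures: 1-in-a-million match probability, 10 million possible suspects.
p = prob_innocent_given_match(1e-6, 10_000_000)
print(p)  # about 0.91, nowhere near one in a million
```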
reverse causation: when an association between two variables initially appears to be causal, but could in fact be acting in the opposite direction. For example, people who do not drink alcohol tend to have poorer health outcomes than moderate drinkers, but this is at least partly due to some non-drinkers having given up alcohol due to poor health.
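An exact calculation for a hypothetical population shows how reverse causation alone can produce the drinking association. Here drinking has no effect on health at all; the only causal arrow runs from poor health to giving up alcohol (the probabilities are invented). Non-drinkers still end up with a much higher rate of poor health than drinkers.

```python
from fractions import Fraction

# Hypothetical population: drinking has NO effect on health,
# but poor health makes people more likely to stop drinking.
P_POOR = Fraction(1, 5)                 # P(poor health)
P_ABSTAIN = {"poor": Fraction(3, 5),    # P(non-drinker | poor health)
             "good": Fraction(3, 10)}   # P(non-drinker | good health)

def p_poor_health_given(drinks):
    """P(poor health | drinking status), via Bayes' theorem."""
    def p_status(health):
        p = P_ABSTAIN[health]
        return 1 - p if drinks else p
    joint_poor = P_POOR * p_status("poor")
    joint_good = (1 - P_POOR) * p_status("good")
    return joint_poor / (joint_poor + joint_good)

print(p_poor_health_given(drinks=False))  # 1/3
print(p_poor_health_given(drinks=True))   # 1/8
```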