The Art of Statistics: Learning from Data
Read between July 6 - October 23, 2020
15%
And so it is presumably better to follow Galton and use the median as a group-guess. This turns out to be 1,775 beans. The true value was … 1,616. Just one person guessed this precisely, 45% of people guessed below 1,616, and 55% guessed above, so there was little systematic tendency for the guesses to be either on the high or low side – we say the true value lay at the 45th percentile of the empirical data distribution.
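A minimal sketch of the two calculations in this highlight: the median as a group-guess, and the percentile at which a true value falls in the empirical distribution. The list of guesses is made up for illustration and is not the book's data; only the true value of 1,616 comes from the passage, and `percentile_of` is a hypothetical helper name.

```python
from statistics import fmean, median

def percentile_of(value, data):
    """Percentage of observations strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical guesses, invented for illustration (NOT the book's data).
guesses = [900, 1200, 1500, 1600, 1700, 1800, 2000, 2500, 4000, 9000]
true_value = 1616  # the true count quoted in the passage

group_guess = median(guesses)              # middle of the sorted guesses
average_guess = fmean(guesses)             # pulled upward by the two very high guesses
rank = percentile_of(true_value, guesses)  # share of guesses below the truth

print(group_guess, average_guess, rank)
```

In this toy data the two very high guesses drag the mean far above the median, which is one reason the median can be the safer summary of a skewed set of guesses.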
18%
four common features of a good data visualization: It contains reliable information. The design has been chosen so that relevant patterns become noticeable. It is presented in an attractive manner, but appearance should not get in the way of honesty, clarity and depth. When appropriate, it is organized in a way that enables some exploration.
27%
Simpson’s paradox, which occurs when the apparent direction of an association is reversed by adjusting for a confounding factor, requiring a complete change in the apparent lesson from the data.
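The reversal described here can be reproduced with a small table of counts. The sketch below uses hypothetical numbers (two options, A and B, tried on "easy" and "hard" cases); the case difficulty plays the role of the confounding factor. None of the figures come from the book.

```python
# Hypothetical (successes, trials) counts, invented for illustration.
counts = {
    "easy cases": {"A": (9, 10), "B": (80, 100)},
    "hard cases": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Success rate for one cell of the table."""
    return successes / trials

def pooled_rate(option):
    """Success rate for an option when the groups are lumped together."""
    successes = sum(counts[group][option][0] for group in counts)
    trials = sum(counts[group][option][1] for group in counts)
    return successes / trials

# Within each group, A has the higher success rate...
for group, by_option in counts.items():
    print(group, {name: round(rate(*cell), 2) for name, cell in by_option.items()})

# ...but pooled over groups, B looks better, because A was mostly tried on hard cases.
print("pooled", {name: round(pooled_rate(name), 2) for name in ("A", "B")})
```

Adjusting for case difficulty reverses the apparent direction of the association, which is the pattern the highlight describes.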
32%
‘All models are wrong, some are useful.’
48%
after a run of heads, and the temptation is to believe that tails is somehow now ‘due’ so that the proportion gets balanced out – this is known as the ‘gambler’s fallacy’ and is a psychological bias that (from personal experience) is rather difficult to overcome. But the coin has no memory – the key insight is that the coin cannot compensate for past imbalances, but simply overwhelms them by more and more new, independent flips.
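Both halves of this point can be checked with a short simulation, sketched here under simple assumptions: a fair coin, a fixed random seed, and hypothetical helper names. The first function looks at what follows a streak of heads; the second shows how an early surplus of heads is diluted rather than repaid.

```python
import random

def share_of_heads_after_streak(flips, streak=3):
    """Share of heads among flips that immediately follow `streak` heads in a row."""
    followers = [
        flips[i]
        for i in range(streak, len(flips))
        if all(flips[i - streak:i])
    ]
    return sum(followers) / len(followers)

def expected_share_of_heads(initial_heads, initial_flips, new_flips):
    """Expected overall share of heads after adding fair flips to an unbalanced start."""
    return (initial_heads + new_flips / 2) / (initial_flips + new_flips)

rng = random.Random(0)  # fixed seed so the run is repeatable
flips = [rng.random() < 0.5 for _ in range(200_000)]  # True = heads

# The coin has no memory: after three heads, heads is still about as likely as tails.
print(share_of_heads_after_streak(flips))

# A start of 10 heads in 10 flips is overwhelmed, not compensated:
# the expected surplus of heads stays at 10 while the proportion drifts to 0.5.
for extra in (10, 1_000, 100_000):
    print(extra, expected_share_of_heads(10, 10, extra))
```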
52%
The null hypothesis is what we are willing to assume is the case until proven otherwise. It is relentlessly negative, denying all progress and change. But this does not mean that we actually believe the null hypothesis is literally true:
57%
Type I error is made when we reject a null hypothesis when it is true, and a Type II error is made when we do not reject a null hypothesis when in fact the alternative hypothesis holds.
57%
Type I legal error is to falsely convict an innocent person, and a Type II error is to find someone ‘not guilty’ when in fact they did commit the crime.
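To make the two error types concrete, here is a rough sketch using an assumed test of whether a coin is fair: flip it 100 times and reject the null hypothesis (fair coin) if there are 40 or fewer, or 60 or more, heads. The cut-offs and the alternative of a 65% heads probability are illustrative assumptions, not taken from the book. The probabilities are computed exactly from the binomial distribution.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def reject_probability(n, p, lower, upper):
    """P(heads <= lower or heads >= upper) when each flip is heads with probability p."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k <= lower or k >= upper)

n, lower, upper = 100, 40, 60  # assumed decision rule

# Type I error: the coin really is fair, but the rule rejects the null anyway.
type_one = reject_probability(n, 0.5, lower, upper)

# Type II error: the coin really is biased (assumed 65% heads), but the rule fails to reject.
type_two = 1 - reject_probability(n, 0.65, lower, upper)

print(f"Type I error rate:  {type_one:.3f}")
print(f"Type II error rate: {type_two:.3f}")
```

Widening the rejection region would lower the Type II rate at the cost of a higher Type I rate, which mirrors the trade-off between convicting the innocent and acquitting the guilty.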
69%
HARKing – inventing the Hypotheses After the Results are Known.
76%
confounder: a variable which is associated with both a response and a predictor, and which may explain some of their apparent relationship. For example, the height and weight of children are strongly correlated, but much of this association is explained by the age of the child.
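The height and weight example can be imitated with simulated data. In the sketch below, age drives both height and weight, and the two have no direct link once age is fixed; all coefficients and noise levels are invented for illustration and are not real growth data.

```python
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
children = []
for _ in range(6_000):
    age = rng.randint(2, 16)                   # years (inclusive range)
    height = 80 + 6 * age + rng.gauss(0, 5)    # made-up relationship, cm
    weight = 8 + 3 * age + rng.gauss(0, 3)     # made-up relationship, kg
    children.append((age, height, weight))

# Across all ages, height and weight are strongly correlated.
overall = pearson([h for _, h, _ in children], [w for _, _, w in children])

# Among children of a single age, the association largely disappears.
eight_year_olds = [(h, w) for a, h, w in children if a == 8]
within_age = pearson([h for h, _ in eight_year_olds], [w for _, w in eight_year_olds])

print(round(overall, 2), round(within_age, 2))
```

In real children some association would remain within an age group; the simulation removes it entirely only to make the role of the confounder easy to see.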
80%
PPDAC: a proposed structure for the ‘data cycle’, comprising Problem, Plan, Data collection, Analysis (exploratory or confirmatory) and Conclusions and communication.
80%
prosecutor’s fallacy: when a small probability of the evidence, given innocence, is mistakenly interpreted as the probability of innocence, given the evidence.
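A worked example with Bayes' theorem shows how far apart the two probabilities can be. The numbers are hypothetical: a forensic match that an innocent person would produce with probability 1 in 1,000,000, a pool of 10,000,000 possible suspects containing exactly one guilty person, and a guilty person who always matches.

```python
from fractions import Fraction

# Hypothetical inputs, chosen only for illustration.
p_evidence_given_innocent = Fraction(1, 1_000_000)  # chance an innocent person matches
p_evidence_given_guilty = Fraction(1)               # assume the guilty person always matches
prior_guilty = Fraction(1, 10_000_000)              # one guilty person in the pool
prior_innocent = 1 - prior_guilty

# Bayes' theorem: P(innocent | evidence)
numerator = p_evidence_given_innocent * prior_innocent
p_innocent_given_evidence = numerator / (
    numerator + p_evidence_given_guilty * prior_guilty
)

print(float(p_evidence_given_innocent))   # probability of the evidence, given innocence
print(float(p_innocent_given_evidence))   # probability of innocence, given the evidence
```

Under these assumptions about ten innocent people in the pool would be expected to match alongside the one guilty person, so a matching individual is still more likely innocent than guilty, even though the match probability for any one innocent person is tiny.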
81%
reverse causation: when an association between two variables initially appears to be causal, but could in fact be acting in the opposite direction. For example, people who do not drink alcohol tend to have poorer health outcomes than moderate drinkers, but this is at least partly due to some non-drinkers having given up alcohol due to poor health.