The Art of Statistics: Learning from Data
Kindle Notes & Highlights
Read between November 23 - December 9, 2020
18%
Alberto Cairo has identified four common features of a good data visualization: It contains reliable information. The design has been chosen so that relevant patterns become noticeable. It is presented in an attractive manner, but appearance should not get in the way of honesty, clarity and depth. When appropriate, it is organized in a way that enables some exploration.
18%
We just want to tell it how it is, or at least how it seems to be, and while we cannot ever claim to tell the absolute truth, we can at least try to be as truthful as possible.
18%
The first rule of communication is to shut up and listen, so that you can get to know about the audience for your communication, whether it might be politicians, professionals or the general public. We have to understand their inevitable limitations and any misunderstandings, and fight the temptation to be too sophisticated and clever, or put in too much detail.
21%
George Gallup, who essentially invented the idea of the opinion poll in the 1930s, came up with a fine analogy for the value of random sampling. He said that if you have cooked a large pan of soup, you do not need to eat it all to find out if it needs more seasoning. You can just taste a spoonful, provided you have given it a good stir.
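Gallup's soup analogy can be tried out numerically. The sketch below is only an illustration under assumed numbers: a hypothetical population of one million people, 37% of whom hold some opinion, and a "spoonful" of 1,000. None of these figures come from the book. It compares a spoonful taken from the top of an unstirred pan with one drawn at random.

```python
import random

# Hypothetical population: 1 = holds the opinion, 0 = does not.
# It is deliberately left "unstirred": all the 1s sit at the top of the pan.
population = [1] * 370_000 + [0] * 630_000
true_share = sum(population) / len(population)

spoon_size = 1_000
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Spoonful from the top of the unstirred pan: badly unrepresentative.
unstirred = population[:spoon_size]
unstirred_share = sum(unstirred) / spoon_size

# Spoonful after a good stir: a simple random sample without replacement.
stirred = rng.sample(population, spoon_size)
stirred_share = sum(stirred) / spoon_size

print(f"true share:      {true_share:.3f}")
print(f"unstirred spoon: {unstirred_share:.3f}")
print(f"stirred spoon:   {stirred_share:.3f}")
```

The random spoonful should land within a few percentage points of the true share even though it covers only 0.1% of the pan, while the unstirred spoonful reflects only where it was scooped from.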
30%
This section contains a simple lesson: just because we act, and something changes, it doesn’t mean we were responsible for the result. Humans seem to find this simple truth difficult to grasp – we are always keen to construct an explanatory narrative, and even keener if we are at its centre.
32%
The British statistician George Box has become famous for his brief but invaluable aphorism: ‘All models are wrong, some are useful.’
67%
The Reproducibility Project found that replication effects were on average in the same direction as the original studies, but were around half their magnitude. This points to an important bias in the scientific literature: a study which has found something ‘big’, at least some of which is likely to have been luck, is likely to lead to a prominent publication. In an analogy to regression to the mean, this might be termed ‘regression to the null’, where early exaggerated estimates of effects later decrease in magnitude towards the null hypothesis.
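A small simulation can show how "regression to the null" might arise from selective publication alone. This is a minimal sketch, not the Reproducibility Project's analysis: the true effect, the standard error and the publication rule (only positive, significant originals get published) are all assumed values chosen for illustration.

```python
import random
import statistics


def simulate_regression_to_null(true_effect=0.1, se=0.1,
                                n_studies=20_000, seed=0):
    """Return (mean published original estimate, mean replication estimate).

    All parameters are hypothetical. Each original study is a noisy
    estimate of the same true effect; it is 'published' only if it is
    positive and significant (estimate / se > 1.96). Every published
    study is then replicated once, with no selection on the replication.
    """
    rng = random.Random(seed)
    originals, replications = [], []
    for _ in range(n_studies):
        original = rng.gauss(true_effect, se)
        if original / se > 1.96:  # publication filter
            originals.append(original)
            replications.append(rng.gauss(true_effect, se))
    return statistics.mean(originals), statistics.mean(replications)


mean_original, mean_replication = simulate_regression_to_null()
print(f"mean published original estimate: {mean_original:.3f}")
print(f"mean replication estimate:        {mean_replication:.3f}")
```

Under these assumptions the replications average close to the true effect, in the same direction as the originals but noticeably smaller, because the published originals were selected partly for lucky noise.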
67%
The claimed reproducibility crisis is a complex issue, rooted in the excessive pressure put on researchers to make ‘discoveries’ and publish their results in prestigious scientific journals, all of which is crucially dependent on finding statistically significant results. No single institution or profession is to blame. We have also showed when discussing hypothesis testing that, even if statistical practice were perfect, the rarity of true and substantial effects means a substantial proportion of results that are claimed to be ‘significant’ are inevitably going to be false-positives (Figure ...more
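The arithmetic behind the false-positive point can be sketched in a few lines. The inputs below are assumptions chosen for illustration (10% of tested hypotheses are real effects, 80% power, a 5% significance threshold) and are not taken from this excerpt; the function name is likewise hypothetical.

```python
def false_discovery_proportion(prior_true: float, power: float,
                               alpha: float) -> float:
    """Share of 'significant' results that are false positives.

    prior_true: assumed proportion of tested hypotheses that are real effects
    power:      probability a real effect yields a significant result
    alpha:      probability a null effect yields a significant result
    """
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


# Assumed, illustrative inputs: out of 1,000 hypotheses, 100 are real
# (80 detected) and 900 are null (45 falsely flagged).
fdp = false_discovery_proportion(prior_true=0.10, power=0.80, alpha=0.05)
print(f"{fdp:.0%} of significant results are false positives")  # prints 36%
```

Even with every test carried out correctly, these inputs make roughly a third of the claimed discoveries false, and the share grows as real effects become rarer.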
69%
This leaves valuable data sitting in the ‘file drawer’, and creates a positive bias to what appears in the literature. We do not know what we are not being told. This positive bias is made worse by ‘discoveries’ that are more likely to be accepted for publication in more prominent journals, an unwillingness to publish replications, and of course all the questionable research practices that we have seen can lead to exaggerated statistical significance.
74%
Statistical methods should enable data to answer scientific questions: Ask ‘why am I doing this?’, rather than focusing on which particular technique to use.
Signals always come with noise: It is trying to separate out the two that makes the subject interesting.
Variability is inevitable, and probability models are useful as an abstraction.
Plan ahead, really ahead: This includes the idea of pre-specification in confirmatory experiments – avoiding researcher degrees of freedom.
Worry about data quality: Everything rests on the data.
Statistical analysis is more than a set of computations: Do ...more