Piers’s Kindle Notes & Highlights for The Art of Statistics: Learning from Data

The Art of Statistics: Learning from Data

by David Spiegelhalter

Read between November 23 - December 9, 2020

18%

Alberto Cairo has identified four common features of a good data visualization:
1. It contains reliable information.
2. The design has been chosen so that relevant patterns become noticeable.
3. It is presented in an attractive manner, but appearance should not get in the way of honesty, clarity and depth.
4. When appropriate, it is organized in a way that enables some exploration.

18%

We just want to tell it how it is, or at least how it seems to be, and while we cannot ever claim to tell the absolute truth, we can at least try to be as truthful as possible.

18%

The first rule of communication is to shut up and listen, so that you can get to know about the audience for your communication, whether it might be politicians, professionals or the general public. We have to understand their inevitable limitations and any misunderstandings, and fight the temptation to be too sophisticated and clever, or put in too much detail.

21%

George Gallup, who essentially invented the idea of the opinion poll in the 1930s, came up with a fine analogy for the value of random sampling. He said that if you have cooked a large pan of soup, you do not need to eat it all to find out if it needs more seasoning. You can just taste a spoonful, provided you have given it a good stir.
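
Note: the soup analogy can be tried out numerically. The sketch below is only an illustration under made-up assumptions: the "pot" is a hypothetical list of saltiness values that rise from top to bottom, and `spoonful_means` is a name invented here, not something from the book. It compares a spoonful skimmed off the top of an unstirred pot with a spoonful drawn at random, which is what a good stir achieves.

```python
import random
import statistics


def spoonful_means(pot_size=100_000, spoon_size=1_000, seed=0):
    """Return (true mean, unstirred-spoonful mean, random-spoonful mean).

    All values are hypothetical and chosen only to illustrate the analogy.
    """
    rng = random.Random(seed)
    # Hypothetical pot: saltiness rises steadily from 0.0 at the top to 1.0
    # at the bottom, as if the seasoning had sunk and nobody had stirred.
    pot = [i / (pot_size - 1) for i in range(pot_size)]
    true_mean = statistics.fmean(pot)
    # Unstirred spoonful: take the first spoon_size units from the top.
    top_mean = statistics.fmean(pot[:spoon_size])
    # Stirred spoonful: every unit is equally likely to land on the spoon.
    random_mean = statistics.fmean(rng.sample(pot, spoon_size))
    return true_mean, top_mean, random_mean


if __name__ == "__main__":
    true_mean, top_mean, random_mean = spoonful_means()
    print(f"whole pot: {true_mean:.3f}")
    print(f"top of unstirred pot: {top_mean:.3f}")
    print(f"random spoonful: {random_mean:.3f}")
```

The random spoonful covers only 1% of the pot yet lands close to the whole-pot average, while the unstirred spoonful badly understates the seasoning.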

30%

This section contains a simple lesson: just because we act, and something changes, it doesn’t mean we were responsible for the result. Humans seem to find this simple truth difficult to grasp – we are always keen to construct an explanatory narrative, and even keener if we are at its centre.

32%

The British statistician George Box has become famous for his brief but invaluable aphorism: ‘All models are wrong, some are useful.’

67%

The Reproducibility Project found that replication effects were on average in the same direction as the original studies, but were around half their magnitude. This points to an important bias in the scientific literature: a study which has found something ‘big’, at least some of which is likely to have been luck, is likely to lead to a prominent publication. In an analogy to regression to the mean, this might be termed ‘regression to the null’, where early exaggerated estimates of effects later decrease in magnitude towards the null hypothesis.
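
Note: "regression to the null" can be reproduced in a small simulation. This is a rough sketch, not the Reproducibility Project's analysis: the true effect, the standard error and the publication rule (only estimates more than 1.96 standard errors above zero get "published") are all assumptions picked for illustration, and `simulate_regression_to_null` is a hypothetical name.

```python
import random
import statistics


def simulate_regression_to_null(true_effect=0.2, std_error=0.15,
                                n_studies=20_000, seed=1):
    """Return (mean published estimate, mean replication estimate).

    Every parameter is an illustrative assumption, not a value from the book.
    """
    rng = random.Random(seed)
    published = []
    replications = []
    for _ in range(n_studies):
        # Each original study estimates the same true effect with noise.
        original = rng.gauss(true_effect, std_error)
        # Assumed publication filter: only "significant" positive results appear.
        if original / std_error > 1.96:
            published.append(original)
            # An independent replication of the same true effect, unfiltered.
            replications.append(rng.gauss(true_effect, std_error))
    return statistics.fmean(published), statistics.fmean(replications)


if __name__ == "__main__":
    published_mean, replication_mean = simulate_regression_to_null()
    print(f"mean published estimate: {published_mean:.3f}")
    print(f"mean replication estimate: {replication_mean:.3f}")
```

Because the filter keeps only the estimates that happened to come out large, the published average overstates the true effect, while the unfiltered replications average close to it.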

67%

The claimed reproducibility crisis is a complex issue, rooted in the excessive pressure put on researchers to make ‘discoveries’ and publish their results in prestigious scientific journals, all of which is crucially dependent on finding statistically significant results. No single institution or profession is to blame. We have also shown when discussing hypothesis testing that, even if statistical practice were perfect, the rarity of true and substantial effects means a substantial proportion of results that are claimed to be ‘significant’ are inevitably going to be false-positives (Figure ...more
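
Note: the false-positive point is simple arithmetic once three quantities are fixed. The sketch below assumes, purely for illustration, that 10% of tested hypotheses are real effects, that studies have 80% power, and that the significance threshold is 5%; these inputs and the function name `false_discovery_proportion` are assumptions made here, not a reconstruction of the truncated figure reference.

```python
def false_discovery_proportion(prior_true, power, alpha):
    """Share of 'significant' results that are false positives.

    prior_true: assumed fraction of tested hypotheses that are real effects
    power:      assumed probability of detecting a real effect
    alpha:      significance threshold (false-positive rate under the null)
    """
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Out of 1,000 hypotheses: 100 real effects, of which 80 are detected;
    # 900 nulls, of which 45 come out "significant" by chance.
    share = false_discovery_proportion(prior_true=0.10, power=0.80, alpha=0.05)
    print(f"false positives among 'significant' results: {share:.0%}")
```

Under these assumed inputs, 45 of the 125 "significant" results (36%) are false positives even though every individual test was carried out correctly.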

69%

This leaves valuable data sitting in the ‘file drawer’, and creates a positive bias to what appears in the literature. We do not know what we are not being told. This positive bias is made worse by ‘discoveries’ that are more likely to be accepted for publication in more prominent journals, an unwillingness to publish replications, and of course all the questionable research practices that we have seen can lead to exaggerated statistical significance.

74%

1. Statistical methods should enable data to answer scientific questions: Ask ‘why am I doing this?’, rather than focusing on which particular technique to use.
2. Signals always come with noise: It is trying to separate out the two that makes the subject interesting. Variability is inevitable, and probability models are useful as an abstraction.
3. Plan ahead, really ahead: This includes the idea of pre-specification in confirmatory experiments – avoiding researcher degrees of freedom.
4. Worry about data quality: Everything rests on the data.
5. Statistical analysis is more than a set of computations: Do ...more
