Kindle Notes & Highlights
Read between July 23 and July 27, 2021
to turn experience into data, we have to start with rigorous definitions.
Data has two main limitations as a source of such knowledge. First, it is almost always an imperfect measure of what we are really interested in: asking how happy people were last week on a scale from zero to ten hardly encapsulates the emotional wellbeing of the nation. Second, anything we choose to measure will differ from place to place, from person to person, from time to time, and the problem is to extract meaningful insights from all this apparently random variability.
People have always counted and measured, but modern statistics as a discipline really began in the 1650s when, as we shall see in Chapter 8, probability was properly understood for the first time by Blaise Pascal and Pierre de Fermat.
Statistical training is increasingly seen as just one necessary component of being a data scientist, together with skills in data management, programming and algorithm development, as well as proper knowledge of the subject matter.
Far from freeing us from the need for statistical skills, bigger data and the rise in the number and complexity of scientific studies make it even more difficult to draw appropriate conclusions. More data means that we need to be even more aware of what the evidence is actually worth.
intensive analysis of data sets derived from routine data can increase the possibility of false discoveries, both from systematic bias inherent in the data sources and from carrying out many analyses and only reporting whatever looks most interesting.
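A minimal sketch of the second problem, assuming an illustrative study in which twenty unrelated comparisons are run on pure noise and only the smallest p-value is reported; the twenty-test setup and the 0.05 threshold are assumptions for the example, not figures from the book.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_simulations = 2_000   # kept modest so the sketch runs quickly
n_analyses = 20         # illustrative: twenty unrelated comparisons per study
sample_size = 50

studies_with_a_finding = 0
for _ in range(n_simulations):
    p_values = []
    for _ in range(n_analyses):
        # Both groups drawn from the same distribution, so any "effect" is pure noise.
        a = rng.normal(size=sample_size)
        b = rng.normal(size=sample_size)
        p_values.append(stats.ttest_ind(a, b).pvalue)
    # Selective reporting: only the most "interesting" (smallest) p-value gets written up.
    if min(p_values) < 0.05:
        studies_with_a_finding += 1

print(f"Studies reporting a 'significant' finding: {studies_with_a_finding / n_simulations:.2f}")
# Expect roughly 1 - 0.95**20, about 0.64, rather than the nominal 0.05.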
In order to be able to critique published scientific work, and even more the media reports which we all encounter on a daily basis, we should have an acute awareness of the dangers of selective reporting, the need for scientific claims to be replicated by independent researchers...
data literacy, which describes the ability to not only carry out statistical analysis on real-world problems, but also to understand and critique any conclusions...
two tricks used to manipulate the impact of this statistic: convert from a positive to a negative frame, and then turn a percentage into actual numbers of people.
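A small worked illustration of those two tricks, using made-up numbers rather than the statistic the book is discussing: the same risk can be framed positively or negatively, and a percentage can be re-expressed as a count of people in some stated population.

# Hypothetical figures, chosen only to show the reframing; they are not from the book.
survival_rate = 0.95                  # positive frame: "95% survive"
mortality_rate = 1 - survival_rate    # negative frame: "5% die"

population = 100_000                  # turn the percentage into actual people
deaths = mortality_rate * population

print(f"Positive frame: {survival_rate:.0%} of patients survive")
print(f"Negative frame: {mortality_rate:.0%} of patients die")
print(f"As people:      {deaths:,.0f} deaths among {population:,} patients")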
numbers do not speak for themselves; the context, language and graphic design all contribute to the way the communication is received.
In real life deduction is the process of using the rules of cold logic to work from general premises to particular conclusions.
induction works the other way, in taking particular instances and trying to work out general conclusions.
a defendant can be found guilty, but nobody is ever found innocent, simply not proven to be guilty. Similarly we shall find that we may reject the null hypothesis, but if we don’t have sufficient evidence to do so, it does not mean that we can accept it as truth. It is just a working assumption until something better comes along.
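A minimal sketch of that "not proven" logic, using a two-sample t-test on illustrative data (the sample sizes, effect size and 0.05 threshold are assumptions for the example): a p-value above the threshold only means we lack sufficient evidence to reject the null hypothesis, not that the null has been shown to be true.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A small, underpowered comparison: a modest real effect that the data may well fail to detect.
treatment = rng.normal(loc=0.2, scale=1.0, size=15)
control = rng.normal(loc=0.0, scale=1.0, size=15)

result = stats.ttest_ind(treatment, control)
if result.pvalue < 0.05:
    print(f"p = {result.pvalue:.3f}: reject the null hypothesis of no difference")
else:
    # Note the wording: we do not "accept" the null, we simply fail to reject it,
    # and it remains a working assumption until better evidence arrives.
    print(f"p = {result.pvalue:.3f}: insufficient evidence to reject the null hypothesis")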
For new pharmaceuticals to be approved by the US Food and Drug Administration, it has become standard that two independent clinical trials must have been carried out, each showing clinical benefit that is significant at P < 0.05. This means that the overall chance of approving a drug that in truth has no benefit at all is 0.05 × 0.05 = 0.0025, or 1 in 400.
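A quick sketch checking that multiplication by simulation, under the standard assumption that a trial of a drug with no true benefit produces a p-value uniformly distributed between 0 and 1; the number of simulated drugs is just a choice for the example.

import numpy as np

rng = np.random.default_rng(2)
n_drugs = 1_000_000  # simulated drugs with no true benefit, each given two independent trials

# Under the null hypothesis of no benefit, each trial's p-value is uniform on [0, 1].
trial_1 = rng.uniform(size=n_drugs)
trial_2 = rng.uniform(size=n_drugs)

approved = (trial_1 < 0.05) & (trial_2 < 0.05)  # both trials significant at P < 0.05
print(f"Approval rate for useless drugs: {approved.mean():.4f}")
print(f"Theory: 0.05 * 0.05 = {0.05 * 0.05:.4f}, or 1 in {1 / (0.05 * 0.05):.0f}")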