Statistics Done Wrong: The Woefully Complete Guide
Read between December 9, 2015 and July 1, 2016
6%
The situation is so bad that even the authors of surveys of statistical knowledge lack the necessary statistical knowledge to formulate survey questions—the numbers I just quoted are misleading because the survey of medical residents included a multiple-choice question asking residents to define a p value and gave four incorrect definitions as the only options.
8%
The p value is the probability, under the assumption that there is no true effect or no true difference, of collecting data that shows a difference equal to or more extreme than what you actually observed.
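A minimal sketch of that definition as a permutation test (the groups, sample sizes, and seed are my own illustration, not the book's): under the null, group labels are exchangeable, so we shuffle them and count how often a difference at least as extreme as the observed one appears.

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=50)   # "control" group, no true effect
    b = rng.normal(0.0, 1.0, size=50)   # "treatment" group, no true effect

    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    # How often does shuffled (null) data show a difference this extreme?
    n_perm, count = 10_000, 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:50].mean() - pooled[50:].mean())
        if diff >= observed:
            count += 1

    print(f"p = {count / n_perm:.3f}")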
9%
And because any medication or intervention usually has some real effect, you can always get a statistically significant result by collecting so much data that you detect extremely tiny but relatively unimportant differences.
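A quick simulation of this point (the sample size and the tiny 0.01-standard-deviation effect are illustrative assumptions): with a million observations per group, even a negligible true difference comes out wildly "significant."

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 1_000_000                          # enormous sample
    control = rng.normal(0.00, 1.0, n)
    treated = rng.normal(0.01, 1.0, n)     # true effect: 1% of a standard deviation

    t, p = stats.ttest_ind(treated, control)
    print(f"p = {p:.2e}")                  # statistically significant, practically trivial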
9%
There’s no mathematical tool to tell you whether your hypothesis is true or false; you can see only whether it’s consistent with the data. If the data is sparse or unclear, your conclusions will be uncertain.
11%
Confidence intervals can answer the same questions as p values, with the advantage that they provide more information and are more straightforward to interpret.
11%
If you can write a result as a confidence interval instead of as a p value, you should.7 Confidence intervals sidestep most of the interpretational subtleties associated with p values, making the resulting research that much clearer.
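A minimal sketch of doing exactly that (the data and group labels are invented for illustration): report the estimated difference with a 95% confidence interval rather than a bare p value.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    a = rng.normal(10.0, 2.0, 40)
    b = rng.normal(11.0, 2.0, 40)

    diff = b.mean() - a.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    dof = len(a) + len(b) - 2              # simple pooled approximation
    t_crit = stats.t.ppf(0.975, dof)
    print(f"difference = {diff:.2f}, "
          f"95% CI ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")

The interval carries the same significance information (does it cover zero?) plus the size and precision of the effect.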
18%
This effect, known as truth inflation, type M error (M for magnitude), or the winner’s curse, occurs in fields where many researchers conduct similar experiments and compete to publish the most “exciting” results:
18%
In fast-moving fields such as genetics, the earliest published results are often the most extreme because journals are most interested in publishing new and exciting results. Follow-up studies tend to show much smaller effects.
18%
Consider also that top-ranked journals, such as Nature and Science, prefer to publish studies with groundbreaking results—meaning large effect sizes in novel fields with little prior research. This is a perfect combination for chronic truth inflation.
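A small simulation of truth inflation (the true effect, sample size, and study count are my assumptions, not the book's): run many underpowered studies of the same modest effect and keep only the "significant" ones, the way journals tend to.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    true_effect, n, studies = 0.2, 30, 5_000
    significant_estimates = []

    for _ in range(studies):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(true_effect, 1.0, n)
        t, p = stats.ttest_ind(treated, control)
        if p < 0.05:
            significant_estimates.append(treated.mean() - control.mean())

    print(f"true effect: {true_effect}")
    # Only studies that overestimated the effect cleared the significance
    # bar, so the surviving estimates average roughly triple the truth.
    print(f"mean 'published' estimate: {np.mean(significant_estimates):.2f}")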
26%
So when someone cites a low p value to say their study is probably right, remember that the probability of error is actually almost certainly higher. In areas where most tested hypotheses are false, such as early drug trials (most early drugs don’t make it through trials), it’s likely that most statistically significant results with p < 0.05 are actually flukes.
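The back-of-the-envelope arithmetic behind that claim (the specific base rate, power, and counts are assumptions for illustration):

    tested        = 1000   # hypotheses tested in a field
    true_fraction = 0.1    # only 10% of them are actually true
    power         = 0.8    # chance a true effect reaches p < 0.05
    alpha         = 0.05   # false positive rate per false hypothesis

    true_hits  = tested * true_fraction * power           # 80
    false_hits = tested * (1 - true_fraction) * alpha     # 45
    print(f"significant results that are flukes: "
          f"{false_hits / (true_hits + false_hits):.0%}")  # 36%
    # With a 1% base rate, the same arithmetic gives roughly 86% flukes.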
49%
For a nonmedical example, if you compare flight delays between United Airlines and Continental Airlines, you’ll find United has more flights delayed on average. But at each individual airport in the comparison, Continental’s flights are more likely to be delayed. It turns out United operates more flights out of cities with poor weather. Its average is dragged down by the airports with the most delays.
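A minimal sketch of that reversal, known as Simpson's paradox, with made-up counts (the airports and numbers are illustrative, not the book's data):

    flights = {
        # airport: {airline: (delayed, total)}
        "Seattle": {"United": (270, 900), "Continental": (35, 100)},   # bad weather
        "Phoenix": {"United": (10, 100),  "Continental": (108, 900)},  # good weather
    }

    # Per airport, Continental is worse (35% vs 30%, 12% vs 10%)...
    for airport, per_airline in flights.items():
        for airline, (delayed, total) in per_airline.items():
            print(f"{airport:8s} {airline:12s} {delayed / total:.0%} delayed")

    # ...yet overall, United is worse, because most of its flights leave
    # from the bad-weather airport.
    for airline in ("United", "Continental"):
        delayed = sum(flights[a][airline][0] for a in flights)
        total   = sum(flights[a][airline][1] for a in flights)
        print(f"overall  {airline:12s} {delayed / total:.0%} delayed")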
51%
If scientists try different statistical analyses until one works—say, by controlling for different combinations of variables and trying different sample sizes—false positive rates can jump to more than 50% for a given dataset.3
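A small demonstration of the effect (the menu of analyses here is my own, and far smaller than what a real paper might try): on pure-noise data, test at several sample sizes and declare victory if any look reaches p < 0.05.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    datasets, hits = 2_000, 0

    for _ in range(datasets):
        x = rng.normal(size=100)               # no true effect anywhere
        y = rng.normal(size=100)
        found = False
        for n in (30, 50, 70, 100):            # "try different sample sizes"
            t, p = stats.ttest_ind(x[:n], y[:n])
            if p < 0.05:
                found = True
        hits += found

    print(f"false positive rate with flexible analysis: {hits / datasets:.0%}")

Even this tiny menu of analyses pushes the rate well above the nominal 5%; adding covariate combinations and outlier rules, as real analysts do, inflates it much further.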
70%
Besides mastering their own rapidly advancing fields, most scientists are expected to be good at programming (including version control, unit testing, and good software engineering practices), designing statistical graphics, writing scientific papers, managing research groups, mentoring students, managing and archiving data, teaching, applying for grants, and peer-reviewing other scientists’ work, along with the statistical skills I’m demanding here. People dedicate their entire careers to mastering one of these skills, yet we expect scientists to be good at all of them to be competitive.
70%
(Many statisticians are susceptible to nerd sniping. Describe an interesting problem to them, and they will be unable to resist an attempt at solving it.)
70%
A strong course in applied statistics should cover basic hypothesis testing, regression, statistical power calculation, model selection, and a statistical programming language like R. Or at the least, the course should mention that these concepts exist—perhaps a full mathematical explanation of statistical power won’t fit in the curriculum, but students should be aware of power and should know to ask for power calculations when they need them. Sadly, whenever I read the syllabus for an applied statistics course, I notice it fails to cover all of these topics. Many textbooks cover them only …
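Asking for a power calculation is a one-liner; here is a minimal sketch using statsmodels (the effect size and targets are illustrative assumptions):

    from statsmodels.stats.power import TTestIndPower

    # How many subjects per group to have an 80% chance of detecting
    # a medium effect (Cohen's d = 0.5) at alpha = 0.05?
    n = TTestIndPower().solve_power(effect_size=0.5, power=0.8, alpha=0.05)
    print(f"needed per group: {n:.0f} subjects")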
71%
When you find common errors in the scientific literature—such as a simple misinterpretation of p values—hit the perpetrator over the head with your statistics textbook. It’s therapeutic.