Statistics Done Wrong: The Woefully Complete Guide
Read between December 9, 2015 and July 1, 2016
6%
The situation is so bad that even the authors of surveys of statistical knowledge lack the necessary statistical knowledge to formulate survey questions—the numbers I just quoted are misleading because the survey of medical residents included a multiple-choice question asking residents to define a p value and gave four incorrect definitions as the only options.
8%
The p value is the probability, under the assumption that there is no true effect or no true difference, of collecting data that shows a difference equal to or more extreme than what you actually observed.
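A minimal sketch of that definition as a permutation test (the groups, sample sizes, and seed are my own illustration, not the book's): under the null, group labels are exchangeable, so we shuffle them and count how often a difference at least as extreme as the observed one appears.

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=50)   # "control" group, no true effect
    b = rng.normal(0.0, 1.0, size=50)   # "treatment" group, no true effect

    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    # How often does shuffled (null) data show a difference this extreme?
    n_perm, count = 10_000, 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:50].mean() - pooled[50:].mean())
        if diff >= observed:
            count += 1

    print(f"p = {count / n_perm:.3f}")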
9%
And because any medication or intervention usually has some real effect, you can always get a statistically significant result by collecting so much data that you detect extremely tiny but relatively unimportant differences.
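A quick simulation of this point (the sample size and the tiny 0.01-standard-deviation effect are illustrative assumptions): with a million observations per group, even a negligible true difference comes out wildly "significant."

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 1_000_000                          # enormous sample
    control = rng.normal(0.00, 1.0, n)
    treated = rng.normal(0.01, 1.0, n)     # true effect: 1% of a standard deviation

    t, p = stats.ttest_ind(treated, control)
    print(f"p = {p:.2e}")                  # statistically significant, practically trivial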
9%
There’s no mathematical tool to tell you whether your hypothesis is true or false; you can see only whether it’s consistent with the data. If the data is sparse or unclear, your conclusions will be uncertain.
11%
Confidence intervals can answer the same questions as p values, with the advantage that they provide more information and are more straightforward to interpret.
11%
If you can write a result as a confidence interval instead of as a p value, you should.7 Confidence intervals sidestep most of the interpretational subtleties associated with p values, making the resulting research that much clearer.
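A minimal sketch of doing exactly that (the data and group labels are invented for illustration): report the estimated difference with a 95% confidence interval rather than a bare p value.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    a = rng.normal(10.0, 2.0, 40)
    b = rng.normal(11.0, 2.0, 40)

    diff = b.mean() - a.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    dof = len(a) + len(b) - 2              # simple pooled approximation
    t_crit = stats.t.ppf(0.975, dof)
    print(f"difference = {diff:.2f}, "
          f"95% CI ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")

The interval carries the same significance information (does it cover zero?) plus the size and precision of the effect.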
18%
This effect, known as truth inflation, type M error (M for magnitude), or the winner’s curse, occurs in fields where many researchers conduct similar experiments and compete to publish the most “exciting” results:
18%
In fast-moving fields such as genetics, the earliest published results are often the most extreme because journals are most interested in publishing new and exciting results. Follow-up studies tend to show much smaller effects.
18%
Consider also that top-ranked journals, such as Nature and Science, prefer to publish studies with groundbreaking results—meaning large effect sizes in novel fields with little prior research. This is a perfect combination for chronic truth inflation.
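A small simulation of truth inflation (the true effect, sample size, and study count are my assumptions, not the book's): run many underpowered studies of the same modest effect and keep only the "significant" ones, the way journals tend to.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    true_effect, n, studies = 0.2, 30, 5_000
    significant_estimates = []

    for _ in range(studies):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(true_effect, 1.0, n)
        t, p = stats.ttest_ind(treated, control)
        if p < 0.05:
            significant_estimates.append(treated.mean() - control.mean())

    print(f"true effect: {true_effect}")
    # Only studies that overestimated the effect cleared the significance
    # bar, so the surviving estimates average roughly triple the truth.
    print(f"mean 'published' estimate: {np.mean(significant_estimates):.2f}")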
26%
So when someone cites a low p value to say their study is probably right, remember that the probability of error is actually almost certainly higher. In areas where most tested hypotheses are false, such as early drug trials (most early drugs don’t make it through trials), it’s likely that most statistically significant results with p < 0.05 are actually flukes.
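The back-of-the-envelope arithmetic behind that claim (the specific base rate, power, and counts are assumptions for illustration):

    tested        = 1000   # hypotheses tested in a field
    true_fraction = 0.1    # only 10% of them are actually true
    power         = 0.8    # chance a true effect reaches p < 0.05
    alpha         = 0.05   # false positive rate per false hypothesis

    true_hits  = tested * true_fraction * power           # 80
    false_hits = tested * (1 - true_fraction) * alpha     # 45
    print(f"significant results that are flukes: "
          f"{false_hits / (true_hits + false_hits):.0%}")  # 36%
    # With a 1% base rate, the same arithmetic gives roughly 86% flukes.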
49%
For a nonmedical example, if you compare flight delays between United Airlines and Continental Airlines, you’ll find United has more flights delayed on average. But at each individual airport in the comparison, Continental’s flights are more likely to be delayed. It turns out United operates more flights out of cities with poor weather. Its average is dragged down by the airports with the most delays.
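A minimal sketch of that reversal, known as Simpson's paradox, with made-up counts (the airports and numbers are illustrative, not the book's data):

    flights = {
        # airport: {airline: (delayed, total)}
        "Seattle": {"United": (270, 900), "Continental": (35, 100)},   # bad weather
        "Phoenix": {"United": (10, 100),  "Continental": (108, 900)},  # good weather
    }

    # Per airport, Continental is worse (35% vs 30%, 12% vs 10%)...
    for airport, per_airline in flights.items():
        for airline, (delayed, total) in per_airline.items():
            print(f"{airport:8s} {airline:12s} {delayed / total:.0%} delayed")

    # ...yet overall, United is worse, because most of its flights leave
    # from the bad-weather airport.
    for airline in ("United", "Continental"):
        delayed = sum(flights[a][airline][0] for a in flights)
        total   = sum(flights[a][airline][1] for a in flights)
        print(f"overall  {airline:12s} {delayed / total:.0%} delayed")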
51%
If scientists try different statistical analyses until one works—say, by controlling for different combinations of variables and trying different sample sizes—false positive rates can jump to more than 50% for a given dataset.3
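A small demonstration of the effect (the menu of analyses here is my own, and far smaller than what a real paper might try): on pure-noise data, test at several sample sizes and declare victory if any look reaches p < 0.05.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    datasets, hits = 2_000, 0

    for _ in range(datasets):
        x = rng.normal(size=100)               # no true effect anywhere
        y = rng.normal(size=100)
        found = False
        for n in (30, 50, 70, 100):            # "try different sample sizes"
            t, p = stats.ttest_ind(x[:n], y[:n])
            if p < 0.05:
                found = True
        hits += found

    print(f"false positive rate with flexible analysis: {hits / datasets:.0%}")

Even this tiny menu of analyses pushes the rate well above the nominal 5%; adding covariate combinations and outlier rules, as real analysts do, inflates it much further.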
70%
Besides mastering their own rapidly advancing fields, most scientists are expected to be good at programming (including version control, unit testing, and good software engineering practices), designing statistical graphics, writing scientific papers, managing research groups, mentoring students, managing and archiving data, teaching, applying for grants, and peer-reviewing other scientists’ work, along with the statistical skills I’m demanding here. People dedicate their entire careers to mastering one of these skills, yet we expect scientists to be good at all of them to be competitive.
70%
(Many statisticians are susceptible to nerd sniping. Describe an interesting problem to them, and they will be unable to resist an attempt at solving it.)
70%
A strong course in applied statistics should cover basic hypothesis testing, regression, statistical power calculation, model selection, and a statistical programming language like R. Or at the least, the course should mention that these concepts exist—perhaps a full mathematical explanation of statistical power won’t fit in the curriculum, but students should be aware of power and should know to ask for power calculations when they need them. Sadly, whenever I read the syllabus for an applied statistics course, I notice it fails to cover all of these topics. Many textbooks cover them only …
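Asking for a power calculation is a one-liner; here is a minimal sketch using statsmodels (the effect size and targets are illustrative assumptions):

    from statsmodels.stats.power import TTestIndPower

    # How many subjects per group to have an 80% chance of detecting
    # a medium effect (Cohen's d = 0.5) at alpha = 0.05?
    n = TTestIndPower().solve_power(effect_size=0.5, power=0.8, alpha=0.05)
    print(f"needed per group: {n:.0f} subjects")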
71%
When you find common errors in the scientific literature—such as a simple misinterpretation of p values—hit the perpetrator over the head with your statistics textbook. It’s therapeutic.