Kindle Notes & Highlights
Read between April 2, 2019 and January 26, 2020
Using a version control system means you have a record of every change, so you’re never stuck wondering, “How could this code have worked last Tuesday but not now?”
I was once very embarrassed when I had to reformat figures in a paper for publication, only to realize that I didn’t remember what data I had used to make them. My messy analysis cost m...
But even if they have fully automated their analysis, scientists are understandably relu...
“agree to hold the Author free from shame, embarrassment, or ridicule for any hacks, kludges, or leaps of faith found within the Program.” While the CRAPL may not be the most legally rigorous licensing agreement, it speaks to the problems faced by authors of academic code: writing software for public use takes a great deal more work than writing code for personal use, including documentation, testing, and cleanup of accumulated cruft from many nights of hacking.
And would scientists avail themselves of the opportunity to inspect code and find bugs? Nobody gets scientific glory by checking code for typos.
Automate your data analysis using a spreadsheet, analysis script, or program that can be tested against known input. If anyone suspects an error, you should be able to refer to your code to see exactly what you did.
Make all data available when possible, through specialized databases such as GenBank and PDB or through generic data repositories such as Dryad and Figshare.
Publish your software source code, spreadsheets, or analysis scripts. Many journals let you submit these as supplementary material with your paper, or you can deposit the files on Dryad or Figshare.
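A note on the first recommendation above ("automate your data analysis... tested against known input"): a minimal sketch of what that can look like in practice, assuming a hypothetical summarize step and a tiny hand-checkable fixture; none of the names here come from the book.

```python
# Hypothetical example: a small, scripted analysis step that can be re-run
# and checked against known input. The function and fixture are made up.

def summarize(rows):
    """Return the mean of the 'value' field across rows."""
    values = [row["value"] for row in rows]
    return sum(values) / len(values)

def test_summarize_known_input():
    # A tiny fixture with a result verifiable by hand: (1 + 2 + 3) / 3 = 2.
    rows = [{"value": 1.0}, {"value": 2.0}, {"value": 3.0}]
    assert summarize(rows) == 2.0

if __name__ == "__main__":
    test_summarize_known_input()
    print("analysis script checks out against known input")
```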
However, if boring results never appear in peer-reviewed publications or are shown in insufficient detail to be of use, the Cochrane researchers will never include them in reviews, causing what is known as outcome reporting bias, where systematic reviews become biased toward more extreme and more interesting results.
Reviewers sometimes did not realize outcome reporting bias was present, instead assuming the outcome simply hadn’t been measured.
Medical journals have begun to combat this problem by coming up with standards, such as the CONSORT checklist, which requires reporting of statistical methods, all measured outcomes, and any changes to the trial design after it began.
Earlier you saw the impact of multiple comparisons and truth inflation on study results. These problems arise when studies make numerous comparisons with low statistical power, giving a high rate of false positives and inflated estimates of effect sizes, and they appear everywhere in published research.
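A quick back-of-the-envelope calculation shows why making many comparisons inflates the false positive rate even when no real effects exist; the alpha level and numbers of comparisons below are illustrative choices, not figures from the text.

```python
# Chance of at least one false positive among m independent tests when
# every null hypothesis is true, at significance level alpha.
alpha = 0.05
for m in (1, 5, 20, 100):
    p_any_false_positive = 1 - (1 - alpha) ** m
    print(f"{m:3d} comparisons: P(at least one false positive) = {p_any_false_positive:.2f}")
```

With 20 comparisons the chance of at least one spurious "discovery" is already about 64%.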
This problem is commonly known as publication bias, or the file drawer problem.
Suppose, for example, that the effect size is 0.8 (on some arbitrary scale), but the review was composed of many small studies that each had a power of 0.2.
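A small simulation (my own sketch, not from the book) makes the point concrete: with a true effect of 0.8 and small groups giving roughly 20% power, the studies that happen to reach p < 0.05 report effects well above 0.8, which is what a review built only from published, significant results would see. The group size of 6 is an assumption chosen to give about that power.

```python
# Sketch: simulate many underpowered two-group studies with a true
# standardized effect of 0.8, then keep only those reaching p < 0.05,
# as a literature biased toward significant results would.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.8      # true standardized mean difference
n_per_group = 6        # assumption: small groups, roughly 20% power
n_studies = 10_000

significant_effects = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    t, p = stats.ttest_ind(treatment, control)
    if p < 0.05:
        significant_effects.append(treatment.mean() - control.mean())

print(f"empirical power: {len(significant_effects) / n_studies:.2f}")
print(f"mean effect among significant studies: {np.mean(significant_effects):.2f}")
# The second number comes out well above 0.8: truth inflation via selective publication.
```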
Knowing his findings would not be readily believed, Bem published not one but 10 different experiments in the same study, with 9 showing statistically significant psychic powers.
A debate still rages in the psychological literature over the impact of publication bias on publications about publication bias.
...certain kinds of clinical trials to be registered through its website ClinicalTrials.gov before the trials begin, and it requires the publication of summary results on the ClinicalTrials.gov website within a year of the end of the trial. To help enforce registration, the International Committee of Medical Journal Editors announced in 2005 that it would not publish studies that had not been preregistered.
Compliance has been poor. A random sample of all clinical trials registered from June 2008 to June 2009 revealed that more than 40% of protocols were registered after the first study participants had been recruited, with the median delinquent study registered 10 months late.
Since most research articles have poor statistical power and researchers have freedom to choose among analysis methods to get favorable results, while most tested hypotheses are false and most true hypotheses correspond to very small effects, we are mathematically determined to get a plethora of false positives.
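The arithmetic behind that claim can be sketched directly: when only a small fraction of tested hypotheses are true and power is modest, the false positives generated by the many false hypotheses swamp the true positives. The specific numbers below (10% of hypotheses true, 40% power) are illustrative assumptions, not figures from the text.

```python
# Rough positive-predictive-value calculation with assumed numbers.
alpha = 0.05          # false positive rate when the hypothesis is false
power = 0.4           # chance of detecting a real effect (optimistic for many fields)
prior_true = 0.10     # assumed fraction of tested hypotheses that are actually true

true_positives = prior_true * power
false_positives = (1 - prior_true) * alpha
ppv = true_positives / (true_positives + false_positives)
print(f"share of 'significant' findings that are real: {ppv:.2f}")
# About 0.47 here: roughly half of the positive results are false positives.
```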
And we can’t expect them to ask the right questions in class, because they don’t realize they don’t understand.
Again, methods from physics education research provide the answer. If lectures do not force students to confront and correct their misconceptions, we will have to use a method that does. A leading example is peer instruction. Students are assigned readings or videos before class, and class time is spent reviewing the basic concepts and answering conceptual questions. Forced to choose an answer and discuss why they believe it is true before the instructor reveals the correct answer, students immediately see when their misconceptions do not match reality, and instructors spot problems before...
...methods. Promotions, tenure, raises, and job offers are all dependent on having a long list of publications in prestigious journals, so there is a strong incentive to publish promising results as soon as possible. Tenure and hiring committees, composed of overworked academics pushing out their own research papers, cannot extensively review each publication for quality or originality, relying instead on prestige and quantity as approximations.
Journal editors attempt to judge which papers will have the greatest impact and interest and consequently choose those with the most surprising, controversial, or novel results. As you’ve seen, this is a recipe for truth inflation, as well as outcome reporting and publication biases, and strongly discourages replication studies and negative results.
But PLOS ONE is sometimes seen as a dumping ground for papers that couldn’t cut it at more prestigious journals, and some scientists fear publishing in it will worry potential employers.
A strong course in applied statistics should cover basic hypothesis testing, regression, statistical power calculation, model selection, and a statistical programming language like...
- The statistical power of the study or any other means by which the appropriate sample size was determined
- How variables were selected or discarded for analysis
- Whether the statistical results presented support the paper's conclusions
- Effect-size estimates and confidence intervals accompanying significance tests, showing whether the results have practical importance
- Whether appropriate statistical tests were used and, if necessary, how they were corrected for multiple comparisons
- Details of any stopping rules
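For the first item on that list, the claimed sample size can be checked with a standard power calculation; here is a minimal sketch using the usual normal-approximation formula for comparing two means, with a placeholder effect size rather than anything from the book.

```python
# Sample size per group for a two-sample comparison of means,
# using the normal-approximation formula:
#   n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.8):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2

# Placeholder effect size of 0.5 standard deviations:
print(f"about {n_per_group(0.5):.0f} participants per group")  # ~63 per group
```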