Statistics Done Wrong: The Woefully Complete Guide
Kindle Notes & Highlights
Read between April 2, 2019 and January 26, 2020
32%
Learn to use prior estimates of the base rate to calculate the probability that a given result is a false positive (as in the mammogram example).
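A quick sketch of that calculation in Python. The base rate, sensitivity, and false positive rate below are assumed for illustration, not the book's exact mammogram figures:

```python
# Probability that a positive result is a false positive, via Bayes' theorem.
# All three inputs are illustrative assumptions, not the book's exact numbers.
def false_positive_probability(base_rate, sensitivity, false_positive_rate):
    """P(condition absent | test positive)."""
    p_positive = (sensitivity * base_rate
                  + false_positive_rate * (1 - base_rate))
    return false_positive_rate * (1 - base_rate) / p_positive

# With a 1% base rate, 80% sensitivity, and a 9% false positive rate,
# roughly 92% of positive results are false positives.
print(false_positive_probability(0.01, 0.80, 0.09))
```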
33%
However, a difference in significance does not always make a significant difference.1
34%
This doesn’t improve our statistical power, but it does prevent the false conclusion that the drugs are different.
36%
A high standard deviation would tell me it benefits some patients much more than others. Confidence intervals and standard errors, on the other hand, estimate how far the average for this sample might be from the true average.
36%
Hence, it is important to know whether an error bar represents a standard deviation, confidence interval, or standard error, though papers often do not say.14
36%
Plugging in the numbers for Fixitol and Solvix, I find that p < 0.05! There is a statistically significant difference between them, even though the confidence intervals overlap.
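A minimal sketch of that point, with made-up summary statistics standing in for Fixitol and Solvix: the two 95% confidence intervals overlap, yet a two-sample t test still gives p < 0.05.

```python
# Overlapping 95% confidence intervals do not rule out a significant difference.
# The means, SDs, and sample sizes below are invented for illustration.
from scipy import stats

mean1, sd1, n1 = 10.0, 5.0, 25   # stand-in for Fixitol
mean2, sd2, n2 = 13.2, 5.0, 25   # stand-in for Solvix

for mean, sd, n in [(mean1, sd1, n1), (mean2, sd2, n2)]:
    se = sd / n ** 0.5
    half_width = stats.t.ppf(0.975, df=n - 1) * se
    print(f"mean {mean:.1f}, 95% CI ({mean - half_width:.2f}, {mean + half_width:.2f})")

t_stat, p_value = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2)
print(f"p = {p_value:.3f}")  # about 0.03: significant despite the overlapping intervals
```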
36%
Unfortunately, many scientists skip the math and simply glance at plots to see whether confidence intervals overlap.
36%
Earlier, we assumed the error bars in Figure 5-3 represent confidence intervals. But what if they are standard errors or standard deviations? Could we spot a significant difference by just looking for whether the error bars overlap?
36%
A survey of psychologists, neuroscientists, and medical researchers found that the majority judged significance by confidence interval overlap, with many scientists confusing standard errors, standard deviations, and confidence intervals.
36%
There is exactly one situation when visually checking confidence intervals works, and it is when comparing the confidence interval against a fixed value, rather than another confidence interval.
37%
Overlapping confidence intervals do not mean two values are not significantly different. Checking confidence intervals or standard errors will mislead.
37%
Your eyeball is not a well-defined statistical procedure.
37%
And because standard error bars are about half as wide as the 95% confidence interval,
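The highlight breaks off here, but the rule of thumb behind it is simple: for reasonably large samples a 95% confidence interval is about the mean plus or minus 1.96 standard errors, so an error bar of one standard error spans roughly half the confidence interval. A minimal check:

```python
# For large samples, the 95% CI half-width is about 1.96 standard errors,
# so a +/- 1 SE error bar is roughly half as wide as the 95% CI.
from scipy import stats

se = 1.0
print(stats.norm.ppf(0.975) * se)  # about 1.96, versus an SE bar half-width of 1.0
```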
37%
If in your explorations you find an interesting correlation, the standard procedure is to collect a new dataset and test the hypothesis again. Testing an independent dataset will filter out false positives and leave any legitimate discoveries standing.
37%
And so exploratory findings should be considered tentative until confirmed.
38%
Analyze the signal from every electrode to see whether it showed any reaction above the normal background firing rate, which would indicate that it is picking up signals from a neuron we’re interested in.
Yuan
Reminds me of Xing's class in Hong Kong about neuroscience
39%
These rules are often violated in the neuroimaging literature, perhaps as much as 40% of the time, causing inflated correlations and false positives.
39%
Studies committing this error tend to find larger correlations between stimuli and neural activity than are plausible, given the random noise and error inherent to brain imaging.3 Similar problems occur when geneticists collect data on thousands of genes and select subsets for analysis or when epidemiologists dredge through demographics and risk factors to find which ones are associated with disease.4
41%
Had we chosen a fixed group size in advance, the p value would be the probability of obtaining more extreme results with that particular group size.
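A simulation sketch of why the sampling plan matters (all parameters below are assumed): checking the data every few subjects and stopping at the first p < 0.05 inflates the false positive rate well above 5%, even though there is no true effect at all.

```python
# Optional stopping: peeking repeatedly and stopping at the first p < 0.05
# inflates the false positive rate, even though both groups are pure noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, max_n, check_every = 2000, 100, 10
false_positives = 0

for _ in range(n_trials):
    a = rng.normal(size=max_n)   # both groups drawn from the same distribution
    b = rng.normal(size=max_n)
    for n in range(check_every, max_n + 1, check_every):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            false_positives += 1  # stop early and declare a "significant" difference
            break

print(false_positives / n_trials)  # well above the nominal 0.05
```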
41%
But many stopped studies don’t even publish their original intended sample size or the stopping rule used to justify terminating the study.8 A trial’s early stoppage is not automatic evidence that its results are biased, but it is suggestive.
43%
Dichotomization eliminates this distinction, dropping useful information and statistical power.
43%
We are often interested in controlling for confounding factors. You might measure two or three variables (or two or three dozen) along with the outcome variable and attempt to determine the unique effect of each variable on the outcome after the other variables have been “controlled for.”
44%
While the mathematical theory of regression with multiple variables can be more advanced than many practicing scientists care to understand, involving a great deal of linear algebra, the basic concepts and results are easy to understand and interpret.
44%
Don’t arbitrarily split continuous variables into discrete groups unless you have good reason. Use a statistical procedure that can take full advantage of the continuous variables. If you do need to split continuous variables into groups for some reason, don’t choose the groups to maximize your statistical significance. Define the split in advance, use the same split as in previous similar research, or use outside standards (such as a medical definition of obesity or high blood pressure) instead.
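A simulated illustration of that information loss (the data and effect size are assumptions): testing the continuous predictor directly versus splitting it at the median and comparing the two groups.

```python
# Dichotomization throws away information: compare a test on the continuous
# predictor with a median split on the same simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60
x = rng.normal(size=n)               # continuous predictor
y = 0.4 * x + rng.normal(size=n)     # modest true effect plus noise

r, p_continuous = stats.pearsonr(x, y)           # uses the full continuous variable
high = x > np.median(x)
t, p_split = stats.ttest_ind(y[high], y[~high])  # "high" vs "low" groups only

# The median split typically gives a larger p value (lower power) on the same data.
print(f"continuous: p = {p_continuous:.3f}, median split: p = {p_split:.3f}")
```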
45%
only a truly randomized experiment eliminates all confounding variables.
45%
Let’s start with the simplest problem: overfitting, which is the result of excessive enthusiasm in data analysis.
46%
Stepwise regression is common in many scientific fields, but it’s usually a bad idea.
46%
You probably already noticed one problem: multiple comparisons. Hypothetically, by adding only statistically significant variables, you avoid overfitting, but running so many significance tests is bound to produce false positives, so some of the variables you select will be bogus.
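A small simulation of that point (everything here is noise by construction): screen many candidate variables against an unrelated outcome and some will clear p < 0.05 purely by chance.

```python
# Screening many variables by significance selects some bogus ones by chance:
# none of these simulated predictors is actually related to the outcome.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_obs, n_vars = 100, 50
X = rng.normal(size=(n_obs, n_vars))   # 50 candidate predictors, all pure noise
y = rng.normal(size=n_obs)             # outcome unrelated to any of them

selected = [j for j in range(n_vars) if stats.pearsonr(X[:, j], y)[1] < 0.05]
print(len(selected), "of", n_vars, "noise variables pass p < 0.05")  # about 2-3 on average
```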
46%
(Alternative stepwise procedures use other criteria instead of statistical significance but suffer from many of the same problems.)
46%
stepwise regression is susceptible to egregious overfitting,
46%
It’s also possible to change the criteria used to include new variables; instead of statistical significance, more-modern procedures use metrics like the Akaike information criterion and the Bayesian information criterion, which reduce overfitting by penalizing models with more variables.
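A sketch of model comparison by information criteria, using statsmodels on simulated data (the variables and coefficients are assumptions for illustration, not an example from the book):

```python
# Comparing candidate regression models by AIC/BIC rather than significance tests.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)                       # relevant variable
x2 = rng.normal(size=n)                       # irrelevant variable
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

small = sm.OLS(y, sm.add_constant(x1)).fit()
large = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Lower is better; both criteria charge a penalty for the extra, useless variable.
print("small model: AIC", round(small.aic, 1), "BIC", round(small.bic, 1))
print("large model: AIC", round(large.aic, 1), "BIC", round(large.bic, 1))
```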
46%
How can a regression model be fairly evaluated, avoiding these problems? One option is cross-validation: fit the model using only a portion of the melons and then test its effectiveness at predicting the ripeness of the other melons.
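A minimal cross-validation sketch with scikit-learn; the "melon" features below are simulated stand-ins, not data from the book.

```python
# Cross-validation: fit on part of the data and score on the held-out part,
# rotating through the folds. Features and ripeness values are simulated.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 120
X = rng.normal(size=(n, 3))      # e.g., weight, tap pitch, color score (invented)
ripeness = X @ np.array([0.8, 0.3, 0.0]) + rng.normal(scale=0.5, size=n)

scores = cross_val_score(LinearRegression(), X, ripeness, cv=5, scoring="r2")
print(scores.round(2), "mean R^2:", scores.mean().round(3))
```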
47%
But choosing a single model is usually foolishly overconfident. With so many variables to choose from, there are often many combinations of variables that predict the outcome nearly as well.
47%
the lasso (short for least absolute shrinkage and selection operator, an inspired acronym) has better mathematical properties and doesn’t fool the user with claims of statistical significance. But the lasso is not bulletproof, and there is no perfect automated solution.
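A lasso sketch with scikit-learn on simulated data (variable counts and coefficients are assumptions): the L1 penalty shrinks the coefficients of irrelevant variables to exactly zero, with the penalty strength chosen by cross-validation.

```python
# The lasso shrinks irrelevant coefficients to exactly zero; it selects variables
# without handing out p values. Data below are simulated for illustration.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n, p = 150, 20
X = rng.normal(size=(n, p))
true_coefs = np.zeros(p)
true_coefs[:3] = [2.0, -1.5, 1.0]            # only the first three variables matter
y = X @ true_coefs + rng.normal(size=n)

model = LassoCV(cv=5).fit(X, y)              # penalty strength chosen by cross-validation
print("variables kept:", np.flatnonzero(model.coef_))
```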
47%
Correlation and Causation
52%
The choices that produce interesting results will attract our attention and engage our human tendency to build plausible stories for any outcome.
52%
The most worrying consequence of this statistical freedom is that researchers may unintentionally choose the statistical analysis most favorable to them.
53%
The proliferation of statistical techniques has given us useful tools, but it seems they’ve been put to use as blunt objects with which to beat the data until it confesses.
54%
the constant pressure to publish means that thorough documentation and replication are ignored. There’s no incentive for researchers to make their data and calculations available for inspection or to devote time to replicating other researchers’ results.
54%
As these problems have become more widely known, software tools have advanced to make analysis steps easier to record and share.
Yuan
Git Rmarkdown
54%
But first they asked two biostatisticians, Keith Baggerly and Kevin Coombes, to check the data.
55%
the lead Duke researcher, Anil Potti, had falsified his résumé.
55%
Potti eventually resigned from Duke amid accusations of fraud.
55%
The Potti case illustrates two problems: the lack of reproducibility in much of modern science and the difficulty of publishing negative and contradictory results in academic journals.
55%
The problem was not just that Potti did not share his data readily. Scientists often do not record and document the steps they take converting raw data to results, except in the often-vague form of a scientific paper or whatever is written down in a lab notebook.
55%
Ideally, these steps would be reproducible: fully automated, with the computer source code available for inspection as a definitive record of the work. Errors would be easy to spot and correct, and any scientist could download the dataset and code and produce exactly the same results. Even better, the code would be combined with a description of its purpose.
55%
Sweave,
Yuan
Rmarkdown
55%
but another scientist reading the paper and curious about its methods can download the source code, which shows exactly how the results were produced.
56%
A more comprehensive strategy to ensure reproducibility and ease of error detection would follow the “Ten Simple Rules for Reproducible Computational Research,” developed by a group of biomedical researchers.9 These rules include automating data manipulation and reformatting, recording all changes to analysis software and custom programs using a software version control system, storing all raw data, and making all scripts and data available for public analysis.
56%
Automated data analysis makes it easy to try software on new datasets or test that each piece functions correctly.