Statistics Done Wrong: The Woefully Complete Guide
Kindle Notes & Highlights
Read between April 2, 2019 and January 26, 2020
3%
The first principle is that you must not fool yourself, and you are the easiest person to fool.
4%
This already happens on most websites that discuss science news, and it would annoy me endlessly to see this book used to justify it. The first comments on a news article are always complaints about how “they didn’t control for this variable” and “the sample size is too small,” and 9 times out of 10, the commenter never read the scientific paper to notice that their complaint was addressed in the third paragraph.
4%
A research paper’s statistical methods can be judged only in detail and in context with the rest of its methods: study design, measurement techniques, cost constraints, and goals.
4%
Use your statistical knowledge to better understand the strengths, limitations, and potential biases of research, not to shoot down any paper that seems to misuse a p value or contradict your personal beliefs.
4%
Also, remember that a conclusion supported by poor statistics can still be correct—statistical and logical errors do not make a conc...
4%
In short, please practice statistics...
5%
But few people complain about statistics done by trained scientists. Scientists seek understanding, not ammunition to use against political opponents.
5%
t tests, p values, proportional hazards models, propensity scores, logistic regressions, least-squares fits, and confidence intervals.
5%
Review articles and editorials appear regularly in leading journals, demanding higher statistical standards and tougher review, but few scientists hear their pleas, and journal-mandated standards are often ignored.
6%
“We are fast becoming a nuisance to society. People don’t take us seriously anymore, and when they do take us seriously, we may unintentionally do more harm than good.”
7%
While Hanlon’s razor directs us to “never attribute to malice that which is adequately explained by incompetence,”
7%
The pharmaceutical industry seems particularly tempted to bias evidence by neglecting to publish studies that show their drugs do not work;
Yuan
Publication bias
7%
“torture the data until it confesses.”
7%
Readers interested in the pharmaceutical industry’s statistical misadventures may enjoy Ben Goldacre’s Bad Pharma (Faber & Faber, 2012), which caused a statistically significant increase in my blood pressure while I read it.
8%
We use statistics to make judgments about these kinds of differences. We will always observe some difference due to luck and random variation, so statisticians talk about statistically significant differences when the difference is larger than could easily be produced by luck. So first we must learn how to make that decision.
8%
But not all colds are identical. Maybe the average cold lasts a week, but some last only a few days.
Yuan
Experiments emphasize internal validity and endogeneity
8%
“Even if my medication were completely ineffective, what are the chances my experiment would have produced the observed outcome?”
8%
The p value is the probability, under the assumption that there is no true effect or no true difference, of collecting data that shows a difference equal to or more extreme than what you actually observed.
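This definition can be made concrete with a small simulation. The sketch below is my own illustration, not from the book: it runs a permutation test on invented cold-duration data, asking how often shuffled group labels produce a difference at least as large as the observed one.

```python
import random

# Invented cold durations in days, for illustration only.
treated = [4.1, 5.0, 6.2, 4.8, 5.5, 4.9, 6.0, 5.1]
control = [5.9, 6.4, 5.2, 7.0, 6.1, 5.8, 6.6, 6.3]

observed = abs(sum(control) / len(control) - sum(treated) / len(treated))

# Under the null hypothesis the labels are arbitrary, so shuffle them
# and count how often luck alone gives a difference at least this extreme.
pooled = treated + control
n = len(treated)
rng = random.Random(0)
extreme = 0
shuffles = 10_000

for _ in range(shuffles):
    rng.shuffle(pooled)
    diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n))
    if diff >= observed:
        extreme += 1

print(f"observed difference: {observed:.2f} days, p ≈ {extreme / shuffles:.3f}")
```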
8%
Remember, a p value is not a measure of how right you are or how important a difference is. Instead, think of it as a measure of surprise.
8%
The choice of 0.05 isn't due to any special logical or statistical reason; it has simply become scientific convention through decades of common use.
9%
Remember, p is a measure of surprise,
9%
It’s not a measure of the size of the effect. You can get a tiny p value by measuring a huge effect—“This medicine makes people live four times longer”—or by meas...
9%
because any medication or intervention usually has some real effect, you can always get a statistically significant result by collecting so much data that you detect extreme...
9%
In short, statistical significance does not mean your result has any practical significance.
Yuan
Might not be clinically important
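A sketch of this point (my own, assuming numpy and scipy are installed; all numbers invented): with a large enough sample, a difference of a tenth of a point on a scale with standard deviation 10 still yields a tiny p value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# True means differ by 0.1 unit on a scale with standard deviation 10,
# i.e. a practically negligible 0.01 SD effect.
n = 500_000
a = rng.normal(loc=100.0, scale=10.0, size=n)
b = rng.normal(loc=100.1, scale=10.0, size=n)

t, p = stats.ttest_ind(a, b)
print(f"mean difference: {b.mean() - a.mean():.3f}")  # tiny in practical terms
print(f"p value: {p:.1e}")                            # tiny in statistical terms too
```

Statistically "significant", yet a confidence interval for the difference would show it is far too small to matter.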
9%
There’s no mathematical tool to tell you whether your hypothesis is true or false; you can see only whether it’s consistent with the data. If the data is sparse or unclear, your conclusions will be uncertain.
9%
Recall that a p value is calculated under the assumption that luck (not your medication or intervention) is the only factor in your experiment, and that p is defined as the probability of obtaining a result equal to or more extreme than the one observed.
9%
which makes p values “psychic”: two experiments with different designs can produce identical data but different p values because the unobserved data is different.
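A classic illustration of this (my own sketch, not necessarily the book's example) is two designs that happen to produce the same data, 9 heads and 3 tails, but with different stopping rules and therefore different unobserved data:

```python
from math import comb

# Same observed data under two designs: 9 heads, 3 tails in 12 flips.
# One-sided test of H0: the coin is fair, against "heads are favored."

# Design A: flip exactly 12 times; p = P(9 or more heads in 12 flips).
p_fixed = sum(comb(12, k) for k in range(9, 13)) / 2**12

# Design B: flip until the 3rd tail appears (here, on flip 12);
# p = P(the 3rd tail arrives on flip 12 or later)
#   = P(at most 2 tails in the first 11 flips).
p_stopping = sum(comb(11, k) for k in range(3)) / 2**11

print(f"fixed-n design:       p ≈ {p_fixed:.3f}")     # ≈ 0.073
print(f"stopping-rule design: p ≈ {p_stopping:.3f}")  # ≈ 0.033
```

Identical flips, but only one design crosses the conventional 0.05 line.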
10%
The p value, when combined with an experimenter’s prior experience and domain knowledge, could be useful in deciding how to interpret new data.
10%
In science, it is important to limit two kinds of errors: false positives, where you conclude there is an effect when there isn’t, and false negatives, where you fail to notice a real effect.
10%
If we’re too ready to jump to conclusions about effects, we’re prone to get false positives; if we’re too conservative, we’ll err on the side of false negatives.
10%
it is possible to develop a formal decision-making process that will ensure false positives occur only at some predefined rate. They called this rate α, and their idea was for experimenters to set an α based upon their experience and expectations. So, for instance, if we’re willing to put up with a 10% rate of false positives, we’ll set α = 0.1.
10%
To determine which testing procedure is best, we see which has the lowest false negative rate for a given choice of α.
10%
We use the p value to implement the Neyman-Pearson testing procedure by rejecting the null hypothesis whenever p < α.
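The guarantee this procedure buys can be checked by simulation. The sketch below is mine (assuming numpy and scipy): it generates experiments where the null hypothesis is true by construction, rejects whenever p < α, and finds that the long-run rejection rate comes out near α.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
experiments = 10_000
false_positives = 0

for _ in range(experiments):
    # Both groups come from the same distribution, so the null is true
    # and every rejection is a false positive.
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

print(f"long-run false positive rate ≈ {false_positives / experiments:.3f}")  # close to 0.05
```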
10%
Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong.3
11%
A single experiment does not have a false positive rate.
11%
The false positive rate is determined by your procedure, not the result of any single experiment.
11%
Confidence intervals can answer the same questions as p values, with the advantage that they provide more information and are more straightforward to interpret.
11%
A confidence interval combines a point estimate with the uncertainty in that estimate.
11%
A confidence interval quantifies the uncertainty in your conclusions, providing vastly more information than a p value, which says nothing about effect sizes.
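A minimal sketch of reporting a confidence interval instead of a bare p value (my own example; the improvement percentages are invented and scipy is assumed):

```python
import numpy as np
from scipy import stats

# Invented percent improvements in symptom scores for 20 patients.
improvement = np.array([22, 18, 30, 12, 25, 17, 21, 28, 15, 19,
                        24, 11, 27, 20, 16, 23, 26, 14, 29, 13])

mean = improvement.mean()
sem = stats.sem(improvement)  # standard error of the mean
low, high = stats.t.interval(0.95, len(improvement) - 1, loc=mean, scale=sem)

print(f"mean improvement: {mean:.1f}%")
print(f"95% confidence interval: {low:.1f}% to {high:.1f}%")
```

The interval carries both the estimated effect size and how precisely it was measured, which a p value alone does not.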
11%
If the symptom is already pretty innocuous, maybe a 15–25% improvement isn’t too important. Then again, for a symptom like spontaneous human combustion, you might get excited about any improvement.
11%
If you can write a result as a confidence interval instead of as a p value, you should.7
12%
One possible explanation is that confidence intervals go unreported because they are often embarrassingly wide.11
12%
During Rothman’s three-year tenure as associate editor, the fraction of papers reporting solely p values dropped precipitously. Significance tests returned after his departure, although subsequent editors successfully encouraged researchers to report confidence intervals as well.
12%
You’ve seen how it’s possible to miss real effects by not collecting enough data. You might miss a viable medicine or fail to notice an important side effect. So how do you know how much data to collect?
12%
The concept of statistical power provides the answer. The power of a study is the probability that it will distinguish an effect of a certain size from pure luck. A study might easily detect a huge benefit from a medication, but detecting a subtle difference is much less likely.
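Power can be estimated directly by simulation. A sketch of my own (assuming numpy and scipy; the effect sizes, in standard-deviation units, are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def simulated_power(effect_sd, n_per_group, alpha=0.05, sims=5_000):
    """Fraction of simulated experiments reaching p < alpha when the true
    difference between group means is effect_sd standard deviations."""
    hits = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect_sd, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / sims

# A huge effect is easy to detect; a subtle one usually slips through.
print("1.0 SD effect, n = 20 per group:", simulated_power(1.0, 20))  # ≈ 0.87
print("0.2 SD effect, n = 20 per group:", simulated_power(0.2, 20))  # ≈ 0.09
```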
13%
Say I run 100 trials and count the number of heads. If the result isn't exactly 50 heads, I'll calculate the probability that a fair coin would have turned up a deviation of that size or larger. That probability is my p value.
Yuan
P value
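That procedure translates directly into code. Below is my own rendering of the quoted example: a two-sided p value for the number of heads in 100 flips of a fair coin (60 heads is just a sample input):

```python
from math import comb

def coin_p_value(heads, flips=100):
    """Probability that a fair coin deviates from flips/2 by at least
    as much as the observed head count (two-sided)."""
    deviation = abs(heads - flips / 2)
    hits = sum(comb(flips, k) for k in range(flips + 1)
               if abs(k - flips / 2) >= deviation)
    return hits / 2**flips

print(coin_p_value(60))  # ≈ 0.057: surprising, but not below the 0.05 convention
print(coin_p_value(65))  # ≈ 0.004
```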
13%
A power curve, as shown in Figure 2-2, can tell me.
13%
The power for any hypothesis test is the probability that it will yield a statistically significant outcome (defined in this example as p < 0.05).
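A power curve like the one described can be computed exactly for the coin example. This sketch (mine) reuses the two-sided p value from the previous snippet and asks, for each true bias, how likely 100 flips are to give p < 0.05:

```python
from math import comb

def coin_p_value(heads, flips=100):
    # Two-sided p value under a fair coin, as in the previous sketch.
    deviation = abs(heads - flips / 2)
    return sum(comb(flips, k) for k in range(flips + 1)
               if abs(k - flips / 2) >= deviation) / 2**flips

def power(true_heads_prob, flips=100, alpha=0.05):
    """Probability that a coin with the given bias yields a statistically
    significant result (p < alpha) in the given number of flips."""
    return sum(comb(flips, k)
               * true_heads_prob**k * (1 - true_heads_prob)**(flips - k)
               for k in range(flips + 1)
               if coin_p_value(k, flips) < alpha)

# Power is small for subtle biases and approaches 1 for blatant ones.
for bias in (0.5, 0.55, 0.6, 0.7, 0.8):
    print(f"true P(heads) = {bias:.2f}: power ≈ {power(bias):.2f}")
```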