Naked Statistics: Stripping the Dread from the Data
Read from October 14, 2018 to February 16, 2019
38%
Most statistics books assume that you are using good data, just as a cookbook assumes that you are not buying rancid meat and rotten vegetables.
39%
some of the most powerful tools that statistics has to offer. (2) Getting a good sample is harder than it looks. (3) Many of the most egregious statistical assertions are caused by good statistical methods applied to bad samples, not the opposite. (4) Size matters, and bigger is better.
39%
One recurring research challenge with human subjects is creating treatment and control groups that differ only in that one group is getting the treatment and the other is not.
39%
The third reason we collect data is, to quote my teenage daughter, “Just because.” We sometimes have no specific idea what we will do with the information—but we suspect it will come in handy at some point. This is similar to a crime scene detective who demands that all possible evidence be captured so that it can be sorted later for clues. Some of this evidence will prove useful, some will not. If we knew exactly what would be useful, we probably would not need to be doing the investigation in the first place.
40%
A longitudinal study collects information on a large group of subjects at many different points in time, such as once every two years. The same participants may be interviewed
40%
Perry Preschool Study began in the late 1960s with a group of 123 African American three- and four-year-olds from poor families.
40%
They had higher earnings at age forty. In contrast, the participants who did not receive the preschool program were significantly more likely to have been arrested five or more times by age forty.
40%
In fact, all of this exciting cross-sectional data talk reminds me of the week before my wedding,
40%
“blue-green algae,”
40%
Eventually the pathogen was identified as a water-borne form of cyanobacteria. (These bacteria are blue, and they are the only kind of bacteria that get their energy from photosynthesis;
40%
I would argue that some of the most egregious statistical mistakes involve lying with data; the statistical analysis is fine, but the data on which the calculations are performed are bogus or inappropriate. Here are some common examples of “garbage in, garbage out.”
41%
different from people who would prefer not to be bothered. If you ask 100 people in a public place to complete a short survey, and 60 are willing to answer your questions, those 60 are likely to be different in significant ways from the 40 who walked by without making eye contact.
41%
the notorious Literary Digest poll of 1936,
41%
The Literary Digest sample was “garbage in”: the magazine’s subscribers were wealthier than average Americans, and therefore more likely to vote Republican,
41%
The purpose of the study was merely to document the degree of sexual side effects across all types of treatment.
41%
These former inmates may have changed their lives because the program helped them kick drugs. Or they may have changed their lives because of other factors that also happened to make them more likely to volunteer for a drug treatment program (such as having a really strong desire not to go back to prison).
41%
The net effect is to distort the research that we see, or do not see. Suppose that one of your graduate school classmates has conducted a different longitudinal study. She finds that people who spend a lot of time playing video games do have a lower incidence of colon cancer.
42%
Here is the problem: The 99 studies that find no link between video games and colon cancer will not get published, because they are not very interesting.
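A quick way to see the publication-bias problem in the highlight above is to simulate it: run many studies of an effect that does not exist and count how many clear the conventional 5 percent significance bar anyway. Everything below (group sizes, the simple z-test) is an illustrative sketch, not anything from the book.

```python
import math
import random

random.seed(1)

# One "study" of a nonexistent effect: two groups of 50 drawn from the
# SAME population, so any "significant" difference is pure chance.
def one_null_study(n=50):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # z-statistic for the difference in means (population sd known to be 1)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return abs(z) > 1.96  # "significant" at the 5 percent level

false_positives = sum(one_null_study() for _ in range(100))
print(false_positives)  # on average about 5 of 100 null studies look "interesting"
```

If only the handful of chance "hits" get written up and published, the literature ends up reporting links that are not there.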
42%
Recall bias. Memory is a fascinating thing—though not always a great source of good data. We have a natural human impulse to understand the present as a logical consequence of things that happened in the past—cause and effect. The problem is that our memories turn out to be “systematically fragile” when we are trying to explain some particularly good or bad outcome in the present.
42%
say “b.” I smell survivorship bias,
43%
Healthy user bias. People who take vitamins regularly are likely to be healthy—because they are the kind of people who take vitamins regularly! Whether the vitamins have any impact is a separate issue.
43%
Of course, the purple pajamas do not matter; but having the kind of parents who put their children in purple pajamas does matter.
43%
As New York Times health writer Gary Taubes explains, “At its simplest, the problem is that people who faithfully engage in activities that are good for them—taking a drug as prescribed, for instance, or eating what they believe is a healthy diet—are fundamentally different from those who don’t.”
43%
Much of it comes from the central limit theorem, which is the LeBron James of statistics—if LeBron were also a supermodel, a Harvard professor, and the winner of the Nobel Peace Prize.
44%
The core principle underlying the central limit theorem is that a large, properly drawn sample will resemble the population from which it is drawn.
44%
1. If we have detailed information about some population, then we can make powerful inferences about any properly drawn sample from that population.
44%
2. If we have detailed information about a properly drawn sample (mean and standard deviation), we can make strikingly accurate inferences about the population from which that sample was drawn.
45%
Let us return to our (increasingly absurd) bus example. We now know that a marathon is going on in the city, as well as the International Festival of Sausage. Assume that both groups have thousands of participants,
45%
This kind of analysis all stems from the central limit theorem, which, from a statistical standpoint, has LeBron James–like power and elegance. According to the central limit theorem, the sample means for any population will be distributed roughly as a normal distribution around the population mean.
46%
The larger the sample size and the more samples taken, the more closely the distribution of sample means will approximate the normal curve.
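The claim above is easy to check with a small simulation (a sketch with made-up numbers): even when the underlying population is not remotely normal, the means of repeated samples pile up symmetrically around the population mean.

```python
import random
import statistics

random.seed(42)

# The population here is deliberately NOT normal: weights spread uniformly
# between 100 and 250 pounds. The central limit theorem says the means of
# repeated samples should still cluster normally around the population mean.
population = [random.uniform(100, 250) for _ in range(100_000)]

sample_means = [
    statistics.mean(random.sample(population, 60))  # samples of size 60
    for _ in range(1_000)
]

print(round(statistics.mean(population), 1))     # population mean, ~175
print(round(statistics.mean(sample_means), 1))   # mean of sample means, close to it
print(round(statistics.stdev(sample_means), 1))  # far tighter than the population spread
```

Plotting `sample_means` as a histogram would show the familiar bell shape, even though a histogram of the population itself is flat.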
46%
1. The standard deviation measures dispersion in the underlying population. In this case, it might measure the dispersion of the weights of all the participants in the Framingham Heart Study, or the dispersion around the mean for the entire marathon field.
46%
3. Here is what ties the two concepts together: The standard error is the standard deviation of the sample means! Isn’t that kind of cool?
47%
SE = s/√n, where s is the standard deviation of the population from which the sample is drawn, and n is the size of the sample. Keep your head about you! Don’t let the appearance of letters mess up the basic intuition. The standard error will be large when the standard deviation of the underlying distribution is large. A large sample drawn from a highly dispersed population is also likely to be highly dispersed;
47%
This is why the standard deviation (s) is in the numerator. Similarly, we
47%
(The reason we take the square root of n will be left for a more advanced text; the basic relationship is what’s important here.)
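The formula under discussion is SE = s/√n. A short simulation (with hypothetical weight data, loosely echoing the book's examples) confirms that the formula matches the spread you actually observe across many sample means:

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical population: 50,000 weights with mean ~162 lbs and sd ~30 lbs.
population = [random.gauss(162, 30) for _ in range(50_000)]
s = statistics.pstdev(population)  # population standard deviation
n = 60                             # sample size

# Route 1: the formula SE = s / sqrt(n).
formula_se = s / math.sqrt(n)

# Route 2: empirically, the standard deviation of many 60-person sample means.
means = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]
empirical_se = statistics.stdev(means)

print(round(formula_se, 2), round(empirical_se, 2))  # the two nearly agree
```

Doubling the dispersion s doubles the standard error, while quadrupling the sample size n only halves it, which is the intuition the passage is after.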
48%
I’ve tried to stick with the basics in this chapter. You should note that for the central limit theorem to apply, the sample sizes need to be relatively large (over 30 as a rule of thumb).
49%
statistics can answer these kinds of questions unequivocally; instead, inference tells us what is likely, and what is unlikely. Researchers cannot prove that a new drug is effective in treating heart disease, even when they have data from a carefully controlled clinical trial. After all, it is entirely possible that there will be random variation in the outcomes of patients in the treatment and control groups that are unrelated to the new drug. If 53 out of 100 patients taking the new
49%
It is still possible that this impressive result is unrelated to the new drug; the patients in the treatment group may be particularly lucky or resilient. But that is now a much less likely explanation. In the formal language of statistical inference,
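To make the "lucky patients" point concrete, here is a sketch of the arithmetic, assuming (purely for illustration) that under the null hypothesis each of 100 patients improves with probability 0.5:

```python
import math

# P(X >= k) for X ~ Binomial(n, p): the chance of a result at least this
# good if improvement is a coin flip (the assumed null hypothesis).
def binom_tail(k, n, p):
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_value = binom_tail(53, 100, 0.5)
print(round(p_value, 3))  # about 0.31
```

A result like 53 of 100 would arise by chance nearly a third of the time, which is why it proves nothing on its own; a much larger gap between treatment and control makes the "luck" explanation far less plausible.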
49%
Statistical inference is the process by which the data speak to us, enabling us to draw meaningful conclusions.
49%
The point of statistics is not to do myriad rigorous mathematical calculations; the point is to gain insight into meaningful social phenomena. Statistical inference is really just the marriage of two concepts that we’ve already discussed:
49%
statistical inference is hypothesis testing. Actually, I’ve already introduced this concept—just without the fancy terminology. As noted above, statistics alone cannot prove anything; instead, we use statistical inference to accept or reject explanations on the basis of their relative likelihood.
49%
starting assumption, or null hypothesis, is that the defendant is innocent. The job of the prosecution is to persuade the judge or jury to reject that assumption and accept the alternative hypothesis, which is that the defendant is guilty. As
49%
Null hypothesis: This new experimental drug is no more effective at preventing malaria than a placebo.
Alternative hypothesis: This new experimental drug can help to prevent malaria.
49%
This methodological approach is strange enough that we should do one more example. Again, note that the null hypothesis and alternative hypothesis are logical complements. If one is true, the other is not true. Or, if we reject one statement, we must accept the other.
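Here is that reject-or-accept logic in code for the malaria example, with all trial numbers invented for illustration (a two-proportion z-test is one standard way to compare rates, not necessarily the exact test the book has in mind):

```python
import math

# Two-proportion z-test: how many standard errors apart are the two rates?
def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # combined rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical trial: 30 of 200 placebo patients contract malaria
# versus 12 of 200 patients on the experimental drug.
z = two_proportion_z(30, 200, 12, 200)
reject_null = abs(z) > 1.96  # beyond ~2 standard errors at the 5 percent level

print(round(z, 2), reject_null)  # 2.94 True
```

Because the two hypotheses are logical complements, rejecting the null ("no better than placebo") forces us to accept the alternative ("the drug helps").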
50%
In a courtroom, the threshold for rejecting the presumption of innocence is the qualitative assessment that the defendant is “guilty beyond a reasonable doubt.” The judge or jury is left to define what exactly that means.
50%
The most egregious cheating involved a group of teachers who held a weekend pizza party during which they went through exam sheets and changed students’ answers.
50%
hypothesis is 5 percent, which is often written in decimal form: .05. This probability is known as a significance
50%
With those data, we can calculate the standard error for the sample mean: At mission control, the
51%
As the distribution above shows, we would expect roughly 95 percent of all 60-person samples drawn from the Changing Lives participants to have a mean weight within two standard errors of the population mean, or roughly between 153 pounds and 171 pounds.* Conversely, only 5 times out of 100 would a sample of 60 persons randomly drawn from the Changing Lives participants have a mean weight that is greater than 171 pounds or less than 153 pounds.
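The 153-to-171 band is just the population mean plus or minus two standard errors. Working backward from the quoted interval (so these exact inputs are inferred, not stated in the highlight): a mean of about 162 pounds and a standard error of about 4.5 pounds for a 60-person sample.

```python
# Inferred from the quoted interval: mean ~162 lbs, standard error ~4.5 lbs.
mean_weight = 162
standard_error = 4.5

low = mean_weight - 2 * standard_error
high = mean_weight + 2 * standard_error
print(low, high)  # 153.0 171.0 — roughly 95% of 60-person sample means land here
```

A 60-person sample mean outside this band is not impossible, just unlikely enough (about 5 chances in 100) to make us doubt that the sample really came from this population.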
51%
The p-value is the specific probability of getting a result at least as extreme as the one you’ve observed if the null hypothesis is true.
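That definition can be turned directly into a small calculator: given a z-score (how many standard errors the observed result sits from what the null hypothesis predicts), return the two-sided probability of a result at least that extreme. This sketch assumes the normal approximation the chapter has been using throughout.

```python
import math

# Two-sided p-value under a normal distribution: the probability of landing
# at least |z| standard errors from the mean if the null hypothesis is true.
def two_sided_p(z):
    return math.erfc(abs(z) / math.sqrt(2))  # erfc gives both normal tails

print(round(two_sided_p(1.96), 3))  # ~0.05: the conventional significance cutoff
print(round(two_sided_p(3.0), 4))   # ~0.003: very unlikely under the null
```

A small p-value does not prove the null hypothesis false; it says the observed data would be a rare fluke if the null were true, which is the book's "relative likelihood" framing in miniature.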