Naked Statistics: Stripping the Dread from the Data
Read between December 7, 2021 and March 30, 2022
3%
the observation first made by Swedish mathematician and writer Andrejs Dunkels: It’s easy to lie with statistics, but it’s hard to tell the truth without them.
5%
Descriptive statistics exist to simplify, which always implies some loss of nuance or detail.
9%
The irony is that more data can often present less clarity.
13%
Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components.
14%
Precision reflects the exactitude with which we can express something.
14%
Accuracy is a measure of whether a figure is broadly consistent with the truth—hence the danger of confusing precision with accuracy. If an answer is accurate, then more precision is usually better. But no amount of precision can make up for inaccuracy.
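A tiny sketch of the precision-versus-accuracy distinction in Python, using invented numbers (nothing here comes from the book): a figure quoted to five decimal places can still be badly wrong, while a rough estimate can be closer to the truth.

```python
# Invented example: precision vs. accuracy.
true_distance = 41.6                  # suppose this is the true mileage

precise_but_inaccurate = 37.48912     # five decimal places, yet ~4 miles off
imprecise_but_accurate = 42           # rounded to the nearest mile

print(abs(true_distance - precise_but_inaccurate))   # error of about 4.1 miles
print(abs(true_distance - imprecise_but_accurate))   # error of 0.4 miles
```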
20%
Statistics measure the outcomes that matter; incentives give us a reason to improve those outcomes.
22%
The second attractive feature of the correlation coefficient is that it has no units attached to it. We can calculate the correlation between height and weight—even though height is measured in inches and weight is measured in pounds.
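A minimal check of the no-units point, with made-up height and weight data: rescaling inches to centimeters and pounds to kilograms leaves the correlation coefficient untouched.

```python
import numpy as np

# Made-up data: heights in inches, weights in pounds.
height_in = np.array([63, 66, 68, 70, 72, 75])
weight_lb = np.array([120, 140, 155, 165, 180, 200])

r_original = np.corrcoef(height_in, weight_lb)[0, 1]
r_rescaled = np.corrcoef(height_in * 2.54, weight_lb * 0.4536)[0, 1]  # cm and kg

print(round(r_original, 6), round(r_rescaled, 6))   # identical: the units cancel out
```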
26%
Probabilities do not tell us what will happen for sure; they tell us what is likely to happen and what is less likely to happen.
26%
When it comes to risk, our fears do not always track with what the numbers tell us we should be afraid of.
26%
thousands of Americans may have died since the September 11 attacks because they were afraid to fly.
26%
When more Americans opted to drive rather than to fly after 9/11, there were an estimated 344 additional traffic deaths per month in October, November, and December of 2001 (taking into account the average number of fatalities and other factors that typically contribute to road accidents, such as weather).
26%
the September 11 attacks may have caused more than 2,000 driving deaths.
26%
(More than 99 percent of all DNA is identical among all humans.)
28%
the law of large numbers tells us that as the number of independent trials increases, the average of the outcomes will get closer and closer to its expected value.
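A quick simulation of the law of large numbers (my own sketch, not the book's): the average of repeated die rolls drifts toward the expected value of 3.5 as the number of rolls grows.

```python
import random

random.seed(1)
for n in (10, 100, 10_000, 1_000_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)   # the running average closes in on 3.5
```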
28%
When you insure anything, you are contracting to receive some specified payoff in the event of a clearly defined contingency.
29%
The broader lesson—and one of the core lessons of personal finance—is that you should always insure yourself against any adverse contingency that you cannot comfortably afford to withstand. You should skip buying insurance on everything else.
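A back-of-the-envelope sketch of that lesson, with invented prices and probabilities: the premium on a small loss exceeds the expected payout, so self-insuring wins on average; the pricing logic is the same for a catastrophic loss, but there the point is that you cannot absorb the downside.

```python
# Invented numbers: insuring a $1,000 phone with a 5% chance of total loss per year.
p_loss = 0.05
loss = 1_000
premium = 120                       # insurers price above the expected loss

expected_payout = p_loss * loss     # $50
print(expected_payout, premium)     # 50.0 vs 120: skip coverage you can absorb
# For a loss you cannot comfortably withstand (a house, major medical bills),
# the premium still exceeds the expected payout, but you buy the insurance anyway
# because the uninsured downside would be ruinous.
```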
36%
Probability tells us that any outlier—an observation that is particularly far from the mean in one direction or the other—is likely to be followed by outcomes that are more consistent with the long-term average.
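A small simulation of regression to the mean (my own construction): each score is stable skill plus transient luck, and the most extreme performers in round one score closer to the average in round two.

```python
import random

random.seed(0)
skill = [random.gauss(0, 1) for _ in range(10_000)]
round1 = [s + random.gauss(0, 1) for s in skill]    # skill plus luck
round2 = [s + random.gauss(0, 1) for s in skill]    # same skill, fresh luck

top = sorted(range(len(round1)), key=lambda i: round1[i])[-100:]  # round-1 outliers
print(sum(round1[i] for i in top) / 100)   # far above the overall mean of 0
print(sum(round2[i] for i in top) / 100)   # pulled back toward the mean
```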
38%
our ability to analyze data has grown far more sophisticated than our thinking about what we ought to do with the results.
39%
a properly drawn sample will look like the population from which it is drawn.
39%
Many of the most egregious statistical assertions are caused by good statistical methods applied to bad samples, not the opposite.
39%
A large, biased sample is arguably worse than a small, biased sample because it will give a false sense of confidence regarding the results.
40%
the question one should always ask: How have we chosen the sample or samples that we are evaluating? If each member of the relevant population does not have an equal chance of ending up in the sample, we are going to have a problem with whatever results emerge from that sample.
41%
As polls with good samples get larger, they get better, since the margin of error shrinks. As polls with bad samples get larger, the pile of garbage just gets bigger and smellier.
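A sketch of the garbage-in point with a simulated poll (hypothetical population in which 52 percent hold the view being measured): a proper random sample tightens around the truth as it grows, while a sample drawn from a skewed frame just becomes more confidently wrong.

```python
import random

random.seed(2)
TRUE_SHARE = 0.52

def poll(n, biased=False):
    p = 0.65 if biased else TRUE_SHARE    # the biased frame reaches the wrong mix of people
    yes = sum(random.random() < p for _ in range(n))
    share = yes / n
    moe = 1.96 * (share * (1 - share) / n) ** 0.5   # approximate 95% margin of error
    return round(share, 3), round(moe, 3)

for n in (100, 1_000, 10_000):
    print(n, poll(n), poll(n, biased=True))
# Both margins of error shrink with n, but the biased poll shrinks around 0.65,
# not the true 0.52: a bigger, smellier pile of garbage.
```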
42%
Recall bias is one reason that longitudinal studies are often preferred to cross-sectional studies. In a longitudinal study the data are collected contemporaneously. At age five, a participant can be asked about his attitudes toward school. Then, thirteen years later, we can revisit that same participant and determine whether he has dropped out of high school. In a cross-sectional study, in which all the data are collected at one point in time, we must ask an eighteen-year-old high school dropout how he or she felt about school at age five, which is inherently less reliable.
42%
(If you have a room of people with varying heights, forcing the short people to leave will raise the average height in the room, but it doesn’t make anyone taller.)
44%
The core principle underlying the central limit theorem is that a large, properly drawn sample will resemble the population from which it is drawn.
48%
1. If you draw large, random samples from any population, the means of those samples will be distributed normally around the population mean (regardless of what the distribution of the underlying population looks like).
2. Most sample means will lie reasonably close to the population mean; the standard error is what defines “reasonably close.”
3. The central limit theorem tells us the probability that a sample mean will lie within a certain distance of the population mean. It is relatively unlikely that a sample mean will lie more than two standard errors from the population mean, and …
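A simulation of those claims (my own sketch): samples drawn from a heavily skewed exponential population still produce sample means that cluster normally around the population mean, with roughly 95 percent landing within two standard errors.

```python
import random, statistics

random.seed(3)
POP_MEAN = 1.0            # mean of an exponential(1) population, which is strongly skewed
SAMPLE_SIZE = 50

means = [statistics.mean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
         for _ in range(5_000)]

std_error = statistics.stdev(means)   # close to 1 / sqrt(50), about 0.14
within_2se = sum(abs(m - POP_MEAN) < 2 * std_error for m in means) / len(means)
print(round(std_error, 3), round(within_2se, 3))   # the share comes out near 0.95
```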
48%
Statistics cannot prove anything with certainty. Instead, the power of statistical inference derives from observing some pattern or outcome and then using probability to determine the most likely explanation for that outcome.
49%
Statistical inference is really just the marriage of two concepts that we’ve already discussed: data and probability (with a little help from the central limit theorem).
54%
A Type I error involves wrongly rejecting a null hypothesis.
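A simulation of the Type I error rate (my own sketch): when the null hypothesis is actually true, a test run at the 5 percent significance level still rejects it about 5 percent of the time, and each of those rejections is a Type I error.

```python
import random
from statistics import NormalDist

random.seed(4)
n, trials, alpha = 30, 10_000, 0.05
false_positives = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]    # null is true: the mean really is 0
    z = (sum(sample) / n) / (1 / n ** 0.5)             # z-statistic with known sigma = 1
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided p-value
    false_positives += p_value < alpha
print(false_positives / trials)                        # lands near 0.05
```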
59%
Bad polling results do not typically stem from bad math when calculating the standard errors. Bad polling results typically stem from a biased sample, or bad questions, or both.
72%
we should not use explanatory variables that might be affected by the outcome that we are trying to explain, or else the results will become hopelessly tangled.
72%
We should have reason to believe that our explanatory variables affect the dependent variable, and not the other way around.
74%
The best researchers are the ones who can think logically about what variables ought to be included in a regression equation, what might be missing, and how the eventual results can and should be interpreted.
75%
The beauty of randomization is that it will generally distribute the non-treatment-related variables more or less evenly between the two groups—both the characteristics that are obvious, such as sex, race, age, and education and the nonobservable characteristics that might otherwise mess up the results.
76%
Obviously the bigger the samples, the more effective randomization will be in creating two broadly similar groups.
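A toy illustration of both points (hypothetical covariates, nothing from the book): assignment by coin flip tends to balance an observed trait (age) and an unobserved one across treatment and control, and the leftover imbalance shrinks as the sample grows.

```python
import random, statistics

random.seed(5)

def assignment_gaps(n):
    # Each person has an observed age and an unobserved trait; assignment ignores both.
    people = [(random.randint(18, 80), random.gauss(0, 1)) for _ in range(n)]
    random.shuffle(people)
    treat, control = people[: n // 2], people[n // 2:]
    def gap(k):
        return abs(statistics.mean(p[k] for p in treat) -
                   statistics.mean(p[k] for p in control))
    return round(gap(0), 3), round(gap(1), 3)   # gaps in mean age and in the hidden trait

for n in (20, 200, 20_000):
    print(n, assignment_gaps(n))   # both gaps shrink as the groups get larger
```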