Naked Statistics: Stripping the Dread from the Data
Read between December 7, 2021 and March 30, 2022
3%
the observation first made by Swedish mathematician and writer Andrejs Dunkels: It’s easy to lie with statistics, but it’s hard to tell the truth without them.
5%
Descriptive statistics exist to simplify, which always implies some loss of nuance or detail.
9%
The irony is that more data can often present less clarity.
13%
Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components.
14%
Precision reflects the exactitude with which we can express something.
14%
Accuracy is a measure of whether a figure is broadly consistent with the truth—hence the danger of confusing precision with accuracy. If an answer is accurate, then more precision is usually better. But no amount of precision can make up for inaccuracy.
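A tiny sketch of the precision-versus-accuracy distinction in Python, using invented numbers (nothing here comes from the book): a figure quoted to five decimal places can still be badly wrong, while a rough estimate can be closer to the truth.

```python
# Invented example: precision vs. accuracy.
true_distance = 41.6                  # suppose this is the true mileage

precise_but_inaccurate = 37.48912     # five decimal places, yet ~4 miles off
imprecise_but_accurate = 42           # rounded to the nearest mile

print(abs(true_distance - precise_but_inaccurate))   # error of about 4.1 miles
print(abs(true_distance - imprecise_but_accurate))   # error of 0.4 miles
```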
20%
Statistics measure the outcomes that matter; incentives give us a reason to improve those outcomes.
22%
The second attractive feature of the correlation coefficient is that it has no units attached to it. We can calculate the correlation between height and weight—even though height is measured in inches and weight is measured in pounds.
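A minimal check of the no-units point, with made-up height and weight data: rescaling inches to centimeters and pounds to kilograms leaves the correlation coefficient untouched.

```python
import numpy as np

# Made-up data: heights in inches, weights in pounds.
height_in = np.array([63, 66, 68, 70, 72, 75])
weight_lb = np.array([120, 140, 155, 165, 180, 200])

r_original = np.corrcoef(height_in, weight_lb)[0, 1]
r_rescaled = np.corrcoef(height_in * 2.54, weight_lb * 0.4536)[0, 1]  # cm and kg

print(round(r_original, 6), round(r_rescaled, 6))   # identical: the units cancel out
```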
26%
Probabilities do not tell us what will happen for sure; they tell us what is likely to happen and what is less likely to happen.
26%
When it comes to risk, our fears do not always track with what the numbers tell us we should be afraid of.
26%
thousands of Americans may have died since the September 11 attacks because they were afraid to fly.
26%
When more Americans opted to drive rather than to fly after 9/11, there were an estimated 344 additional traffic deaths per month in October, November, and December of 2001 (taking into account the average number of fatalities and other factors that typically contribute to road accidents, such as weather).
26%
the September 11 attacks may have caused more than 2,000 driving deaths.
26%
(More than 99 percent of all DNA is identical among all humans.)
28%
the law of large numbers tells us that as the number of independent trials increases, the average of the outcomes will get closer and closer to its expected value.
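A quick simulation of the law of large numbers (my own sketch, not the book's): the average of repeated die rolls drifts toward the expected value of 3.5 as the number of rolls grows.

```python
import random

random.seed(1)
for n in (10, 100, 10_000, 1_000_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)   # the running average closes in on 3.5
```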
28%
When you insure anything, you are contracting to receive some specified payoff in the event of a clearly defined contingency.
29%
The broader lesson—and one of the core lessons of personal finance—is that you should always insure yourself against any adverse contingency that you cannot comfortably afford to withstand. You should skip buying insurance on everything else.
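A back-of-the-envelope sketch of that lesson, with invented prices and probabilities: the premium on a small loss exceeds the expected payout, so self-insuring wins on average; the pricing logic is the same for a catastrophic loss, but there the point is that you cannot absorb the downside.

```python
# Invented numbers: insuring a $1,000 phone with a 5% chance of total loss per year.
p_loss = 0.05
loss = 1_000
premium = 120                       # insurers price above the expected loss

expected_payout = p_loss * loss     # $50
print(expected_payout, premium)     # 50.0 vs 120: skip coverage you can absorb
# For a loss you cannot comfortably withstand (a house, major medical bills),
# the premium still exceeds the expected payout, but you buy the insurance anyway
# because the uninsured downside would be ruinous.
```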
36%
Probability tells us that any outlier—an observation that is particularly far from the mean in one direction or the other—is likely to be followed by outcomes that are more consistent with the long-term average.
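A small simulation of regression to the mean (my own construction): each score is stable skill plus transient luck, and the most extreme performers in round one score closer to the average in round two.

```python
import random

random.seed(0)
skill = [random.gauss(0, 1) for _ in range(10_000)]
round1 = [s + random.gauss(0, 1) for s in skill]    # skill plus luck
round2 = [s + random.gauss(0, 1) for s in skill]    # same skill, fresh luck

top = sorted(range(len(round1)), key=lambda i: round1[i])[-100:]  # round-1 outliers
print(sum(round1[i] for i in top) / 100)   # far above the overall mean of 0
print(sum(round2[i] for i in top) / 100)   # pulled back toward the mean
```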
38%
our ability to analyze data has grown far more sophisticated than our thinking about what we ought to do with the results.
39%
a properly drawn sample will look like the population from which it is drawn.
39%
Many of the most egregious statistical assertions are caused by good statistical methods applied to bad samples, not the opposite.
39%
A large, biased sample is arguably worse than a small, biased sample because it will give a false sense of confidence regarding the results.
40%
the question one should always ask: How have we chosen the sample or samples that we are evaluating? If each member of the relevant population does not have an equal chance of ending up in the sample, we are going to have a problem with whatever results emerge from that sample.
41%
As polls with good samples get larger, they get better, since the margin of error shrinks. As polls with bad samples get larger, the pile of garbage just gets bigger and smellier.
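A sketch of the garbage-in point with a simulated poll (hypothetical population in which 52 percent hold the view being measured): a proper random sample tightens around the truth as it grows, while a sample drawn from a skewed frame just becomes more confidently wrong.

```python
import random

random.seed(2)
TRUE_SHARE = 0.52

def poll(n, biased=False):
    p = 0.65 if biased else TRUE_SHARE    # the biased frame reaches the wrong mix of people
    yes = sum(random.random() < p for _ in range(n))
    share = yes / n
    moe = 1.96 * (share * (1 - share) / n) ** 0.5   # approximate 95% margin of error
    return round(share, 3), round(moe, 3)

for n in (100, 1_000, 10_000):
    print(n, poll(n), poll(n, biased=True))
# Both margins of error shrink with n, but the biased poll shrinks around 0.65,
# not the true 0.52: a bigger, smellier pile of garbage.
```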
42%
Recall bias is one reason that longitudinal studies are often preferred to cross-sectional studies. In a longitudinal study the data are collected contemporaneously. At age five, a participant can be asked about his attitudes toward school. Then, thirteen years later, we can revisit that same participant and determine whether he has dropped out of high school. In a cross-sectional study, in which all the data are collected at one point in time, we must ask an eighteen-year-old high school dropout how he or she felt about school at age five, which is inherently less reliable.
42%
(If you have a room of people with varying heights, forcing the short people to leave will raise the average height in the room, but it doesn’t make anyone taller.)
44%
The core principle underlying the central limit theorem is that a large, properly drawn sample will resemble the population from which it is drawn.
48%
1. If you draw large, random samples from any population, the means of those samples will be distributed normally around the population mean (regardless of what the distribution of the underlying population looks like).
2. Most sample means will lie reasonably close to the population mean; the standard error is what defines “reasonably close.”
3. The central limit theorem tells us the probability that a sample mean will lie within a certain distance of the population mean. It is relatively unlikely that a sample mean will lie more than two standard errors from the population mean, and …
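A simulation of those claims (my own sketch): samples drawn from a heavily skewed exponential population still produce sample means that cluster normally around the population mean, with roughly 95 percent landing within two standard errors.

```python
import random, statistics

random.seed(3)
POP_MEAN = 1.0            # mean of an exponential(1) population, which is strongly skewed
SAMPLE_SIZE = 50

means = [statistics.mean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
         for _ in range(5_000)]

std_error = statistics.stdev(means)   # close to 1 / sqrt(50), about 0.14
within_2se = sum(abs(m - POP_MEAN) < 2 * std_error for m in means) / len(means)
print(round(std_error, 3), round(within_2se, 3))   # the share comes out near 0.95
```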
48%
Statistics cannot prove anything with certainty. Instead, the power of statistical inference derives from observing some pattern or outcome and then using probability to determine the most likely explanation for that outcome.
49%
Statistical inference is really just the marriage of two concepts that we’ve already discussed: data and probability (with a little help from the central limit theorem).
54%
A Type I error involves wrongly rejecting a null hypothesis.
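A simulation of the Type I error rate (my own sketch): when the null hypothesis is actually true, a test run at the 5 percent significance level still rejects it about 5 percent of the time, and each of those rejections is a Type I error.

```python
import random
from statistics import NormalDist

random.seed(4)
n, trials, alpha = 30, 10_000, 0.05
false_positives = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]    # null is true: the mean really is 0
    z = (sum(sample) / n) / (1 / n ** 0.5)             # z-statistic with known sigma = 1
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided p-value
    false_positives += p_value < alpha
print(false_positives / trials)                        # lands near 0.05
```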
59%
Bad polling results do not typically stem from bad math when calculating the standard errors. Bad polling results typically stem from a biased sample, or bad questions, or both.
72%
we should not use explanatory variables that might be affected by the outcome that we are trying to explain, or else the results will become hopelessly tangled.
72%
We should have reason to believe that our explanatory variables affect the dependent variable, and not the other way around.
74%
The best researchers are the ones who can think logically about what variables ought to be included in a regression equation, what might be missing, and how the eventual results can and should be interpreted.
75%
The beauty of randomization is that it will generally distribute the non-treatment-related variables more or less evenly between the two groups—both the characteristics that are obvious, such as sex, race, age, and education and the nonobservable characteristics that might otherwise mess up the results.
76%
Obviously the bigger the samples, the more effective randomization will be in creating two broadly similar groups.
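A toy illustration of both points (hypothetical covariates, nothing from the book): assignment by coin flip tends to balance an observed trait (age) and an unobserved one across treatment and control, and the leftover imbalance shrinks as the sample grows.

```python
import random, statistics

random.seed(5)

def assignment_gaps(n):
    # Each person has an observed age and an unobserved trait; assignment ignores both.
    people = [(random.randint(18, 80), random.gauss(0, 1)) for _ in range(n)]
    random.shuffle(people)
    treat, control = people[: n // 2], people[n // 2:]
    def gap(k):
        return abs(statistics.mean(p[k] for p in treat) -
                   statistics.mean(p[k] for p in control))
    return round(gap(0), 3), round(gap(1), 3)   # gaps in mean age and in the hidden trait

for n in (20, 200, 20_000):
    print(n, assignment_gaps(n))   # both gaps shrink as the groups get larger
```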