Kindle Notes & Highlights
Started reading
November 28, 2019
Curiously, I loved physics in high school, even though physics relies very heavily on the very same calculus that I refused to do in Mrs.
Because physics has a clear purpose.
I love statistics. Statistics can be used to explain everything from DNA testing to the idiocy of playing the lottery.
a goat showed up behind one of the doors that he didn’t pick. Should he switch? The answer is yes. Why? That’s in Chapter 5½.
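The switch-or-stay question is easy to check by brute force. A minimal simulation sketch (not from the book; the two-thirds figure comes out of the simulation itself):

```python
import random

random.seed(42)

def play(switch):
    """One round of the Monty Hall game with three doors."""
    car = random.randrange(3)
    pick = random.randrange(3)
    # The host opens a door that is neither the contestant's pick nor the car.
    host = next(d for d in range(3) if d != pick and d != car)
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in range(3) if d != pick and d != host)
    return pick == car

trials = 100_000
stay = sum(play(switch=False) for _ in range(trials)) / trials
switch = sum(play(switch=True) for _ in range(trials)) / trials
print(f"stay wins: {stay:.3f}, switch wins: {switch:.3f}")
```

Staying wins about a third of the time and switching about two-thirds, which is why the answer is yes.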
The paradox of statistics is that they are everywhere—from batting averages to presidential polls—but
Or maybe we just need to think more clearly about what many workers are doing during that ten-minute break. My professional experience suggests that many of those workers who report leaving their offices for short breaks are huddled outside the entrance of the building smoking cigarettes
does your credit card company use data on what you are buying to predict if you are likely to miss a payment? (Seriously, they can do that.)
It’s easy to lie with statistics, but it’s hard to tell the truth without them.
Most of the studies that you read about in the newspaper are based on regression analysis.
The problem is that the mechanics of regression analysis are not the hard part; the hard part is determining which variables ought to be considered in the analysis and how that can best be done.
There are so many potential regression pitfalls
regression analysis—from the simplest statistical relationships to the complex models cobbled together by Nobel Prize winners. At its core, regression analysis seeks to find the “best fit” for a linear relationship between two variables.
Regression analysis enables us to go one step further and “fit a line” that best describes a linear relationship between the two variables.
It should be intuitive that the larger the sum of residuals overall, the worse the fit of the line.
ordinary least squares gives us the best description of a linear relationship between two variables.
The regression line certainly does not describe every observation in the data set perfectly. But it is the best description we can muster for what is clearly a meaningful relationship between height and weight. It also means that every observation can be explained as WEIGHT = a + b(HEIGHT) + e, where e is a “residual” that catches the variation in weight for each individual that is not explained by height.
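Fitting that line is mechanical enough to sketch in a few lines of code. The heights and weights below are made-up illustrative numbers, not the Changing Lives data:

```python
# Ordinary least squares for one predictor: WEIGHT = a + b * HEIGHT + e.
heights = [62, 64, 66, 68, 70, 72, 74]         # inches (illustrative)
weights = [120, 136, 148, 157, 172, 180, 196]  # pounds (illustrative)

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# The slope b that minimizes the sum of squared residuals.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights)) \
    / sum((x - mean_x) ** 2 for x in heights)
# The intercept a puts the line through the point of means.
a = mean_y - b * mean_x

# Each residual e is the part of a weight that height leaves unexplained.
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]
print(f"WEIGHT = {a:.1f} + {b:.2f} * HEIGHT")
```

With an intercept in the model the residuals sum to zero; what least squares minimizes is the sum of their squares.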
a one-unit increase in the independent variable (height) is associated with an increase of 4.5 units in the dependent variable (weight).
Thus, if we had no other information, our best guess for the weight of a person who is 5 feet 10 inches tall (70 inches) in the Changing Lives study would be –135 + 4.5 (70) = 180 pounds.
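That arithmetic is worth spelling out, using the intercept and slope reported in the chapter:

```python
a, b = -135, 4.5         # intercept and slope from the chapter's regression
height = 70              # 5 feet 10 inches, expressed in inches
predicted_weight = a + b * height
print(predicted_weight)  # 180.0
```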
For any regression coefficient, you will generally be interested in three things: sign, size, and significance.
having perfect teeth may be associated with other personality traits that explain the earnings advantage; the earnings effect may be caused by the kind of people who care about their teeth, not the teeth themselves.
does it reflect a meaningful association that is likely to be observed for the population as a whole?
However, we know from the central limit theorem that the mean for a large, properly drawn sample will not typically deviate wildly from the mean for the population as a whole. Similarly, we can assume that the observed relationship between variables like height and weight will not typically bounce around wildly from sample to sample, assuming that these samples are large and properly drawn from the same population.
Once again, the normal distribution is our friend.
we can calculate a standard error for the regression coefficient that gives us a sense of how much dispersion we should expect in the coefficients from sample to sample.
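That sample-to-sample dispersion can be made concrete with a simulation. The sketch below invents a population in which the true relationship is WEIGHT = -135 + 4.5 * HEIGHT plus noise (my assumption, echoing the chapter's coefficients), draws many samples, and refits the slope each time:

```python
import random

random.seed(0)

def fit_slope(xs, ys):
    """Least-squares slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)

slopes = []
for _ in range(1000):
    # Each pass draws a fresh sample of 200 people from the invented population.
    heights = [random.uniform(60, 76) for _ in range(200)]
    weights = [-135 + 4.5 * h + random.gauss(0, 15) for h in heights]
    slopes.append(fit_slope(heights, weights))

mean_slope = sum(slopes) / len(slopes)
spread = (sum((s - mean_slope) ** 2 for s in slopes) / len(slopes)) ** 0.5
print(f"slopes cluster near {mean_slope:.2f} with spread {spread:.2f}")
```

The spread of the simulated slopes plays the role of the standard error: it tells you how far a single sample's coefficient typically sits from the truth.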
(Basically, the t-distribution is more dispersed than the normal distribution and therefore has "fatter tails.")
any basic statistical software package will easily manage the additional complexity associated with using the t-distributions.
We can say that 95 times out of 100, we expect our confidence interval, which is 4.5 ± .26, to contain the true population parameter.
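The interval itself is just the coefficient plus or minus a cutoff times the standard error. A sketch using the chapter's numbers and the large-sample normal cutoff of 1.96 (which gives roughly .25; the chapter's .26 presumably reflects a slightly larger t cutoff or rounding):

```python
coef, se = 4.5, 0.13     # coefficient and standard error from the chapter
margin = 1.96 * se       # normal-approximation margin, about 0.25
low, high = coef - margin, coef + margin
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```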
there is only a 5 percent chance that we are wrongly rejecting the null hypothesis.
In fact, our results are even more extreme than that. The standard error (.13) is extremely low relative to the size of the coefficient (4.5).
One rough rule of thumb is that the coefficient is likely to be statistically significant when the coefficient is at least twice the size of the standard error.* A statistics package also calculates a p-value, which is .000 in this case, meaning that there is essentia...
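The chapter's own numbers illustrate the rule of thumb. The ratio of coefficient to standard error (the t-statistic) is enormous here:

```python
coef, se = 4.5, 0.13
t_stat = coef / se                     # roughly 34.6
passes_rule_of_thumb = coef >= 2 * se  # the rough "twice the standard error" test
print(f"t-statistic: {t_stat:.1f}, passes rule of thumb: {passes_rule_of_thumb}")
```

A ratio of roughly 34 is so far out in the tail that the p-value is zero to three decimal places, which is all the .000 means.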
The R2 tells us how much of that variation around the mean is associated with differences in height alone. The answer in our case is .25, or 25 percent. The more significant point may be that 75 percent of the variation in weight for our sample remains unexplained. There are clearly factors other than height that might help us understand the weights of the Changing Lives participants. This is where things get more interesting.
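R² has a simple mechanical definition: one minus the share of the variation around the mean that survives in the residuals. A sketch with made-up numbers (the .25 in the text belongs to the Changing Lives sample, not this toy data):

```python
# Illustrative data: actual weights and the predictions from some fitted line.
weights = [120, 136, 148, 157, 172, 180, 196]
predicted = [124, 136, 148, 160, 172, 184, 196]

mean_w = sum(weights) / len(weights)
ss_total = sum((w - mean_w) ** 2 for w in weights)                # variation around the mean
ss_resid = sum((w - p) ** 2 for w, p in zip(weights, predicted))  # unexplained variation
r_squared = 1 - ss_resid / ss_total
print(f"R-squared: {r_squared:.2f}")
```

An R² of .25 means the residual variation is 75 percent of the original variation around the mean.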
When we include multiple variables in the regression equation, the analysis gives us an estimate of the linear association between each explanatory variable and the dependent variable while holding the other explanatory variables constant, or “controlling for” these other factors.
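Mechanically, "controlling for" other factors means fitting all the coefficients at once. A sketch with two made-up explanatory variables, height and age, where numpy's least-squares solver stands in for a statistics package:

```python
import numpy as np

# Illustrative data, not the Changing Lives sample.
height = np.array([62, 64, 66, 68, 70, 72, 74], dtype=float)          # inches
age    = np.array([30, 45, 28, 51, 33, 40, 62], dtype=float)          # years
weight = np.array([121, 139, 147, 161, 171, 182, 199], dtype=float)   # pounds

# Design matrix: a column of ones for the intercept, then one column per variable.
X = np.column_stack([np.ones_like(height), height, age])
coefs, *_ = np.linalg.lstsq(X, weight, rcond=None)
intercept, b_height, b_age = coefs

# b_height is the association between height and weight with age held
# constant; b_age is the association with age with height held constant.
print(f"intercept={intercept:.1f}, height={b_height:.2f}, age={b_age:.2f}")
```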
Regression analysis (often called multiple regression analysis when more than one explanatory variable is involved, or multivariate regression analysis)
I have included a table with the complete results of this regression equation in the appendix to this chapter.
an R2 of zero means that our regression equation does no better than the mean at predicting the weight of any individual in the sample; an R2 of 1 means that the regression equation perfectly predicts the weight of every person in the sample.)
If this were a real research project, there would be weeks or months of follow-on analysis to probe this finding.
The gender wage gap fades away as the authors add more explanatory variables to the analysis.
the value of multiple regression analysis, particularly the research insights that stem from being able to isolate the effect of one explanatory variable while controlling for other confounding factors.
Our goal now is to see how much of the remaining variation in weight in each room can be explained by education. In other words, what is the best linear relationship between education and weight in each room?