Albert Bancroft’s Kindle Notes & Highlights for Naked Statistics: Stripping the Dread from the Data

Naked Statistics: Stripping the Dread from the Data

by Charles Wheelan

Read between October 14, 2018 - February 16, 2019

Yet that finding, and the implication that we can spend our way to better schools, is deeply flawed.

The number of cars in a family’s garage is a proxy for their income, education, and other measures of socioeconomic status.

Highly correlated explanatory variables (multicollinearity).

Assume we are trying to gauge the effect of illegal drug use on SAT scores. Specifically, we have data on whether the participants in our study have ever used cocaine and also on whether they have ever used heroin.

If we put both variables in the equation, we will have very few individuals who have used one drug but not the other, which leaves us very little variation in the data with which to calculate their independent effects.
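To make the "very little variation" point concrete, here is a minimal sketch using invented counts (none of these figures come from the book). It measures the share of participants who have used exactly one of the two drugs, since those are the only cases that can separate one drug's effect from the other's.

```python
# Hypothetical study records as (used_cocaine, used_heroin) pairs.
# The counts are made up purely for illustration.
records = (
    [(0, 0)] * 70    # used neither drug
    + [(1, 1)] * 26  # used both drugs
    + [(1, 0)] * 3   # cocaine only
    + [(0, 1)] * 1   # heroin only
)


def discordant_share(rows):
    """Fraction of participants who used exactly one of the two drugs."""
    discordant = sum(1 for cocaine, heroin in rows if cocaine != heroin)
    return discordant / len(rows)


print(discordant_share(records))
```

In this invented sample only 4 of 100 participants distinguish the two variables, so any estimate of their separate effects would rest on a handful of people.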

Extrapolating beyond the data. Regression

However, our results are valid only for a population that is similar to the sample on which the analysis has been done.

The regression equation based on the Changing Lives data predicts that her weight at birth should have been negative 19.6 pounds. (She weighed 8½ pounds.)
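The negative birth weight can be reproduced in miniature. The sketch below fits a one-variable least-squares line to an invented sample of adult heights and weights (an assumption for illustration; it is not the Changing Lives data or the book's equation) and then asks it about a newborn's length.

```python
# Invented adult sample: heights in inches, weights in pounds.
heights = [60, 64, 68, 72, 76]
weights = [115, 135, 155, 175, 195]


def fit_line(xs, ys):
    """Ordinary least squares with a single explanatory variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(heights, weights)


def predict_weight(height_inches):
    """Predicted weight in pounds from the fitted line."""
    return intercept + slope * height_inches


print(predict_weight(70))  # inside the sampled height range
print(predict_weight(21))  # roughly a newborn's length, far outside it
```

The line behaves sensibly for heights like those in the sample and gives a negative weight at 21 inches, because nothing in the sample constrains it there.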

“Low control in the work environment is associated with an increased risk of future coronary heart disease among men and women employed in government offices”

Data mining (too many variables).

If you put enough junk variables in a regression equation, one of them is bound to meet the threshold for statistical significance just by chance.

Clever researchers can always build a theory after the fact for why some curious variable that is really just nonsense turns up as statistically significant.
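A rough calculation shows how quickly the odds of a spurious "finding" grow. This sketch assumes each junk variable is tested independently at the .05 threshold, which is a simplification that real regressions will not satisfy exactly.

```python
def chance_of_false_positive(num_junk_variables, threshold=0.05):
    """P(at least one junk variable clears the threshold by luck alone),
    assuming the tests are independent."""
    return 1 - (1 - threshold) ** num_junk_variables


for k in (1, 5, 20, 50):
    print(k, round(chance_of_false_positive(k), 2))
```

Under that assumption, 20 junk variables already give better than even odds that at least one looks statistically significant.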

The probability of flipping five heads in a row is 1/32, or .03.

This is comfortably below the .05 threshold we typically use to reject a null hypothesis. Our null hypothesis in this case is that the student has no special talent for flipping heads; the lucky string of heads (which is bound to happen for at least one student when I start with a large group) allows us to reject the null hypothesis and adopt the alternative hypothesis:

This student has a special ability to flip heads. After he has achieved this impressive feat, we can study him for clues about his flipping success—his flipping form, his athletic training, his extraordinary concentration whi...
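The arithmetic behind the coin-flipping example can be checked directly. The sketch below assumes fair coins and independent students; the group size of 100 is a hypothetical choice, since the passage only says "a large group".

```python
from fractions import Fraction

# One student, five fair flips in a row.
p_five_heads = Fraction(1, 2) ** 5


def chance_someone_succeeds(num_students):
    """P(at least one of num_students flips five heads in a row)."""
    return 1 - (1 - p_five_heads) ** num_students


print(p_five_heads, float(p_five_heads))
print(float(chance_someone_succeeds(100)))
```

A single student's streak is unlikely (1/32), but in a group of 100 it is very likely that somebody produces one, which is why the "significant" result says nothing about talent.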


“Epidemiology is so beautiful and provides such an important perspective on human life and death, but an incredible amount of rubbish is published.”

research.) One reason for this “dirty little secret” is the positive publication bias described in Chapter 7.

Dr. Ioannidis estimates that roughly half of the scientific papers published will eventually turn out to be wrong. His research was published in the Journal of the American Medical Association, one of the journals in which the articles he studied had appeared. This does create a certain mind-bending irony: If Dr. Ioannidis’s research is correct, then there is a good chance that his research is wrong.

This process is referred to as estimating the equation,

The best researchers are the ones who can think logically about what variables ought to be included in a regression equation, what might be missing, and how the eventual results can and should be interpreted.

fingerprint at the scene of the crime. It points us in the right direction, but it’s rarely enough to convict. (And sometimes a fingerprint at the scene of a crime doesn’t belong to the perpetrator.)

clever researchers find ways to compare some treatment (e.g., going to Harvard) with the counterfactual, which is what would have happened in the absence of that treatment.

controls), we will have a serious reverse causality problem.

We have a solid theoretical reason to believe that putting more police officers on the street will reduce crime, but it’s also possible that crime could “cause” police officers, in the sense that cities experiencing crime waves will hire more police officers.

Of course, the places with lots of doctors also tend to have the highest concentration of sick people. These doctors aren’t making people sick; they are located in places where they are needed most (and at the same time sick people are moving to places where they can get appropriate medical care).

A treatment can be a literal treatment, as in some kind of medical intervention, or it can be something like attending college or receiving job training upon release from prison. The point is that we are seeking to isolate the effect of that single factor; ideally we would like to know how the group receiving that treatment fares compared with some other group whose members are identical in all other respects but for the treatment.

D.C. police presence is unrelated to the conventional crime rate, or “exogenous.”

This is one of those rare instances in life in which the best approach involves the least work!

Medical trials typically aspire to do randomized, controlled experiments. Ideally these clinical trials are double-blind, meaning that neither the patient nor the physician knows who is receiving the treatment and who is getting a placebo.

The Project STAR experiment cost roughly $12 million. The study on the effect of prayer on postsurgical complications cost $2.4 million. The finest studies are like the finest of anything else: They cost big bucks.

trial. A more economical alternative is to exploit a natural experiment, which happens when random circumstances somehow create something approximating a randomized, controlled experiment.

She still had a methodological challenge; if the residents of a state live longer after the state raises its minimum schooling law, we cannot attribute the longevity to the extra schooling. Life expectancy is generally going up over time. People lived longer in 1900 than in 1850, no matter what the states did. However, Lleras-Muney

laboratory experiment in which the residents of Illinois are forced to stay in school for seven years while their neighbors in Indiana can leave school after six years.

What happened? Life expectancy of those adults who reached age thirty-five was extended by an extra one and a half years just by their attending one additional year of school.

A nonequivalent control group can still be a very helpful tool.

Economists Stacy Dale and Alan Krueger found a way to answer this question by exploiting* the fact that many students apply to multiple colleges.

Difference in differences. One of the best ways to observe cause and effect is to do something and then see what happens.

Maybe. The enormous potential pitfall with this approach is that life tends to be more complex than throwing chicken nuggets across the kitchen.

“difference in differences” approach can help us identify the effects of some intervention by doing two things. First, we examine the “before” and “after” data for whatever group or jurisdiction has received the treatment, such as the unemployment figures for a county that has implemented a job training program. Second, we compare those data with the unemployment figures over the same time period for a similar county that did not implement any such program.

The important assumption is that the two groups used for the analysis are largely comparable except for the treatment; as

How does the unemployment rate in the county with the new job training program change over time relative to the county that did not implement such a program?

“difference in differences.” The other county in this study is effectively acting as a control group, which allows us to take advantage of the data collected before and after the intervention.
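The two-step comparison described above reduces to one subtraction of changes. The sketch below uses invented unemployment rates for a hypothetical treated county and a hypothetical comparison county, and it only means something if the two counties would otherwise have followed similar trends.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Invented unemployment rates in percent (not from the book).
effect = difference_in_differences(
    treated_before=9.0, treated_after=7.0,   # county with job training
    control_before=9.0, control_after=8.0,   # similar county without it
)
print(effect)
```

Here unemployment fell by 2 points in the treated county and by 1 point in the comparison county, so the estimate attributes a 1-point drop to the program rather than the full 2.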

The students who attend summer school are there because they are struggling. Even if the summer school program is highly effective, the participating students will probably still do worse in the long run than the students who were not required to take summer school.

groups are created by comparing those students who just barely fell below the threshold for summer school with those who just barely escaped it. Think about it: the students who fail a midterm are appreciably different from students who do not fail the midterm. But students who get a 59 percent (a failing grade) are not appreciably different from those students who get a 60 percent (a passing grade).
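The threshold comparison can be sketched as a crude local difference in means. The scores below are invented, the 60 percent cutoff comes from the passage, and the 3-point window is an assumption; a fuller discontinuity analysis would fit a regression on each side of the cutoff rather than just average nearby students.

```python
from statistics import mean

# Invented records: (midterm_score, later_test_score).
# Students below the cutoff are assumed to have attended summer school.
students = [
    (30, 50), (57, 75), (58, 74), (59, 76),
    (60, 70), (61, 71), (62, 69), (95, 92),
]


def discontinuity_gap(rows, cutoff=60, bandwidth=3):
    """Mean later score just below the cutoff minus mean just above it."""
    just_below = [later for score, later in rows
                  if cutoff - bandwidth <= score < cutoff]
    just_above = [later for score, later in rows
                  if cutoff <= score < cutoff + bandwidth]
    return mean(just_below) - mean(just_above)


print(discontinuity_gap(students))
```

Students scoring 30 or 95 are ignored on purpose: only those within a few points of the cutoff are similar enough for the comparison to be informative.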

The juvenile offenders who are sent to prison typically commit more serious crimes than the juvenile offenders who receive lighter sentences; that’s why they go to prison.

Randi Hjalmarsson, a researcher now at the University of London, exploited rigid sentencing guidelines for juvenile offenders in the state of Washington to gain insight into the causal effect of a prison sentence on future criminal behavior. Specifically, she compared the recidivism rate for those juvenile offenders who were “just barely” sentenced to prison with the recidivism rate for those juveniles who “just barely” got a pass (which usually involved a fine or probation).

For research purposes, those two individuals are essentially the same—until one of them goes to jail. And at that point, their behavior does appear to diverge sharply. The juvenile offenders who go to jail are significantly less likely to be convicted of another crime (after they are released from jail).

We care about what works. This is true in medicine, in economics, in business, in criminal justice—in everything. Yet causality is a tough nut to crack, even in cases where cause and effect seems stunningly obvious.

“counterfactual,” which is what would have happened in the absence of that tr...


There is only one intellectually honest answer: We will never know. The reason we will never know is that we do not know—and cannot know—what would have happened if the United States had not invaded Iraq.
