Kindle Notes & Highlights
Read between October 14, 2018 - February 16, 2019
If the .05 significance level seems somewhat arbitrary, that’s because it is. There is no single standardized statistical threshold for rejecting a null hypothesis. Both .01 and .1 are also reasonably common thresholds for doing the kind of analysis described above.
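The decision rule these thresholds imply is mechanical; here is a minimal Python sketch, with a hypothetical p-value.

```python
p_value = 0.03  # hypothetical result from some study

for alpha in (0.01, 0.05, 0.1):
    verdict = "reject" if p_value < alpha else "fail to reject"
    print(f"at the {alpha} level: {verdict} the null hypothesis")
# The same result is "significant" at .05 and .1 but not at the stricter .01.
```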
When you read in the newspaper that people who eat twenty bran muffins a day have lower rates of colon cancer than people who don’t eat prodigious amounts of bran, the underlying academic research probably looked something like this: (1) In some large data set, researchers determined that individuals who ate at least twenty bran muffins a day had a lower incidence of colon cancer than individuals who did not report eating much bran. (2) The researchers’ null hypothesis was that eating bran muffins has no impact on colon cancer. (3) The disparity in colon cancer outcomes between those who ate
…
(This is the “healthy user bias” from Chapter 7.)
This distinction between correlation and causation is crucial to the proper interpretation of statistical results. We will revisit the idea that “correlation does not equal causation” later in the book. I should also
One of the first questions you want to ask is, How big is this effect? It could easily be .9 points; on a test with a mean score of 500, that is not a life-changing figure. In Chapter 11, we will return to this crucial distinction between size and significance when it comes to interpreting statistical results.
First, you should recognize that each group of children, the 59 with autism and the 38 without autism, constitutes a reasonably large sample drawn from their respective populations—all children with and without autism spectrum disorder. The samples are large enough that the central limit theorem will apply.
We can say with 95 percent confidence that the range 1284.4 to 1336.4 cubic centimeters (the sample mean of 1310.4 ± two standard errors) contains the average total brain volume for children in the general population with autism spectrum disorder.
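The arithmetic behind that interval, as a sketch; the standard error of 13.0 cc is implied by the quoted range rather than stated in the highlight.

```python
sample_mean = 1310.4    # cubic centimeters
standard_error = 13.0   # implied: (1336.4 - 1284.4) / 4

lower = sample_mean - 2 * standard_error
upper = sample_mean + 2 * standard_error
print(f"95% CI: {lower:.1f} to {upper:.1f} cc")  # 1284.4 to 1336.4
```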
The powerful process of statistical inference is based on probability, not on some kind of cosmic certainty. We
“Claims that defy almost every law of science are by definition extraordinary and thus require extraordinary evidence. Neglecting to take this into account—as conventional social science analyses do—makes many findings look far more significant than they really are.”
In statistical parlance, this is known as a Type I error. Consider the example of an American
threshold to something like “a strong hunch that the guy did it.” This is going to ensure that more criminals go to jail—and also more innocent people. In a statistical context, this is the equivalent of having a relatively low significance level, such as .1.
This is known as a Type II error, or false negative.
Neither a Type I nor a Type II error is acceptable in this situation, which is why society continues to debate the appropriate balance between fighting terrorism and protecting civil liberties.
(x̄ − ȳ) / √(sx²/nx + sy²/ny), where x̄ = mean for sample x; ȳ = mean for sample y; sx = standard deviation for sample x; sy = standard deviation for sample y; nx = number of observations in sample x; ny = number of observations in sample y
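A minimal Python sketch of this difference-in-means statistic; the function name and sample values are hypothetical, not from the book.

```python
import math

def difference_in_means_stat(mean_x, mean_y, s_x, s_y, n_x, n_y):
    # Standard error of the difference between the two sample means
    se = math.sqrt(s_x**2 / n_x + s_y**2 / n_y)
    # How many standard errors apart the two sample means are
    return (mean_x - mean_y) / se

# Hypothetical samples: a .9-point gap on a test with a mean near 500
print(difference_in_means_stat(500.9, 500.0, 100.0, 100.0, 10000, 10000))
```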
One- and Two-Tailed Hypothesis Testing
We will therefore reject our null hypothesis if our sample of male basketball players has a mean height that is significantly higher or lower than the mean height for our sample of normal men. This requires a two-tailed hypothesis test. The cutoff points for rejecting our null hypothesis will be different because we must now account for the possibility of a large difference in sample means in both directions: positive or negative.
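A sketch of why the cutoffs differ, assuming the test statistic is approximately standard normal (an assumption of this illustration, not a claim from the book).

```python
from scipy.stats import norm

alpha = 0.05
# One-tailed: the full 5 percent of rejection probability sits in one tail.
print(norm.ppf(1 - alpha))      # ~1.64 standard errors
# Two-tailed: 2.5 percent in each tail, so the cutoff moves outward.
print(norm.ppf(1 - alpha / 2))  # ~1.96 standard errors
```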
that a poll has a “margin of error” of ± 3 percent, this is really just the same kind of 95 percent confidence interval that we calculated in the last chapter. Our “95 percent confidence” means that if we conducted 100 different polls on samples drawn from the same population, we would expect the answers we get from our sample in 95 of those polls to be within 3 percentage points in one direction or the other of the population’s true sentiment.
One fundamental difference between a poll and other forms of sampling is that the sample statistic we care about will be not a mean (e.g., 187 pounds) but rather a percentage or proportion (e.g., 47 percent of voters, or .47).
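A minimal sketch of the margin-of-error arithmetic for a proportion; the poll numbers here are hypothetical.

```python
import math

p, n = 0.47, 1000  # hypothetical: 47 percent support among 1,000 voters
standard_error = math.sqrt(p * (1 - p) / n)
margin = 1.96 * standard_error  # 95 percent confidence
print(f"margin of error: +/- {margin:.1%}")  # roughly +/- 3 points
```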
The standard error is what tells us how much dispersion we can expect in our results from sample to sample,
You explain that the answer depends on how confident the network people would like to be in the announcement—or, more specifically, what risk they are willing to take that they will get it wrong. Remember, the standard error gives us a sense of how often we can expect our sample proportion (the exit poll) to lie reasonably close to the true population proportion (the election outcome).
candidate has earned 53 percent of the vote ± 4 percent, or between 49 and 57 percent of the votes cast. Meanwhile, the Democratic candidate has earned 45 percent ± 4 percent, or between 41 and 49 percent of the votes cast. And, yes, now you have a new problem. At the 95 percent confidence level, you cannot reject the possibility that the two candidates may be tied with 49 percent of the vote each. This is an inevitable trade-off; the only way to become more certain that your polling results will be consistent with the election outcome without new data is to become more timid in your
…
By being less specific. You are “absolutely positive” that Thomas Jefferson was one of the first five presidents.
As a result, the standard error will shrink significantly. The new standard error for the Republican candidate is √((.53)(.47)/n), which is .01.
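A sketch of how the standard error shrinks with sample size, using the Republican share of .53 from the example above; the sample sizes themselves are hypothetical.

```python
import math

p = 0.53  # Republican share of the exit poll
for n in (500, 2500, 10000):  # hypothetical sample sizes
    se = math.sqrt(p * (1 - p) / n)
    print(f"n = {n:>6}: standard error = {se:.3f}")
# A sample of a few thousand already drives the standard error to about .01.
```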
For that reason, I have adopted a common convention, which is to take the higher standard error of the two and use that for all of the candidates.
Is this an accurate sample of the population whose opinions we are trying to measure? Many common data-related challenges
The fact that American attitudes toward capital punishment change dramatically when life without parole is offered as an option tells us something important. The key point, says Newport, is to view any polling result in context. No single question or poll can capture the full depth of public opinion on a complex issue.
their true incidence in the population by 20 percent [(60 – 50)/50]. And in so doing, you have also undercounted the Democrats by 20 percent [(40 – 50)/50]. That could happen, even with a decent polling methodology. Your second
meaning they have minimal say over what tasks are performed or how those tasks are carried out—have a significantly higher mortality rate than other workers in the civil service with more decision-making authority. According to this research, it is not the stress associated with major responsibilities that will kill you; it is the stress associated with being told what to do while having little say in how or when it gets done.
regression analysis.
Different families make different child care decisions because they are different.
Now, there are two key phrases in that last sentence. The first is “when done properly.” Given adequate data and access to a personal computer, a six-year-old could use a basic
The second important phrase above is “help us estimate.” Our child care study does not give us a “right” answer for the relationship between day care and subsequent school performance. Instead, it quantifies the relationship observed for a particular group of children over a particular stretch of time.
disease. We are instead rejecting the null hypothesis that exercise has no association with heart disease, on the basis of some statistical threshold that was chosen before the study was conducted.
would be less than 5 in 100, or below some other threshold for statistical significance.
Or perhaps causality goes the other direction. Could having a healthy heart “cause” exercise? Yes. Individuals who are infirm, particularly those who have some incipient form of heart disease, will find it much harder to exercise.
This is not a terribly insightful or specific statement. Regression analysis enables us to go one step further and “fit a line” that best describes a linear relationship between the two variables. Many possible lines
uses a methodology called ordinary least squares, or OLS. The technical details, including why OLS produces the best
OLS fits the line that minimizes the sum of the squared residuals.
which is that ordinary least squares gives us the best description of a linear relationship between two variables.
This is known as the regression equation, and it takes the following form: y = a + bx, where y is weight in pounds; a is the y-intercept of the line (the value for y when x = 0); b is the slope of the line; and x is height in inches. The slope of the line we’ve fitted, b, describes the “best” linear relationship between height and weight for this sample, as defined by ordinary least squares.
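A minimal sketch of fitting such a line by ordinary least squares; the height and weight observations below are made up for illustration, not the book's sample.

```python
import numpy as np

# Hypothetical height (inches) and weight (pounds) observations
height = np.array([62, 64, 66, 68, 70, 72, 74])
weight = np.array([125, 138, 150, 159, 172, 181, 194])

# np.polyfit with deg=1 performs an OLS fit of weight = a + b * height,
# returning the coefficients highest power first: (slope, intercept).
b, a = np.polyfit(height, weight, deg=1)
print(f"weight = {a:.1f} + {b:.2f} * height")
```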
weight in this case—is known as the dependent variable (because it depends on other factors). The variables that we are using to explain our dependent variable are known as explanatory variables, since they explain the outcome that we care about.
This figure is also known as the constant, because it is the starting point for calculating the weight of all observations in the study.
questions. For any regression coefficient, you will generally be interested in three things: sign, size, and significance.
Sign.
data on something like “miles run per month,” I am fairly certain that the coefficient on “miles run” would be negative. Running more is associated with weighing less.
Size. How big is the observed effect between the independent variable and the dependent variable? Is it of a magnitude that matters? In this case, every one inch in height is associated with 4.5 additional pounds.
socially insignificant.
example, suppose that we are examining determinants of income. Why do some people make more money than others? The explanatory variables are likely to be things like education, years of work experience, and so on. In a large data set, researchers might also find that people with whiter teeth earn $86 more per year than other workers, ceteris paribus. (“Ceteris paribus” comes from the Latin meaning “other things being equal.”) The positive and statistically significant coefficient on the “white teeth” variable assumes that the individuals being compared are similar in other respects:
This means (1) we’ve rejected the null hypothesis that really white teeth have no association with income with a high degree of confidence; and (2) if we analyze other data samples, we are likely to find a similar relationship between good-looking teeth and higher income.
Significance.