Statistics for People Who (Think They) Hate Statistics
corresponding amount of area each z score encompasses and subtract one from the other. Often, drawing a picture helps, as in Figure 8.5.
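A minimal sketch of that lookup-and-subtract idea, using Python's scipy.stats.norm as a stand-in for the printed z table (the two z scores here are made up):

```python
# Area between two z scores: look up the area each encompasses, subtract.
from scipy.stats import norm

z_low, z_high = 1.0, 2.0           # two hypothetical z scores
area_low = norm.cdf(z_low)         # area to the left of z = 1.0 (~0.8413)
area_high = norm.cdf(z_high)       # area to the left of z = 2.0 (~0.9772)
print(area_high - area_low)        # area between them, ~0.1359
```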
Just two things about standard scores. First, even though we are focusing on z scores, there are other types of standard scores as well. For example, a T score is a type of standard score that is computed by multiplying the z score by 10 and adding 50. One advantage of this type of score is that you rarely have a negative T score. As with z scores, T scores allow you to compare standard scores from different distributions. Second, a standard score is a whole different animal from a standardized score. A standardized score is one that comes from a distribution with a predefined mean and standard deviation. Standardized scores from tests such as the SAT and GRE (Graduate Record Exam) are used so that comparisons can easily be made between scores from different forms or administrations of the test, which all have the same mean and standard deviation.
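A quick sketch of the T-score rule just described (T = 10z + 50); the raw scores are invented for illustration:

```python
# Convert raw scores to z scores, then to T scores (rarely negative).
from statistics import mean, pstdev

scores = [72, 85, 91, 64, 78]
mu, sigma = mean(scores), pstdev(scores)

for x in scores:
    z = (x - mu) / sigma           # standard (z) score
    t = 10 * z + 50                # T score: multiply by 10, add 50
    print(f"raw={x}  z={z:+.2f}  T={t:.1f}")
```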
The name of the statistics game is being able to estimate the probability of an outcome. If we take what we have talked about and done so far in this chapter one step further, we can determine the probability of some event occurring. Then, we will use some criterion to judge whether we think that event is as likely, more likely, or less likely than what we would expect by chance. The research hypothesis presents a statement of the expected event, and we collect data and then use our statistical tools to evaluate how likely that event is. That’s the 20-second version of what inferential …
Let’s say the criterion for fairness we will use is that if, in flipping the coin 10 times, we get heads (or heads turn up) less than 5% of the time, we’ll say the coin is rigged and call the police on Lew. This 5% criterion is one standard that is used by statisticians. If the probability of the event (be it the number of heads or the score on a test or the difference between the average scores for two groups) occurs in the extreme (and we’re saying the extreme is defined as less than 5% of all such occurrences), it’s an unlikely, or in this case an unfair, outcome.
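A hedged sketch of that 5% criterion applied to Lew's 10 flips, using scipy's binomial distribution to do the counting for a fair coin:

```python
# Under a fair coin (the null hypothesis), how likely is each
# possible number of heads in 10 flips?
from scipy.stats import binom

n, p = 10, 0.5
for heads in range(n + 1):
    prob = binom.pmf(heads, n, p)
    flag = "  <- under the 5% criterion" if prob < 0.05 else ""
    print(f"{heads:2d} heads: p = {prob:.4f}{flag}")
# Outcomes of 0-2 or 8-10 heads each turn up less than 5% of the
# time by chance, so by this criterion they would look rigged.
```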
In other words, if, through the test of the research hypothesis, we find a difference and calculate that the likelihood of that difference occurring by chance is somewhat extreme, then the research hypothesis is a more attractive explanation than the null. So, if we find a z score (and remember that z scores have probabilities of occurrence associated with them as well) that is extreme (how extreme?—less than a 5% chance of occurring), we like to say that the reason for the extreme score is something to do with treatments or relationships or a real difference between groups and not just chance. We’ll go into much greater detail on this point in the following chapter.
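A minimal sketch of judging how extreme a z score is against the 5% criterion (the z value of 2.1 is made up):

```python
# Probability of a score at least this extreme, in either direction,
# if chance alone is at work (a two-tailed reading).
from scipy.stats import norm

z = 2.1
p = 2 * (1 - norm.cdf(abs(z)))
print(f"z = {z}, p = {p:.4f}")     # ~0.0357
print("beyond the 5% criterion" if p < 0.05 else "within chance")
```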
You could certainly surmise by now that distributions can be very different from one another in a variety of ways. In fact, there are four different ways in which they can differ: average value (you know—the mean, median, or mode), variability (range, variance, and standard deviation), skewness, and kurtosis.
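A short sketch computing all four characteristics on a made-up set of scores, assuming numpy and scipy:

```python
import numpy as np
from scipy import stats

scores = np.array([2, 4, 4, 5, 6, 7, 7, 8, 9, 12])

print("mean:", scores.mean())                # average value
print("median:", np.median(scores))
print("std dev:", scores.std(ddof=1))        # variability
print("skewness:", stats.skew(scores))       # lopsidedness
print("kurtosis:", stats.kurtosis(scores))   # flat vs. peaked (0 = normal)
```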
Another way to say this is that Distribution C has the largest amount of variability of the three distributions and Distribution A has the least.
Skewness is a measure of the lack of symmetry, or the lopsidedness, of a distribution. In other words, one “tail” of the distribution is longer than the other. For example, in Figure 8.9, Distribution A’s right tail is longer than its left tail, corresponding to a smaller number of occurrences at the high end of the distribution. Because the tail on the right, where the higher values are, is longer, we call the distribution positively skewed. This might be the case when you have a test that is very difficult, such that only a few people get scores that are relatively high …
Kurtosis has to do with how flat or peaked a distribution appears, and the terms used to describe this characteristic are relative ones. For example, the term platykurtic refers to a distribution that is relatively flat compared with a normal, or bell-shaped, distribution. The term leptokurtic refers to a distribution that is relatively peaked, or taller, compared with a normal distribution. In Figure 8.10, Distribution A is platykurtic compared with Distribution B. Distribution C is leptokurtic compared with Distribution B. Figure 8.10 looks similar to Figure 8.8 for a good reason—distributions that are platykurtic, for …
While skewness and kurtosis are used mostly as descriptive terms (such as “That distribution is negatively skewed”), there are mathematical indicators of how skewed or kurtotic a distribution is. For example, skewness is computed by subtracting the value of the median from the mean. If the mean of a distribution is 100 and the median is 95, the skewness value is 100 − 95 = 5, a positive number, and the distribution is positively skewed. If the mean of a distribution is 85 and the median is 90, the skewness value is 85 − 90 = −5, and the distribution is negatively skewed. There’s an even more …
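A sketch of that simple mean-minus-median indicator, on invented scores with a few high values pulling the mean up:

```python
import numpy as np

scores = np.array([60, 62, 65, 66, 68, 70, 72, 75, 90, 95])
skew_indicator = scores.mean() - np.median(scores)
print(f"mean - median = {skew_indicator:.1f}")  # positive -> positively skewed
```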
This is a pretty complicated formula that basically looks at how flat or peaked a set of scores is. You can see that if each score is the same, then the numerator is zero and K = 0, indicating no kurtosis. K equals zero when the distribution is normal or mesokurtic (now there’s a new word to throw around). If the individual scores (the Xs in the formula) differ greatly from the mean (and there is lots of variability), then the curve will probably be quite flat.
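The formula itself is cut off in this excerpt, so here is a hedged sketch of one common moment-based version (an assumption, not necessarily the book's exact formula): the average fourth power of the z scores, minus 3 so that a normal (mesokurtic) curve lands near K = 0:

```python
import numpy as np

def kurtosis_k(scores):
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean()) / x.std()        # a z score for each X
    return (z ** 4).mean() - 3.0        # ~0 for a normal curve

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # spread out and flat-ish
peaked = [5, 5, 5, 5, 5, 1, 9, 5, 5]    # piled up at the middle
print(kurtosis_k(flat))                  # negative -> platykurtic
print(kurtosis_k(peaked))                # positive -> leptokurtic
```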
Being able to figure out a z score, and being able to estimate how likely it is to occur in a sample of data, is the first and most important skill for understanding the whole notion of inference. Once we know how likely a test score (or other outcome values, such as a difference between groups) is, we can compare that likelihood with what we would expect by chance and then make informed decisions. As we start Part IV of Statistics for People Who (Think They) Hate Statistics, we’ll apply this model to specific examples of testing questions about the difference.
Probably no term or concept causes the beginning statistics student more confusion than statistical significance.
The level of chance or risk you are willing to take is expressed as a significance level, a term that unnecessarily strikes fear in the hearts of even strong men and women.
Significance level (here’s the quick-and-dirty definition) is the risk associated with not being 100% confident that what you observe in an experiment is due to the treatment or what was being tested—in our example, whether or not mothers worked.
Your job is to reduce this likelihood as much as possible by removing all of the competing reasons for any differences that you observed. Because you cannot fully eliminate the likelihood (because no one can control every potential factor), you assign some level of probability and report your results with that caveat. In sum (and in practice), the researcher defines a level of risk that he or she is willing to take.
But can you be absolutely (which is pretty darn) sure? No, you cannot. Why? First, because you can never be sure that you are testing a sample that identically reflects the profile of the population. And even if the sample perfectly represents the population, there are always other influences that might affect the outcome and that you inadvertently missed when designing the experiment. There’s always the possibility of error (another word for chance). By concluding that the differences in test scores are due to differences in treatment, you accept some risk. This degree of risk is, in effect …
Statistical significance (here’s the formal definition) is the degree of risk you are willing to take that you will reject a null hypothesis when it is actually true.
In reality, however, maybe there is no difference in the whole population(s). If you reject the null you stated, you would be making an error. That type of error is also known as a Type I error. To use as much jargon as possible, statistical significance is the chance of making a Type I error. So the next step is to develop a set of steps to test whether our findings indicate that error is responsible for differences or whether it is more likely that actual differences are responsible.
☹ Oops. Cell 2 represents a serious error. Here, we have rejected the null hypothesis (that there is no difference) when it is really true (and there is no difference between groups). Even though there is no difference between the two groups of children, we will conclude there is, and that’s an error—clearly a boo-boo, called a Type I error. ☹ Uh-oh, another type of error. Cell 3 represents a serious error as well. Here, we have accepted the null hypothesis (that there is no difference) when it is really false (and, indeed, there is a difference between groups). We have said that even though …
If the level of significance is .05, it means that on any one test of the null hypothesis, there is a 5% chance you will reject it (and conclude that there is a group difference) when the null is true and there really is no group difference at all.
In a research report, statistical significance is usually represented as p < .05, read as “the probability of observing that outcome is less than .05.”
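A hedged simulation of what p < .05 means in practice: when the null hypothesis is true, about 5% of tests will still come out "significant" by chance alone (all numbers here are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 10_000

rejections = 0
for _ in range(trials):
    # The null is true: the sample really does come from this population.
    sample = rng.normal(loc=100, scale=15, size=n)
    z = (sample.mean() - 100) / (15 / np.sqrt(n))
    if 2 * (1 - norm.cdf(abs(z))) < alpha:
        rejections += 1

print(rejections / trials)   # close to 0.05, the Type I error rate
```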
A Type II error (Cell 3 in the chart) occurs when you incorrectly accept a false null hypothesis. For example, there may really be differences between the populations represented by the sample groups, but you mistakenly conclude there are not. When talking about the significance of a finding, you might hear the word power used. Power is a type of probability statement of how well a statistical test can detect and reject a null hypothesis when it is false. Mathematically, it’s calculated by subtracting the proportional chance of making a Type II error from 1.0. A more powerful test is always …
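A sketch of power as 1 minus the chance of a Type II error, for a one-sample z test; the population values, effect, and sample size are made up:

```python
import numpy as np
from scipy.stats import norm

mu0, mu_true, sigma, n, alpha = 100, 106, 15, 25, 0.05

sem = sigma / np.sqrt(n)                  # standard error of the mean
cut = norm.ppf(1 - alpha / 2) * sem       # two-tailed cutoff around mu0

# beta: chance a sample mean stays inside the cutoffs when mu_true holds
beta = norm.cdf(mu0 + cut, mu_true, sem) - norm.cdf(mu0 - cut, mu_true, sem)
print(f"power = {1 - beta:.3f}")          # rises as n increases
```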
In other words, as the sample characteristics more closely match those of the population (achieved by increasing the sample size), the likelihood that you will accept a false null hypothesis decreases.
Here are some conclusions about the importance of statistical significance that we can reach, given this and the countless other possible examples:

- Statistical significance, in and of itself, is not very meaningful unless the study that is conducted has a sound conceptual base that lends some meaning to the significance of the outcome.
- Statistical significance cannot be interpreted independently of the context within which the outcomes occur. For example, if you are the superintendent in a school system, are you willing to retain children in Grade 1 if the retention program significantly …
Researchers treat the reporting of statistical significance in many different ways in their written reports. Some use words such as significant (assuming that if something is significant, it is statistically so) or the entire phrase statistically significant. But some also use the phrase marginally significant, where the probability associated with a finding might be .04, or nearly significant, where the probability is something like .06. What to do? You’re the boss, if your own data are being analyzed or if you are reviewing someone else’s. Use your noodle and consider all the dimensions of …
and a good topic for class discussion. It is only custom that .05 is the common level of significance required.
Ever hear of “publication bias”? It’s where a preset significance value of .05 is used as the only criterion in the serious consideration of a paper for publication.
Many journals ask researchers to report effect size (the size of the relation among variables, such as a correlation, or a standardized difference between groups), in addition to significance levels, so a fuller picture is available.
Whereas descriptive statistics are used to describe a sample’s characteristics, inferential statistics are used to infer something about the population based on the sample’s characteristics. At several points throughout the first half of Statistics for People Who (Think They) Hate Statistics, we have emphasized that a hallmark of good scientific research is choosing a sample in such a way that it is representative of the population from which it was selected. The process then becomes an inferential one, in which you infer from the smaller sample to the larger population based on the results of …
Here are the general steps to take in the application of a statistical test to any null hypothesis. These steps will serve as a model for each of the chapters in Part IV.

1. A statement of the null hypothesis. Do you remember that the null hypothesis is a statement of equality? The null hypothesis is what we assume is the “true” state of affairs given no other information on which to make a judgment.
2. Setting the level of risk (or the level of significance or chance of making a Type I error) associated with the null hypothesis. With any research hypothesis comes a certain degree of risk that you …
can learn what test is related to what type of question in this part of the book. Computation of the test statistic value. The test statistic value (also called the obtained value or observed value) is the result or product of a specific statistical calculation. For example, there are test statistics for the significance of the difference between the averages of two groups, for the significance of the difference of a correlation coefficient from zero, and for the significance of the difference between two proportions. You’ll actually compute the test statistic and come up with a numerical …
found. Here is where the real beauty of the inferential method shines through. Only if your obtained value is more extreme than what would happen by chance (meaning that the result of the test statistic is not a result of some chance fluctuation) can you say that any differences you obtained are not due to chance and that the equality stated by the null hypothesis is not the most attractive explanation for any differences you might have found. Instead, the differences are more likely due to the treatment or whatever your independent variable is. If the obtained value does not exceed the …
The entire curve represents all the possible outcomes based on a specific null hypothesis, such as the difference between two groups or the significance of a correlation coefficient. The critical value is the point beyond which the obtained outcomes are judged to be so rare that the conclusion is that the obtained outcome is not due to chance but to some other factor. In this example, we define rare as having a less than 5% chance of occurring. If the outcome representing the obtained value falls to the left of the critical value (it is less extreme), the conclusion is that the null hypothesis …
If the obtained value falls to the right of the critical value (it is more extreme), the conclusion is that the research hypothesis is the most attractive explanation for any differences that are observed. In other words, the obtained value falls in the region (5% of the area under the curve) where we would expect only outcomes due to something other than chance to occur.
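A minimal sketch of that decision rule, with "rare" taken as the extreme 5% on the right of the curve as in the figure (the obtained value is hypothetical):

```python
from scipy.stats import norm

alpha = 0.05
critical = norm.ppf(1 - alpha)      # ~1.645 cuts off the extreme 5%

obtained = 2.30                     # a hypothetical obtained z value
if obtained > critical:             # falls to the right: rare under chance
    print("the research hypothesis is the more attractive explanation")
else:                               # falls to the left: less extreme
    print("the null hypothesis remains the most attractive explanation")
```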
Do keep in mind that most statisticians would probably actually use the standard deviation of the mean (called the standard error of the mean) to compute confidence intervals for this question, but we have kept the example simple to introduce the concept.
The use of the SEM is a bit more complex, but it is an alternative way of computing and understanding confidence intervals and makes sense because we are trying to guess the range of possible values for a mean, not an individual score.
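A hedged sketch of a confidence interval built on the SEM (the sample standard deviation divided by the square root of n); the scores are invented:

```python
import numpy as np

scores = np.array([62, 58, 71, 64, 66, 60, 69, 63, 65, 62])
mean = scores.mean()
sem = scores.std(ddof=1) / np.sqrt(len(scores))   # standard error of the mean

low, high = mean - 1.96 * sem, mean + 1.96 * sem  # 95% interval
print(f"95% CI for the mean: [{low:.1f}, {high:.1f}]")
```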
Why does the confidence interval itself get larger as the probability of your being correct increases (from, say, 95% to 99%)? Because the larger range of the confidence interval (in this case, 19.6 [54.2, 73.8] for a 95% confidence interval vs. 25.6 [51.2, 76.8] for a 99% confidence interval) allows you to encompass a larger number of possible outcomes, and you can thereby be more confident…
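A sketch reproducing the excerpt's numbers. The mean of 64 and error term of 5 are inferred from the intervals given, so treat them as assumptions; the multipliers 1.96 and 2.56 match the stated widths:

```python
mean, err = 64.0, 5.0

for label, z in [("95%", 1.96), ("99%", 2.56)]:
    low, high = mean - z * err, mean + z * err
    print(f"{label} CI: [{low:.1f}, {high:.1f}]  width = {high - low:.1f}")
# 95% -> [54.2, 73.8], width 19.6; 99% -> [51.2, 76.8], width 25.6
```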
Here’s how you can use Figure 10.1, the flowchart introduced in Chapter 9, to select the appropriate test statistic, the one-sample z test.
We are examining differences between a sample and a population. There is only one group being tested. The appropriate test statistic is a one-sample z test.
The denominator (the value on the bottom that you divide by), an error term, is called the standard error of the mean and is the value we would expect by chance, given all the variability that surrounds the selection of all possible sample means from a population.
p < .05 (the really important part of this little phrase) indicates that (if the null hypothesis is true) the probability is less than 5% that on any one test of that hypothesis, the sample and the population averages will differ by that much or more.
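Pulling these pieces together, a hedged end-to-end sketch of the one-sample z test: the obtained value is the sample-population difference divided by the standard error of the mean (the population mean, sigma, and the sample are all made up):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 100, 15                      # known population parameters
sample = np.array([104, 109, 98, 112, 107, 101, 115, 99, 106, 110])

sem = sigma / np.sqrt(len(sample))       # the error term (denominator)
z = (sample.mean() - mu) / sem           # the obtained value
p = 2 * (1 - norm.cdf(abs(z)))           # two-tailed probability

print(f"z = {z:.2f}, p = {p:.4f}")
print("p < .05: reject the null" if p < 0.05 else "retain the null")
```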