Statistics for People Who (Think They) Hate Statistics
First, we compute the sum of squares for each source of variability.
Second, we need to compute the mean sum of squares, which is simply an average sum of squares.
We do that by dividing each sum of squares by the appropriate number of degrees of freedom (df). Remember, degrees of freedom is an approximation of the sample or group size.
Here’s a source table of the variance estimates used to compute the F ratio. This is how most F tables appear in professional journals and manuscripts.
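To make the source-table arithmetic concrete, here's a minimal sketch in Python (assuming NumPy is available; the scores for the three groups are invented) that computes each sum of squares, divides by the degrees of freedom to get the mean squares, and forms the F ratio:

```python
import numpy as np

# Hypothetical scores for three groups (e.g., 25%, 50%, and 75% color conditions)
groups = [
    np.array([87, 90, 78, 85, 92, 81, 88, 84, 79, 90], dtype=float),
    np.array([74, 80, 69, 77, 73, 82, 71, 76, 78, 70], dtype=float),
    np.array([65, 60, 72, 68, 59, 63, 70, 66, 61, 64], dtype=float),
]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of scores
grand_mean = np.concatenate(groups).mean()

# Sum of squares for each source of variability
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Degrees of freedom and mean squares (an "average" sum of squares)
df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within

F = ms_between / ms_within
print(f"SS_between={ss_between:.2f}  SS_within={ss_within:.2f}")
print(f"MS_between={ms_between:.2f}  MS_within={ms_within:.2f}  F({df_between}, {df_within})={F:.2f}")
```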
Because you already know about t tests, you might be wondering how a t value (which is always used for the test of the difference between the means of two groups) and an F value (which is always used for more than two groups) are related. As we mentioned earlier, an F value for two groups is equal to a t value for two groups squared, or F = t². Handy trivia question, right? But also useful if you know one and need to know the other. And it shows that all of these inferential statistical procedures use the same strategy—compare an observed difference to a difference expected by chance alone.
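A quick way to convince yourself of that relationship is to run a t test and an ANOVA on the same two groups. A minimal sketch, assuming scipy is installed and using made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group1 = rng.normal(50, 10, size=15)
group2 = rng.normal(55, 10, size=15)

t_stat, t_p = stats.ttest_ind(group1, group2)   # two-group t test
f_stat, f_p = stats.f_oneway(group1, group2)    # the same two groups run as an ANOVA

print(round(t_stat ** 2, 4), round(f_stat, 4))  # t squared equals F
print(round(t_p, 4), round(f_p, 4))             # and the p values match
```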
So How Do I Interpret F(2, 27) = 8.80, p < .05?
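As a rough check on the numbers in that question, here's a sketch (assuming scipy is available) that looks up the critical value of F for 2 and 27 degrees of freedom at the .05 level and the exact probability of getting an F of 8.80 by chance:

```python
from scipy import stats

f_obs, df_between, df_within = 8.80, 2, 27

f_crit = stats.f.ppf(0.95, df_between, df_within)   # critical value at alpha = .05
p_value = stats.f.sf(f_obs, df_between, df_within)  # probability of an F this large by chance alone

print(f"F critical = {f_crit:.2f}")   # about 3.35
print(f"p = {p_value:.4f}")           # well below .05, so the result is significant
```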
But because ANOVA is an omnibus test, you don’t know where the source of the significant difference lies. What if you’re interested in that, though? And you will almost certainly be. Well, you could take two groups (or pairs) at a time (such as 25% color and 75% color) and test them against each other. In fact, you could test every combination of two against each other. Kosher? Maybe not. Performing multiple t tests like this without a well-thought-out hypothesis for each comparison is sometimes called fishing, and fishing is actually against the law in some jurisdictions. When you do this, …
In previous chapters, you saw how we used Cohen’s d as a measure of effect size. Here, we change directions and use a value called eta squared, or η². As with Cohen’s d, η² has guidelines for how large the effect is: A small effect size is about .01. A medium effect size is about .06. A large effect size is about .14.
Instead of interpreting this effect size as the size of the difference between groups as we did for Cohen’s d, we interpret eta squared as the proportion of the variance in the dependent variable explained by the independent variable.
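Here's the arithmetic in miniature: eta squared is just the between-groups sum of squares taken as a proportion of the total sum of squares. The numbers below are hypothetical:

```python
# Hypothetical sums of squares from a one-way ANOVA source table
ss_between = 90.0
ss_within = 138.0
ss_total = ss_between + ss_within

eta_squared = ss_between / ss_total
print(f"eta squared = {eta_squared:.2f}")  # about .39: the proportion of variance in the
                                           # dependent variable explained by group membership
```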
Okay, so you’ve run an ANOVA and you know that an overall difference exists among the means of three or four or more groups. But where does that difference lie?
You already know not to perform multiple t tests. You need to perform what are called post hoc, or after-the-fact, comparisons.
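To see both why the fishing expedition is risky and what a post hoc comparison looks like in practice, here's a sketch using the Tukey HSD routine from statsmodels (assuming it's installed; the group data and labels are invented):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# With k groups there are k*(k-1)/2 pairwise t tests; at alpha = .05 apiece,
# the chance of at least one false positive climbs quickly (treating the
# tests as independent for illustration):
for k in (3, 4, 5):
    n_pairs = k * (k - 1) // 2
    print(k, "groups:", round(1 - 0.95 ** n_pairs, 3), "familywise error if each pair is tested separately")

# A post hoc (after-the-fact) procedure that keeps that error rate in check
rng = np.random.default_rng(7)
scores = np.concatenate([rng.normal(mu, 5, 10) for mu in (70, 74, 82)])
labels = np.repeat(["25% color", "50% color", "75% color"], 10)

result = pairwise_tukeyhsd(scores, labels, alpha=0.05)
print(result.summary())  # which pairs of groups actually differ
```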
Why a two-way analysis of variance? Easy—there were two independent factors, with the first being level of experience and the second being age. Here, just as with any analysis of variance procedure, there are a test of the main effect for age, a test of the main effect for experience, and a test for the interaction between experience and age (which turned out to be significant).
The very cool thing about analysis of variance when more than one factor or independent variable is tested is that the researcher can look not only at the individual effects of each factor but also at the simultaneous effects of both, through what is called an interaction. An interaction means that the strength of the effect of one independent variable on the dependent variable differs depending on the level of the other independent variable.
We are testing for differences among scores of different groups, in this case, groups that differ on level of experience and age. The participants are not being tested more than once. We are dealing with more than two groups. We are dealing with more than one factor or independent variable. The appropriate test statistic is factorial analysis of variance.
You already know that ANOVA comes in at least one flavor, the simple analysis of variance we discussed in Chapter 13. With simple analysis of variance, there is one factor or one treatment variable (such as group membership) being explored, and there are more than two groups or levels within this factor or treatment variable. Now, we bump up the entire technique a notch so that we can explore more than one factor simultaneously. This is called factorial analysis of variance. The term factorial analysis of variance is used to describe any analysis of variance where there is more than one …
By the way, even though there are only two groups (or levels) for each independent variable, we will still do a “two-way” factorial analysis of variance instead of two different analyses of variance. That’s because we want to see if there is an interaction.
There are three questions that you can ask and answer with this type of analysis:

1. Is there a difference in weight loss between the two levels of the exercise program, high impact and low impact?
2. Is there a difference in weight loss between the two levels of gender, male and female?
3. Is the effect of being in the high- or low-impact program different for males and females?

Questions 1 and 2 deal with the presence of main effects (the effect of each independent variable by itself), whereas Question 3 deals with the interaction between the two factors.
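Here's a minimal sketch of how such a 2 × 2 (impact by gender) factorial ANOVA might be run with statsmodels (assuming pandas and statsmodels are installed; the weight-loss data are invented, with an interaction deliberately built in). The * in the formula requests both main effects and the interaction:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
data = pd.DataFrame({
    "impact": np.repeat(["high", "low"], 20),
    "gender": np.tile(np.repeat(["male", "female"], 10), 2),
})

# Invent weight loss with a built-in interaction: females in the low-impact group lose less
base = {"high": 10.0, "low": 9.0}
data["loss"] = [
    base[i] - (4.0 if (i == "low" and g == "female") else 0.0) + rng.normal(0, 2)
    for i, g in zip(data["impact"], data["gender"])
]

model = ols("loss ~ C(impact) * C(gender)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # one row for each main effect and one for the interaction
```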
You might well remember that the primary task of analysis of variance is to test for the difference between two or more groups. When an analysis of the data reveals a difference between the levels of any factor, we talk about there being a main effect.
Let’s look at a different source table showing that men and women are affected differentially across treatments, indicating the presence of an interaction effect.
Here, there is no main effect for treatment or gender (p = .127 and .176, respectively), but yikes, there is one for the treatment-by-gender interaction (p = .004), which makes this a very interesting outcome. In effect, it does not matter whether you are in the high- or low-impact treatment group or whether you are male or female, but it does matter a whole lot if you are female and in the low-impact treatment group. You will lose much less weight.
The lines may not always cross as dramatically, and the appearance of the graph also depends on what variable the x-axis explores, but charting the means is often the only way to get a sense of what’s going on as far as main effects and interactions. The actual statistical test of whether an interaction is significant is the F test for the interaction term in the source table; the quick visual check is to look at the lines on the graph and judge whether they are roughly parallel. If there is no interaction, the effect of one independent variable on the dependent variable is the same for each level of the other independent variable, so each line rises or falls …
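Here's a sketch of that kind of chart using matplotlib (assumed available); the cell means are invented but follow the pattern just described, so the two lines are far from parallel:

```python
import matplotlib.pyplot as plt

treatments = ["High impact", "Low impact"]
mean_loss = {
    "Male":   [9.8, 9.5],   # hypothetical cell means for weight loss
    "Female": [10.2, 5.9],  # females in the low-impact group lose much less
}

# One line per gender across the two treatment levels
for gender, means in mean_loss.items():
    plt.plot(treatments, means, marker="o", label=gender)

plt.ylabel("Mean weight loss")
plt.legend(title="Gender")
plt.title("Treatment-by-gender cell means")
plt.show()
```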
…think that all you have to do is a simple t test between the averages for males and females and then another simple t test for the averages between those who participated in the high-impact and those who participated in the low-impact treatment—and you would have found nothing. But, using the idea of an interaction between main factors, you find out that there is a differential effect—an outcome that would have gone unnoticed otherwise. Indeed, if you can bear the cost of admission, interactions really are the most interesting outcomes in any factorial analysis of variance. Consider including …
Classic experimental design, which has been around since the 1940s, almost always involves comparing group means, t...
Wondering why the SPSS output is labeled univariate analysis of variance? We knew you were. Well, in SPSS talk, this is an analysis that looks at only one dependent or outcome variable—in this case, weight loss. If we had more than one outcome variable as part of the research question (such as attitude toward eating), then it would be a multivariate analysis of variance, which not only looks at group differences and more than one dependent variable but also controls for the relationship between the dependent variables.
A different formula is used to compute the effect size for a factorial ANOVA, but the idea is the same. We are still making a judgment about the magnitude of a difference that we observe.
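One common convention is to divide each effect's sum of squares by the total sum of squares, which yields an eta squared for each main effect and for the interaction. A sketch with hypothetical sums of squares:

```python
# Hypothetical sums of squares from a two-way ANOVA source table
ss = {"treatment": 14.4, "gender": 11.1, "treatment x gender": 60.9, "residual": 92.6}
ss_total = sum(ss.values())

# Eta squared for each effect: its share of the total variability
for effect in ("treatment", "gender", "treatment x gender"):
    print(effect, "eta squared =", round(ss[effect] / ss_total, 2))
```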
According to some customary guidelines (.01 is small, .06 is medium, and .14 is large), this effect size of .34, reflecting the strength of the association, is huge.
He found that higher levels of marital quality were related to higher levels of quality of parent–child relationships; this was found for concurrent measures (at the present time) as well as longitudinal measures (over time).
The relationship between variables, not the difference between groups, is being examined. Only two variables are being used.
The appropriate test statistic to use is the t test for the correlation coefficient.
Here’s something you’ll probably be pleased to read: The correlation coefficient can act as its own test statistic.
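The conversion from r to a test statistic uses only the correlation and the sample size. A minimal sketch (assuming scipy is available; the r echoes the value mentioned later in these notes, and the n is an assumption):

```python
from scipy import stats

r, n = 0.437, 30               # hypothetical sample correlation and sample size
df = n - 2

t = r * (df ** 0.5) / ((1 - r ** 2) ** 0.5)   # t value for the correlation coefficient
p_two_tailed = 2 * stats.t.sf(abs(t), df)     # no direction predicted
p_one_tailed = stats.t.sf(t, df)              # if a positive correlation was predicted ahead of time

print(f"t({df}) = {t:.2f}, two-tailed p = {p_two_tailed:.4f}, one-tailed p = {p_one_tailed:.4f}")
```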
The Greek letter ρ, or rho, represents the correlation coefficient in the population (the parameter that the sample value of r estimates).
So, for example, if you think that there is a positive correlation between two variables, then the test is one-tailed. Similarly, if you hypothesize that there is a negative correlation between two variables, the test is one-tailed as well. It’s only when you don’t predict the direction of the relationship that the test is two-tailed.
The correlation coefficient is, itself, an effect size! In fact, it’s the simplest of effect sizes as it provides a direct measure of the relationship between two variables, right? It’s interpreted this way: .2 = small, .5 = medium, and .8 = large. Remember, too, that another way to use a correlation as an effect size is to square it and interpret that coefficient of determination as the proportion of the variance in one variable that overlaps with another.
In Chapter 6, you may remember that we talked about such reliability coefficients as test–retest (the correlation of scores from the same test at two points in time), parallel forms (the correlation between scores on different forms of the same test), and internal consistency (the correlation between different parts of a test). Reliability is basically the correlation of a test with itself.
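A test–retest coefficient, for instance, is nothing more than a Pearson correlation between two administrations of the same test. A tiny sketch with invented scores (assuming scipy is available):

```python
from scipy import stats

time1 = [88, 92, 75, 81, 95, 70, 84, 78]   # invented scores at the first testing
time2 = [85, 94, 73, 83, 92, 72, 80, 80]   # the same people tested again later

r, p = stats.pearsonr(time1, time2)
print(f"test-retest reliability = {r:.2f}")
```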
But we should mention and discuss this topic again. Even if a correlation coefficient is significant (as was the case in the example in this chapter), it does not mean that the amount of variance accounted for is meaningful. For example, in this case, the coefficient of determination for a simple Pearson correlation value of .437 is equal to .190, indicating that 19% of the variance is accounted for and a whopping 81% of the variance is not. This leaves lots of room for doubt, doesn’t it?
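The arithmetic behind that statement, as a two-line sketch:

```python
r = 0.437
print(f"variance accounted for: {r ** 2:.2f}")       # about .19, or 19%
print(f"variance not accounted for: {1 - r ** 2:.2f}")  # about .81, or 81%
```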
Correlations are powerful tools that point out the direction and size of a relationship and help us to understand what two different outcomes share with one another. Remember that correlations tell us about associations but not about whether one variable affects another.
Not only can you compute the degree to which two variables are related to one another (by computing a correlation coefficient as we did in Chapter 5), but you can also use these correlations to predict the value of one variable based on the value of another.
The basic idea is to use a set of previously collected data (such as data on variables X and Y), calculate how correlated these variables are with one another, and then use that correlation and the knowledge of X to predict Y.
How does regression work? Data are collected on past events (such as the existing relationship between two variables) and then applied to a future event given knowledge of only one variable.
The higher the absolute value of the correlation coefficient, regardless of whether it is direct or indirect (positive or negative), the more accurate the prediction is of one variable from the other based on that correlation. That’s because the more two variables share in common, the more you know about the second variable based on your knowledge of the first variable. And you may already surmise that when the correlation is perfect (+1.0 or −1.0), then the prediction is perfect as well. If rxy = −1.0 or +1.0 and if you know the value of X, then you also know the exact value of Y. Likewise, …
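One way to see this numerically is through the standard error of estimate, a common textbook measure of the typical size of a prediction error, which shrinks toward zero as the correlation approaches ±1. A sketch with a hypothetical standard deviation for Y:

```python
s_y = 0.60   # hypothetical standard deviation of the Y variable (e.g., college GPA)

for r in (0.0, 0.3, 0.6, 0.9, 1.0):
    s_est = s_y * (1 - r ** 2) ** 0.5   # typical size of the prediction error
    print(f"r = {r:.1f} -> standard error of estimate = {s_est:.2f}")
```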
Why the prediction of Y from X and not the other way around? Convention. Seems like a good idea to have a consistent way to identify variables, so the Y variable becomes the dependent variable or the one being predicted and the X variable becomes the independent variable and is the variable used to predict the value of Y. And when predicted, the Y value is represented as Y′ (read as Y prime)—the predicted value of Y. (To sound like an expert, you might call the independent variable a predictor and the dependent variable the criterion. Purists save the terms independent and dependent to …
Prediction is the computation of future outcomes based on a knowledge of present ones.
To predict college GPA from high school GPA, we have to create a regression equation and use that to plot what is called a regression line. A regression line reflects our best guess as to what score on the Y variable (college GPA) would be predicted by a score on the X variable (high school GPA). For all the data you see in Table 16.1, the regression line is drawn so that it minimizes the distance between itself and each of the points on the predicted (Y′) variable.
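Here's a minimal sketch of that regression equation and line, with invented high school and college GPAs (assuming NumPy is available). The slope and intercept come from the usual least-squares formulas, which is what "minimizing the distance" amounts to:

```python
import numpy as np

high_school_gpa = np.array([3.5, 2.8, 3.9, 2.4, 3.1, 3.7, 2.9, 3.3])  # X (invented)
college_gpa     = np.array([3.3, 2.6, 3.8, 2.2, 3.0, 3.4, 2.7, 3.2])  # Y (invented)

x_mean, y_mean = high_school_gpa.mean(), college_gpa.mean()
b = ((high_school_gpa - x_mean) * (college_gpa - y_mean)).sum() / ((high_school_gpa - x_mean) ** 2).sum()
a = y_mean - b * x_mean                       # regression equation: Y' = bX + a

y_prime = b * 3.6 + a                         # predicted college GPA for a 3.6 high school GPA
print(f"Y' = {b:.2f}X + {a:.2f}; predicted college GPA for X = 3.6: {y_prime:.2f}")
```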
First, it’s the regression of the Y variable on the X variable. In other words, Y (college GPA) is being predicted from X (high school GPA). This regression line is also called the line of best fit. The line fits these data because it minimizes the distance between each individual point and the regression line. Those distances are errors because it means the prediction was wrong; it was some distance …