Kindle Notes & Highlights
by
Andy Field
Read between
January 9 - February 23, 2021
These distances between each data point and the group mean are squared and added together to give the residual sum of squares, SSR:

SSR = Σg Σi (x_ig − x̄_g)²   (12.13)

Equation (12.13) says that the sum of squares for each group is the squared difference between each participant's score in a group (x_ig) and the group mean (x̄_g), and the two sigma signs mean that we repeat this calculation for the first participant (i = 1) through to the last (n), in the first group (g = 1) through to the last (k).
The degrees of freedom for SSR (dfR) are the total degrees of freedom minus the degrees of freedom for the model (dfR = dfT − dfM = 14 − 2 = 12). Put another way, this is N − k: the total sample size, N, minus the number of groups, k.
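As a concrete sketch of these two quantities, here is a minimal numpy calculation; the scores are hypothetical, chosen only so that, as in the text, N = 15, k = 3 and dfR = 12:

```python
import numpy as np

# Hypothetical scores for k = 3 groups of n = 5 participants each (N = 15)
groups = [
    np.array([3, 2, 1, 1, 4]),  # group 1
    np.array([5, 2, 4, 2, 3]),  # group 2
    np.array([7, 4, 5, 3, 6]),  # group 3
]

# SSR: within each group, square the distance between each score and that
# group's mean, sum them, then add the group totals together
ss_r = sum(((x - x.mean()) ** 2).sum() for x in groups)

# dfR = N - k: total sample size minus the number of groups
N = sum(len(x) for x in groups)
k = len(groups)
df_r = N - k

print(ss_r, df_r)
```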
The F-statistic (a.k.a. the F-ratio) is a measure of the ratio of the variation explained by the model and the variation attributable to unsystematic factors.
it is the ratio of how good the model is to how bad it is (how much error there is).
It is calculated by dividing the model mean squares by the residual mean squares.
Typically researchers are interested in whether this ratio is significant: what would the probability be of getting an F at least this big if the experimental manipulation, in reality, had no effect at all on happiness (i.e., if the null hypothesis were true)?
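A short sketch of how the mean squares, F and its p-value fit together numerically (the scores are hypothetical, three groups of five), with scipy's built-in one-way ANOVA as a cross-check:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups of five participants
groups = [np.array([3, 2, 1, 1, 4]),
          np.array([5, 2, 4, 2, 3]),
          np.array([7, 4, 5, 3, 6])]
scores = np.concatenate(groups)
k, N = len(groups), len(scores)

grand_mean = scores.mean()
ss_m = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in groups)  # model SS
ss_r = sum(((x - x.mean()) ** 2).sum() for x in groups)            # residual SS

ms_m = ss_m / (k - 1)  # model mean squares
ms_r = ss_r / (N - k)  # residual mean squares
F = ms_m / ms_r        # systematic variation / unsystematic variation

# Probability of an F at least this big if the null hypothesis were true
p = stats.f.sf(F, k - 1, N - k)

# Cross-check against scipy's built-in one-way ANOVA
F_check, p_check = stats.f_oneway(*groups)
print(F, p)
```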
Normality is tested on scores within groups, not across the entire sample
we assume that the variance of the outcome is steady as the predictor changes (in this context it means that variances in the groups are equal).
the F-statistic can be adjusted to correct for the degree of heterogeneity, so you may as well just use the corrected F: small deviations from homogeneity will result in very small corrections
Two such corrections are the Brown–Forsythe F (Brown & Forsythe, 1974), and Welch’s F (Welch, 1951).
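scipy does not ship Welch's F for more than two groups, but the textbook formula is short enough to sketch directly; the data below are hypothetical, and this is an illustration of the formula rather than a validated implementation:

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's (1951) F for a one-way design with unequal variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                  # weight each group by n / s^2
    w_sum = w.sum()
    grand = (w * means).sum() / w_sum  # variance-weighted grand mean

    numer = (w * (means - grand) ** 2).sum() / (k - 1)
    tmp = (((1 - w / w_sum) ** 2) / (n - 1)).sum()
    denom = 1 + 2 * (k - 2) * tmp / (k ** 2 - 1)

    F = numer / denom
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * tmp)
    return F, df1, df2, stats.f.sf(F, df1, df2)

# Hypothetical groups with clearly unequal variances
F, df1, df2, p = welch_anova([3, 2, 1, 1, 4], [5, 2, 4, 2, 3], [12, 1, 9, 0, 8])
print(F, df1, df2, p)
```

A handy sanity check on the formula: with only two groups, Welch's F reduces to the square of Welch's unequal-variance t.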
First, does the F control the Type I error rate or is it significant even when there are no differences between means?
Second, does F have enough power (i.e., is it able to detect differences when they are there)?
Recent simulations show that differences in skewness, non-normality and heteroscedasticity interact in complicated ways that impact power
in the absence of normality, violations of homoscedasticity will affect F even when group sizes are equal (Wilcox, 2010, 2012, 2016), and when means are equal the error rate (which should be 5%) can be as high as 18%.
Heavy-tailed distributions are particularly problematic: if you set up a situation with power of 0.9 to detect an effect in a normal distribution and contaminate that distribution with 10% of scores from a normal distribution with a bigger variance (so you get heavier tails), power drops to 0.28 (despite the fact that only 10% of scores have changed).
heavy-tailed samples have implications for the central limit theorem, which says that in samples of 30 or more the sampling distribution should be normal (Section 6.6.1); for heavy-tailed distributions samples need to be much larger, up to 160 in some cases (Wilcox, 2010). To sum up, F is not robust, despite what your supervisor might tell you.
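Wilcox's contaminated ('mixed') normal is easy to simulate, and shows how little contamination it takes to produce heavy tails; a quick illustration (the 10% / sd = 10 mixture is one common choice, not the only one):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000

# 90% of scores from N(0, 1), 10% from N(0, 10^2): a contaminated normal
contaminated = rng.random(n) < 0.10
mixed = np.where(contaminated, rng.normal(0, 10, n), rng.normal(0, 1, n))
normal = rng.normal(0, 1, n)

# Excess kurtosis: near 0 for the normal, far larger for the heavy-tailed mixture
print(stats.kurtosis(normal), stats.kurtosis(mixed))
```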
Violations of the assumption of independence are very serious indeed.
you think you’ll make a Type I error 5% of the time but in fact you’ll make one 74% of the time!
If you routinely interpret Welch's F then you need never even think about homogeneity of variance. You can also bootstrap parameter estimates, which won't affect F itself, but at least you know that the model parameters are robust.
the need to follow up a significant F by looking at the model parameters, which tell us about specific differences between means.
The trouble is that with two dummy variables we end up with two t-tests, which inflates the familywise error rate
The other problem is that the dummy variables might not make all the comparisons that we want to make
Contrast coding is a way of assigning weights to groups in dummy variables to carry out planned contrasts (also known as planned comparisons).
compare every group mean to all others (i.e., to conduct several overlapping tests using a t-statistic each time) but using a stricter acceptance criterion that keeps the familywise error rate at 0.05. These are known as post hoc tests
The F-statistic is based upon splitting the total variation into two component parts: the variation due to the model or experimental manipulation (SSM) and the variation due to unsystematic factors (SSR)
students struggle with the notion of designing planned contrasts, but there are three rules that can help you to work out what to do: (1) if you have a control group, this is usually because you want to compare it against any other groups; (2) each contrast must compare only two ‘chunks’ of variation; (3) once a group has been singled out in a contrast it can’t be used in another contrast.
If a group is singled out in one contrast, then it should not reappear in another contrast. The important thing is that we are breaking down one chunk of variation into smaller independent chunks. This independence matters for controlling the Type I error rate.
each contrast must compare only two chunks of variance. This rule is so that we can interpret the contrast. The original F tells us that some of our means differ, but not which ones, and if we were to perform a contrast on more than two chunks of variance we would be no better off. By comparing only two chunks of variance we know that the result represents a significant difference (or not) between these two portions of variation.
When we carry out a planned contrast, we compare ‘chunks’ of variance and these chunks often consist of several groups.
When you design a contrast that compares several groups to one other group, you are comparing the means of the groups in one chunk with the mean of the group in the other chunk.
To carry out contrasts we need to code our dummy variables in a way that results in bs that compare the ‘chunks’ that we set out in our contrasts.
The values assigned to the dummy variables are known as weights.
Rule 1: Choose sensible contrasts. Remember that you want to compare only two chunks of variation and that if a group is singled out in one contrast, that group should be excluded from any subsequent contrasts. Rule 2: Groups coded with positive weights will be compared against groups coded with negative weights. So, assign one chunk of variation positive weights and the opposite chunk negative weights. Rule 3: If you add up the weights for a given contrast the result should be zero. Rule 4: If a group is not involved in a contrast, automatically assign it a weight of zero, which will eliminate that group from the contrast calculations.
It is important that the weights for a contrast sum to zero because it ensures that you are comparing two unique chunks of variation.
If the products sum to zero then the contrasts are independent or orthogonal.
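For the three-group case these checks are easy to do mechanically; a minimal sketch (the weights follow the usual control-vs-experimental pattern):

```python
# Weights for three groups: control, experimental 1, experimental 2
contrast1 = [-2, 1, 1]   # control (negative) vs both experimental groups (positive)
contrast2 = [0, -1, 1]   # experimental 1 vs experimental 2; control drops out

# Each contrast's weights sum to zero (Rule 3)
assert sum(contrast1) == 0 and sum(contrast2) == 0

# Orthogonality: multiply the weights pairwise and sum the products
products = [w1 * w2 for w1, w2 in zip(contrast1, contrast2)]
print(sum(products))  # 0 means the contrasts are independent
```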
The main ANOVA for the model is the same as when dummy coding was used (compare it to Output 12.1), showing that the model fit is the same (it should be because the model represents the group means and these have not changed); however, the b-values have changed because the values of our dummy variables have changed.
The first thing to notice is that the intercept is the grand mean, 3.467 (see, I wasn’t telling lies). Second, the b for contrast 1 is one-third of the difference between the average of the experimental conditions and the control condition.
the b for contrast 2 is half of the difference between the ...
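The claims about the intercept and the bs can be verified with an ordinary least-squares fit; a sketch using hypothetical scores for a control group and two experimental groups (five per group):

```python
import numpy as np

# Hypothetical outcome scores: control, then experimental 1, then experimental 2
y = np.array([3, 2, 1, 1, 4,    # control
              5, 2, 4, 2, 3,    # experimental 1
              7, 4, 5, 3, 6])   # experimental 2

# Orthogonal contrast codes: c1 = control vs both experimental groups,
# c2 = experimental 1 vs experimental 2
c1 = np.repeat([-2, 1, 1], 5)
c2 = np.repeat([0, -1, 1], 5)
X = np.column_stack([np.ones(len(y)), c1, c2])

b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

m_con, m_e1, m_e2 = y[:5].mean(), y[5:10].mean(), y[10:].mean()
print(b0, y.mean())                          # intercept equals the grand mean
print(b1, ((m_e1 + m_e2) / 2 - m_con) / 3)   # one-third of experimental-vs-control difference
print(b2, (m_e2 - m_e1) / 2)                 # half of the experimental 1 vs 2 difference
```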
Contrasts don’t have to be orthogonal: non-orthogonal contrasts are contrasts that are related.
Standard dummy coding (Section 12.2) is an example of non-orthogonal contrasts because the baseline group is used in each contrast.
There is nothing intrinsically wrong with non-orthogonal contrasts, but you must be careful about how you interpret them because the contrasts are related and so the resulting test statistics and p-values will be correlated to some degree.
When you code categorical variables in the data editor, SPSS Statistics treats the lowest-value code as group 1, the next highest code as group 2, and so on.
depending on which contrasts you want, you should code your grouping variable appropriately (and then use Table 12.6 as a guide to which contrasts you’ll get).
A polynomial contrast tests for trends in the data, and in its most basic form it looks for a linear trend (i.e., that the group means increase proportionately).
The linear trend should be familiar to you and represents a simple proportionate change in the value of the dependent variable across ordered categories
A quadratic trend is where there is a curve in the line
A cubic trend is where there are two changes in the direction of the trend.
A quartic trend has three changes of direction.
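For five ordered groups, the standard orthogonal polynomial weights make these four trends concrete; a short check (weights taken from the usual table for equally spaced groups):

```python
import numpy as np
from itertools import combinations

# Standard orthogonal polynomial contrast weights for five equally spaced groups
trends = {
    "linear":    np.array([-2, -1,  0,  1,  2]),  # straight-line change
    "quadratic": np.array([ 2, -1, -2, -1,  2]),  # one change of direction
    "cubic":     np.array([-1,  2,  0, -2,  1]),  # two changes of direction
    "quartic":   np.array([ 1, -4,  6, -4,  1]),  # three changes of direction
}

# Each set of weights sums to zero, and every pair is orthogonal
for w in trends.values():
    assert w.sum() == 0
for (_, w1), (_, w2) in combinations(trends.items(), 2):
    assert np.dot(w1, w2) == 0
print("all trend contrasts are orthogonal")
```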