Discovering Statistics Using IBM SPSS Statistics: North American Edition
If you have a lot of scores, Q-Q plots can be easier to interpret than P-P plots because they display fewer values.
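A quick sketch of this outside SPSS (the book works in SPSS; this is only an illustration in Python, with made-up data in a variable called scores):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=50, scale=10, size=200)    # hypothetical scores

# Q-Q plot: observed quantiles plotted against the quantiles expected under normality
stats.probplot(scores, dist="norm", plot=plt)
plt.title("Q-Q plot of scores")
plt.show()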
If you must use the K-S test, its statistic is denoted by D and you should report the degrees of freedom (df) in brackets after the D.
When predictor variables are formed of categories, if you decide that you need to check the assumption of normality, then you need to do so within each group separately.
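A minimal sketch of group-wise normality checks in Python (pandas + scipy); the data frame and variable names are invented for illustration:

import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
df = pd.DataFrame({                                # invented grouped data
    "group": np.repeat(["A", "B"], 100),
    "score": np.concatenate([rng.normal(50, 10, 100), rng.normal(55, 12, 100)]),
})

for name, grp in df.groupby("group"):
    w, p = stats.shapiro(grp["score"])             # Shapiro-Wilk within each group
    print(f"group {name}: W = {w:.3f}, p = {p:.3f}")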
The reason for looking at the assumption of linearity and homoscedasticity together is that we can check both with a single graph. Both assumptions relate to the errors (a.k.a. residuals) in the model and we can plot the values of these residuals against the corresponding values of the outcome predicted by our model in a scatterplot. The resulting plot shows whether there is a systematic relationship between what comes out of the model (the predicted values) and the errors in the model.
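The sketch below (Python, invented data) mimics the idea of SPSS's plot of standardized residuals (ZRESID) against standardized predicted values (ZPRED) for a simple linear model:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 200)                        # made-up predictor
y = 3 + 0.8 * x + rng.normal(0, 1, 200)            # made-up outcome

slope, intercept = np.polyfit(x, y, 1)             # simple linear model
predicted = intercept + slope * x
residuals = y - predicted

# standardize both, roughly what SPSS calls ZPRED and ZRESID
zpred = (predicted - predicted.mean()) / predicted.std(ddof=1)
zresid = (residuals - residuals.mean()) / residuals.std(ddof=1)

plt.scatter(zpred, zresid, s=10)
plt.axhline(0, color="grey")
plt.xlabel("standardized predicted values")
plt.ylabel("standardized residuals")
plt.show()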
When these assumptions have been violated, you won’t see these exact patterns, but hopefully these plots will help you to understand the general anomalies to look out for.
homoscedasticity/homogeneity of variance means that as you go through levels of one variable, the variance of the other should not change.
SPSS produces something called Levene’s test (Levene, 1960), which tests the null hypothesis that the variances in different groups are equal.
Some people also look at Hartley’s Fmax, also known as the variance ratio (Pearson & Hartley, 1954). This is the ratio of the variance of the group with the biggest variance to the variance of the group with the smallest. This ratio was compared to critical values in a table published by Hartley.
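Computing the variance ratio is straightforward; a small Python sketch with made-up groups:

import numpy as np

rng = np.random.default_rng(3)
groups = [rng.normal(0, sd, 30) for sd in (1.0, 1.5, 2.0)]   # hypothetical groups

variances = [np.var(g, ddof=1) for g in groups]
f_max = max(variances) / min(variances)            # biggest variance over smallest
print(f"variance ratio (Hartley's F-max) = {f_max:.2f}")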
Levene’s test can be based on differences between scores and the mean or between scores and the median. The median is slightly preferable (because it is less biased by outliers).
The variances are practically equal. So, why does Levene’s test tell us they are significantly different? The answer is because the sample sizes are so large: we had 315 males and 495 females, so even this very small difference in variances is shown up as significant by Levene’s test.
Levene’s test can be reported in this general form: F(df1, df2) = test statistic, p = p-value.
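A hedged Python sketch of Levene’s test using scipy (the median-centred version), printed in that general form; the two groups and their sizes are stand-ins, not the book’s data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
males = rng.normal(50, 10.0, 315)                  # stand-ins for two groups
females = rng.normal(50, 10.5, 495)

f_stat, p = stats.levene(males, females, center="median")   # median-centred Levene
k = 2                                              # number of groups
n = len(males) + len(females)
df1, df2 = k - 1, n - k
print(f"F({df1}, {df2}) = {f_stat:.2f}, p = {p:.3f}")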
Trim the data: delete a certain quantity of scores from the extremes.
Winsorize the data: substitute outliers with the highest value that isn’t an outlier.
Apply a robust estimation method: a common approach is to use bootstrapping.
Transform the data: apply a mathematical function to scores to correct problems.
Probably the best of these choices is to use robust tests, which is a term applied to a family of procedures to estimate statistics that are unbiased even when the normal assumptions of the statistic are not met.
Trimming the data means deleting some scores from the extremes.
Trimming involves removing extreme scores using one of two rules: (1) a percentage-based rule; and (2) a standard-deviation-based rule.
If you take trimming to its extreme then you get the median, which is the value left when you have trimmed all but the middle score.
If we calculate the mean in a sample that has been trimmed in this way, it is called (unsurprisingly) a trimmed mean.
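For illustration, a 20% trimmed mean in Python with scipy (the data are made up; the 20% figure is just an example):

import numpy as np
from scipy import stats

scores = np.array([2, 3, 4, 4, 5, 5, 6, 6, 7, 35])          # made-up data, one outlier
print(np.mean(scores))                                       # ordinary mean, pulled up by 35
print(stats.trim_mean(scores, proportiontocut=0.2))          # mean of the middle 60%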
Rather than the researcher deciding before the analysis how much of the data to trim, an M-estimator determines the optimal amount of trimming necessary to give a robust estimate of, say, the mean.
Standard-deviation-based trimming involves calculating the mean and standard deviation of a set of scores, and then removing values that are a certain number of standard deviations greater than the mean.
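A small sketch of standard-deviation-based trimming in Python; the 3-SD cut-off is an example choice, not a prescription from the book:

import numpy as np

rng = np.random.default_rng(11)
scores = np.append(rng.normal(50, 10, 100), [120, 130])   # two extreme values added

z = (scores - scores.mean()) / scores.std(ddof=1)
trimmed = scores[np.abs(z) < 3]                    # keep scores within 3 SDs of the mean
print(len(scores), "->", len(trimmed), "scores after trimming")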
Winsorizing the data involves replacing outliers with the next highest score that is not an outlier.
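A minimal winsorizing sketch with scipy’s mstats.winsorize, replacing the bottom and top 10% of some made-up scores with the nearest remaining values:

import numpy as np
from scipy.stats import mstats

scores = np.array([2, 3, 4, 4, 5, 5, 6, 6, 7, 35], dtype=float)   # 35 is an outlier
wins = mstats.winsorize(scores, limits=[0.1, 0.1])   # bottom and top 10% replaced
print(np.asarray(wins))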
The best option if you have irksome data (other than sticking a big samurai sword through your head) is to estimate parameters and their standard errors with methods that are robust to violations of assumptions and outliers.
use methods that are relatively unaffected by irksome data.
The first we have already looked at: parameter estimates based on trimmed data such as the trimmed mean and M-estimators. The second is the bootstrap.
Bootstrapping gets around this problem by estimating the properties of the sampling distribution from the sample data.
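A bare-bones percentile bootstrap for the mean in Python (invented sample; 2000 resamples is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(2024)
scores = rng.normal(50, 10, 40)                    # stand-in sample

boot_means = np.array([
    rng.choice(scores, size=scores.size, replace=True).mean()
    for _ in range(2000)                           # 2000 bootstrap resamples
])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")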
The idea behind transformations is that you do something to every score to correct for distributional problems, outliers, lack of linearity or unequal variances.
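Outside SPSS, the same idea looks like this in Python: apply a function to every score to create a new, transformed variable (the data and the +1 constant are illustrative):

import numpy as np

scores = np.array([1, 2, 2, 3, 4, 5, 8, 13, 25, 60], dtype=float)  # made-up, skewed

log_scores = np.log10(scores + 1)   # log transform; +1 guards against log(0)
sqrt_scores = np.sqrt(scores)       # square-root transform
recip_scores = 1 / (scores + 1)     # reciprocal transform
print(log_scores)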
If you do decide to transform scores, use the compute command, which enables you to create new variables.
Create new variables from existing variables:
Create new variables from functions:
Our starting point with a correlation analysis is, therefore, to look at scatterplots of the variables we have measured.
Remember that the variance of a single variable represents the average amount that the data vary from the mean. Numerically, it is described by:
$$s^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N - 1} \tag{8.4}$$
When one variable deviates from its mean, we would expect the other variable to deviate from its mean in a similar way.
When we multiply the deviations of one variable by the corresponding deviations of a second variable, we get the cross-product deviations.
A positive covariance indicates that as one variable deviates from the mean, the other variable deviates in the same direction.
A negative covariance indicates that as one variable deviates from the mean (e.g., increases), the other deviates from the mean in the opposite direction (e.g., decreases).
The covariance depends upon the scales of measurement used: it is not a standardized measure.
To overcome the problem of dependence on the measurement scale, we need to convert the covariance into a standard set of units. This process is known as standardization.
The standardized covariance is known as a correlation coefficient and is defined as follows:
$$r = \frac{\mathrm{cov}(x,y)}{s_x s_y} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{(N-1)\, s_x s_y} \tag{8.7}$$
in which $s_x$ is the standard deviation of the first variable and $s_y$ is the standard deviation of the second variable (all other letters are the same as in the equation defining covariance). This coefficient, the Pearson product-moment correlation coefficient or Pearson’s correlation coefficient, r, was invented by Karl Pearson with Florence Nightingale David.
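A small Python check of this definition: compute r by hand as the covariance divided by the product of the standard deviations, and compare it with scipy’s pearsonr (the variables are simulated):

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(0, 1, 100)                          # simulated variables
y = 0.5 * x + rng.normal(0, 1, 100)

cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)   # covariance
r_by_hand = cov_xy / (x.std(ddof=1) * y.std(ddof=1))              # equation (8.7)
r_scipy, p = stats.pearsonr(x, y)
print(r_by_hand, r_scipy)                          # the two should match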
We have just described a bivariate correlation, which is a correlation between two variables.
The first is to use the trusty z-scores that keep cropping up in this book.
z-scores are useful because we know the probability of a given value of z occurring, if the distribution from which it comes is normal.
One problem with Pearson’s r is that it is known to have a sampling distribution that is not normally distributed.
We can adjust r so that its sampling distribution is normal:
$$z_r = \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) \tag{8.8}$$
The resulting $z_r$ has a standard error of $1/\sqrt{N-3}$.
The hypothesis that the correlation coefficient is different from 0 is usually (SPSS, for example, does this) tested not using a z-score, but using a different test statistic called a t-statistic with N − 2 degrees of freedom. This statistic can be obtained directly from r:
$$t_r = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}} \tag{8.11}$$
A 95% confidence interval is calculated (see Eq. 2.15) as:
$$\text{lower boundary of CI} = \text{estimate} - (1.96 \times SE), \qquad \text{upper boundary of CI} = \text{estimate} + (1.96 \times SE) \tag{8.12}$$
For our transformed correlation coefficients, these equations become:
$$\text{lower} = z_r - \left(1.96 \times \frac{1}{\sqrt{N-3}}\right), \qquad \text{upper} = z_r + \left(1.96 \times \frac{1}{\sqrt{N-3}}\right) \tag{8.13}$$
We can convert back to a correlation coefficient using:
$$r = \frac{e^{2z_r} - 1}{e^{2z_r} + 1} \tag{8.14}$$
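A sketch pulling equations 8.8, 8.11, 8.13 and 8.14 together in Python; r = .25 and N = 100 are arbitrary example values, not results from the book:

import numpy as np
from scipy import stats

r, n = 0.25, 100                                   # hypothetical example values

z_r = 0.5 * np.log((1 + r) / (1 - r))              # Fisher's z, equation (8.8)
se = 1 / np.sqrt(n - 3)                            # standard error of z_r
lo_z, hi_z = z_r - 1.96 * se, z_r + 1.96 * se      # CI on the z scale, (8.13)

def r_from_z(z):
    # back-transform from z to r, equation (8.14)
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)

print("95% CI for r:", r_from_z(lo_z), r_from_z(hi_z))

t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)       # t-statistic, equation (8.11)
p = 2 * stats.t.sf(abs(t), df=n - 2)
print("t =", t, "p =", p)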
Ranking the data reduces the impact of outliers. Furthermore, given that normality matters only for inferring significance and computing confidence intervals, we could use a bootstrap to compute the confidence interval, in which case we don’t need to worry about the distribution.
Because the confidence intervals are derived empirically using a random sampling procedure (i.e., bootstrapping), the results will be slightly different each time you run the analysis.
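A minimal sketch of a case-resampling (percentile) bootstrap CI for r in Python; because the generator is unseeded, rerunning it gives slightly different limits, which is the point being made above:

import numpy as np
from scipy import stats

rng = np.random.default_rng()                      # unseeded, so results vary per run
x = rng.normal(0, 1, 80)                           # made-up variables
y = 0.4 * x + rng.normal(0, 1, 80)

idx = np.arange(len(x))
boot_r = []
for _ in range(2000):                              # resample cases with replacement
    i = rng.choice(idx, size=len(idx), replace=True)
    boot_r.append(stats.pearsonr(x[i], y[i])[0])

print("bootstrap 95% CI for r:", np.percentile(boot_r, [2.5, 97.5]))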