Discovering Statistics Using IBM SPSS Statistics: North American Edition
The adjusted predicted value for a case is the predicted value of the outcome for that case from a model in which the case is excluded.
If the model is stable, then the predicted value of a case should be the same regardless of whether that case was used to estimate the model.
The deleted residual is the difference between the adjusted predicted value and the original observed value. The deleted residual can be divided by the standard error to give a standardized value known as the studentized deleted residual. This residual can be compared across different regression analyses because it is measured in standard units.
Cook’s distance is a measure of the overall influence of a case on the model, and Cook and Weisberg (1982) have suggested that values greater than 1 may be cause for concern.
The leverage (sometimes called hat values) gauges the influence of the observed value of the outcome variable over the predicted values. The average leverage value is defined as (k + 1)/n, in which k is the number of predictors in the model and n is the number of cases.
Cases with large leverage values will not necessarily have a large influence on the regression coefficients because they are measured on the outcome variables, not the predictors.
Mahalanobis distances, which measure the distance of cases from the mean(s) of the predictor variable(s).
The final measure is the covariance ratio (CVR), which quantifies the degree to which a case influences the variance of the regression parameters.
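The book computes these case diagnostics in SPSS. As a rough illustration outside SPSS, the sketch below uses Python's statsmodels on simulated data (the data frame and the names x1, x2 and y are placeholders, not from the text) to obtain studentized deleted residuals, Cook's distance, leverage, the covariance ratio and Mahalanobis distances.

```python
# Illustrative sketch only (not from the book): case diagnostics with statsmodels
# on simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2 + 0.5 * df["x1"] - 0.3 * df["x2"] + rng.normal(size=100)

X = sm.add_constant(df[["x1", "x2"]])
model = sm.OLS(df["y"], X).fit()
influence = model.get_influence()

student_resid = influence.resid_studentized_external  # studentized deleted residuals
cooks_d = influence.cooks_distance[0]                  # values > 1 are a common cause for concern
leverage = influence.hat_matrix_diag                   # hat values
cvr = influence.cov_ratio                              # covariance ratio (CVR)

k, n = X.shape[1] - 1, X.shape[0]
avg_leverage = (k + 1) / n                             # average leverage

# Mahalanobis distance of each case from the means of the predictors
preds = df[["x1", "x2"]].to_numpy()
diff = preds - preds.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(preds, rowvar=False))
mahalanobis = np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))
```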
Diagnostics are tools to see how well your model fits the sampled data, not a way of justifying the removal of data points to effect some desirable change in the regression parameters (e.g., deleting a case that changes a non-significant b-value into a significant one).
Additivity and linearity: The outcome variable should, in reality, be linearly related to any predictors, and, with several predictors, their combined effect is best described by adding their effects together.
Independent errors: For any two observations the residual terms should be uncorrelated (i.e., independent).
This is sometimes described as a lack of autocorrelation.
This assumption can be tested with the Durbin–Watson test, which tests for serial correlations between errors. Specifically, it tests whether adjacent residuals are correlated.
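In SPSS the Durbin–Watson statistic is requested in the regression dialog; as a hedged illustration outside SPSS, the same statistic can be computed in Python with statsmodels. The data and variable names below are simulated placeholders.

```python
# Sketch (not from the book): Durbin-Watson test for serial correlation among
# adjacent residuals, on simulated data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1 + 0.4 * x + rng.normal(size=200)

fit = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(fit.resid)
# The statistic ranges from 0 to 4; values near 2 indicate little autocorrelation,
# values toward 0 positive autocorrelation, values toward 4 negative autocorrelation.
print(round(dw, 2))
```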
Homoscedasticity (see Section 6.7): At each level of the predictor variable(s), the variance of the residual terms should be constant. This assumption means that the residuals at each level of the predictor(s) should have the same variance (homoscedasticity); when the variances are very unequal there is said to be heteroscedasticity.
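A quick visual check for this assumption is a plot of residuals against fitted (predicted) values: a roughly even band suggests homoscedasticity, while a funnel shape suggests heteroscedasticity. The sketch below, using simulated data and Python rather than SPSS, shows the idea.

```python
# Sketch (not from the book): residuals-versus-fitted plot on simulated data.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3 + 2 * x + rng.normal(scale=1.0, size=200)

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```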
Normally distributed errors (see Section 6.6): It can be helpful if the residuals in the model are random, normally distributed variables with a mean of 0.
Predictors are uncorrelated with ‘external variables’: External variables are variables that haven’t been included in the model and that influence the outcome variable. These variables are like the ‘third variable’ that we discussed in the correlation chapter.
Variable types: All predictor variables must be quantitative or categorical (with two categories), and the outcome variable must be quantitative, continuous and unbounded.
No perfect multicollinearity: If your model has more than one predictor then there should be no perfect linear relationship between two or more of the predictors.
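As a small illustration (simulated data, not from the book), the sketch below constructs a predictor that is an exact linear combination of two others; the design matrix becomes rank-deficient, which is what perfect multicollinearity means in practice.

```python
# Sketch (not from the book): perfect multicollinearity with simulated data.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = 2 * x1 + 3 * x2                   # exact linear combination of x1 and x2

X = np.column_stack([np.ones(50), x1, x2, x3])
print(np.linalg.matrix_rank(X))        # 3, not 4: the matrix is rank-deficient
print(np.corrcoef(x1, 2 * x1)[0, 1])   # two perfectly correlated predictors: r = 1.0
```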
Non-zero variance: The predictors should have some variation in value (i.e., they should not have variances of 0).
If confidence intervals are inaccurate (as they are when these assumptions are broken), we cannot accurately estimate the likely population value.
When the assumptions are broken we can’t generalize our model to the population; when the assumptions are met, then on average the regression model from the sample is the same as the population model.
Assessing the accuracy of a model across different samples is known as cross-validation.
We should collect enough data to obtain a reliable model.
Adjusted R²: Whereas R² tells us how much of the variance in Y overlaps with predicted values from the model in our sample, adjusted R² tells us how much variance in Y would be accounted for if the model had been derived from the population from which the sample was taken.
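Most software computes adjusted R² with Wherry's formula, which shrinks R² using the sample size n and the number of predictors k. A minimal sketch with made-up values (Python, not from the book):

```python
# Sketch (not from the book): Wherry's adjusted R-squared.
def adjusted_r2(r2, n, k):
    # n = number of cases, k = number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(r2=0.50, n=100, k=3))   # ~0.484: shrunk relative to R-squared
```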
Stein’s formula (equation 9.15) does tell us how well the model cross-validates.
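For reference, Stein's formula adjusts R² more conservatively than Wherry's, using the sample size n and the number of predictors k. The form below is my recollection of the version in the text, so treat it as a sketch rather than a quotation:

```latex
R^2_{\text{adjusted}} = 1 - \left[\left(\frac{n-1}{n-k-1}\right)\left(\frac{n-2}{n-k-2}\right)\left(\frac{n+1}{n}\right)\right](1 - R^2)
```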
Data splitting: This approach involves randomly splitting your sample data, estimating the model in both halves of the data and comparing the resulting models.
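A minimal sketch of this idea (simulated data; Python with statsmodels and scikit-learn rather than SPSS; all variable names are placeholders): split the sample at random, fit the same model in each half, and compare the estimates.

```python
# Sketch (not from the book): data splitting as a simple cross-validation check.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
df = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
df["y"] = 1 + 0.6 * df["x1"] - 0.4 * df["x2"] + rng.normal(size=300)

half_a, half_b = train_test_split(df, test_size=0.5, random_state=0)

def fit(half):
    return sm.OLS(half["y"], sm.add_constant(half[["x1", "x2"]])).fit()

fit_a, fit_b = fit(half_a), fit(half_b)
print(fit_a.params)
print(fit_b.params)   # similar estimates in both halves suggest a stable model
```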
The sample size required depends on the size of effect that we’re trying to detect (i.e., how strong the relationship is that we’re trying to measure) and how much power we want to detect these effects.
We should produce scatterplots to get some idea of whether the assumption of linearity is met, and to look for outliers or obvious unusual cases.
If we want to generalize our model beyond the sample, or we are interested in interpreting significance tests and confidence intervals, then we examine these residuals to check for homoscedasticity, normality, independence and linearity (although this will likely be fine, given our earlier screening).
The column Sig. contains the exact probability that a value of t at least as big as the one in the table would occur if the value of b in the population were zero. If this probability is less than 0.05, then people interpret that as the predictor being a ‘significant’ predictor of the outcome.
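For context, the t in that table is the standard test of whether b differs from zero, with n cases and k predictors in the model (a textbook result, not a quotation):

```latex
t = \frac{b}{SE_b}, \qquad df = n - k - 1
```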
When we build a model with several predictors, everything we have discussed so far applies. However, there are some additional things to think about. The first is what variables to enter into the model.
Do not enter hundreds of predictors, just because you’ve measured them, and expect the resulting model to make sense.
Select predictors based on a sound theoretical rationale or well-conducted past research that has demonstrated their importance.
If predictors are being added that have never been looked at before (in your research context), then select these variables based on their substantive theoretical importance. The key point is that the most important thing when building a model is to use your brain.
When predictors are completely uncorrelated the order of variable entry has very little effect on the parameters estimated; however, we rarely have uncorrelated predictors, and so the method of variable entry has consequences and is, therefore, important.
Other things being equal, use hierarchical regression, in which you select predictors based on past work and decide in which order to enter them into the model. Generally speaking, you should enter known predictors (from other research) into the model first in order of their importance in predicting the outcome.
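A minimal sketch of hierarchical entry (simulated data, Python rather than SPSS; the names known1, known2 and new are placeholders): known predictors enter at step 1, the new predictor of interest at step 2, and the improvement in R² is inspected.

```python
# Sketch (not from the book): hierarchical entry in two steps on simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "known1": rng.normal(size=200),
    "known2": rng.normal(size=200),
    "new": rng.normal(size=200),
})
df["y"] = 1 + 0.5 * df["known1"] + 0.3 * df["known2"] + 0.2 * df["new"] + rng.normal(size=200)

step1 = sm.OLS(df["y"], sm.add_constant(df[["known1", "known2"]])).fit()
step2 = sm.OLS(df["y"], sm.add_constant(df[["known1", "known2", "new"]])).fit()

print(step1.rsquared, step2.rsquared)
print(step2.rsquared - step1.rsquared)   # R-squared change attributable to `new`
```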
An alternative is forced entry (or Enter as it is known in SPSS), in which you force all predictors into the model simultaneously. Like hierarchical, this method relies on good theoretical reasons for including the chosen predictors, but unlike hierarchical, you make no decision about the order in which variables are entered.
Stepwise regression is generally frowned upon by statisticians. Nevertheless, SPSS Statistics makes it easy to do and actively encourages it in the Automatic Linear Modeling process.
The stepwise method bases decisions about the order in which predictors enter the model on a purely mathematical criterion.
The stepwise method in SPSS Statistics is the same as the forward method, except that each time a predictor is added to the equation, a removal test is made of the least useful predictor.
Models built using stepwise methods are less likely to generalize across samples because the selection of variables in the model is affected by the sampling process.
Because the criterion for retaining variables is based on statistical significance, your sample size affects the model you get: in large samples significance tests are highly powered, resulting in predictors being retained that make trivial contributions to predicting the outcome, while in small samples, where power is low, predictors that make a large contribution may get overlooked.
Suppressor effects occur when a predictor has a significant effect only when another variable is held constant.
Hierarchical and (although obviously you’d never use them) stepwise methods involve adding predictors to the model in stages, and it is useful to assess the improvement to the model at each stage.
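The improvement at each stage is usually assessed with the change in R² and its F-test. Writing R²₁ and k₁ for the smaller model, R²₂ and k₂ for the larger model, and N for the number of cases, the standard form (a general result, not a quotation from the text) is:

```latex
F_{\text{change}} = \frac{(R^2_2 - R^2_1)/(k_2 - k_1)}{(1 - R^2_2)/(N - k_2 - 1)}
```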
The Akaike information criterion (AIC) is a measure of fit that penalizes the model for having more variables.
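In general the AIC is defined from the maximized likelihood L̂ and the number of estimated parameters k; smaller values indicate a better trade-off between fit and complexity. This is the standard definition rather than a quotation from the text:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```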
Perfect collinearity exists when at least one predictor is a perfect linear combination of the others (the simplest example being two predictors that are perfectly correlated – they have a correlation coefficient of 1).
The variance inflation factor (VIF) indicates whether a predictor has a strong linear relationship with the other predictor(s); the tolerance statistic is its reciprocal (1/VIF).
If the largest VIF is greater than 10 (or the tolerance is below 0.1), this indicates a serious problem (Bowerman & O’Connell, 1990; Myers, 1990). If the average VIF is substantially greater than 1, then the regression may be biased (Bowerman & O’Connell, 1990). Tolerance below 0.2 indicates a potential problem (Menard, 1995).
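SPSS reports VIF and tolerance in its collinearity diagnostics; as a rough cross-check outside SPSS, the Python sketch below (simulated data, placeholder column names) computes both with statsmodels.

```python
# Sketch (not from the book): VIF and tolerance per predictor on simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=200)   # deliberately correlated with x1
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(name, round(vif, 2), round(1 / vif, 2))   # VIF and tolerance (1/VIF)
```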