Kindle Notes & Highlights
by Andy Field
Read between January 9 - February 23, 2021
The adjusted predicted value for a case is the predicted value of the outcome for that case from a model in which the case is excluded.
if the model is stable then the predicted value of a case should be the same regardless of whether that case was used to estimate the model.
the deleted residual, which is the difference between the adjusted predicted value and the original observed value. The deleted residual can be divided by the standard error to give a standardized value known as the studentized deleted residual. This residual can be compared across different regression analyses because it is measured in standard units.
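A minimal sketch (not from the book, which works in SPSS Statistics) of how studentized deleted residuals can be obtained in Python with statsmodels; the data frame and variable names here are hypothetical:

```python
# Minimal sketch: studentized deleted residuals via statsmodels.
# The predictor/outcome names and values are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

df = pd.DataFrame({
    "adverts": [10, 40, 25, 5, 60, 33, 48, 12],    # hypothetical predictor
    "sales":   [22, 80, 51, 15, 118, 70, 95, 30],  # hypothetical outcome
})

X = sm.add_constant(df[["adverts"]])   # add an intercept column
model = sm.OLS(df["sales"], X).fit()

influence = OLSInfluence(model)
# Each case's deleted residual divided by its standard error, so the values
# are in standard units and comparable across analyses.
print(influence.resid_studentized_external)
```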
Cook’s distance is a measure of the overall influence of a case on the model, and Cook and Weisberg (1982) have suggested that values greater than 1 may be cause for concern.
The leverage (sometimes called hat values) gauges the influence of the observed value of the outcome variable over the predicted values. The average leverage value is defined as (k + 1)/n, in which k is the number of predictors in the model and n is the number of cases.
cases with large leverage values will not necessarily have a large influence on the regression coefficients because they are measured on the outcome variables, not the predictors.
Mahalanobis distances, which measure the distance of cases from the mean(s) of the predictor variable(s).
final measure is the covariance ratio (CVR), which quantifies the degree to which a case influences the variance of the regression parameters.
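The same statsmodels machinery exposes Cook's distance, leverage (hat values) and the covariance ratio, and Mahalanobis distances can be computed directly from the predictors. A minimal, self-contained sketch with simulated (hypothetical) data, not the book's SPSS output:

```python
# Minimal sketch: case-wise influence measures with statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))                        # two hypothetical predictors
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(size=50)

model = sm.OLS(y, sm.add_constant(X)).fit()
infl = model.get_influence()

cooks_d, _ = infl.cooks_distance     # values > 1 may be cause for concern
leverage = infl.hat_matrix_diag      # compare with the average (k + 1) / n
avg_leverage = (2 + 1) / 50
cvr = infl.cov_ratio                 # covariance ratio (CVR) for each case

# Mahalanobis distance of each case from the centroid of the predictors
diffs = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
mahalanobis = np.sqrt(np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs))

print(cooks_d.max(), leverage.max(), avg_leverage, cvr.min(), mahalanobis.max())
```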
diagnostics are tools to see how well your model fits the sampled data and not a way of justifying the removal of data points to effect some desirable change in the regression parameters (e.g., deleting a case that changes a non-significant b-value into a significant one).
Additivity and linearity: The outcome variable should, in reality, be linearly related to any predictors, and, with several predictors, their combined effect is best described by adding their effects together.
Independent errors: For any two observations the residual terms should be uncorrelated (i.e., independent).
This is sometimes described as a lack of autocorrelation.
This assumption can be tested with the Durbin–Watson test, which tests for serial correlations between errors. Specifically, it tests whether adjacent residuals are correlated.
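A minimal sketch of the Durbin–Watson statistic in Python (the book reads it off SPSS output); the data are hypothetical, and values near 2 suggest adjacent residuals are uncorrelated, while values well below or above 2 point to positive or negative autocorrelation:

```python
# Minimal sketch: Durbin-Watson test on a fitted model's residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.normal(size=100)                       # hypothetical predictor
y = 3 + 0.5 * x + rng.normal(size=100)         # hypothetical outcome

model = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(model.resid))              # close to 2 here, since errors are independent
```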
Homoscedasticity (see Section 6.7): At each level of the predictor variable(s), the variance of the residual terms should be constant. This assumption means that the residuals at each level of the predictor(s) should have the same variance (homoscedasticity); when the variances are very unequal there is said to be heteroscedasticity.
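Field typically checks this by plotting standardized residuals against standardized predicted values (ZRESID against ZPRED in SPSS). A rough Python equivalent with hypothetical data, looking for an even spread rather than a funnel shape:

```python
# Minimal sketch: standardized residuals vs standardized predicted values.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)                        # hypothetical predictor
y = 2 + 1.5 * x + rng.normal(size=200)          # hypothetical outcome

model = sm.OLS(y, sm.add_constant(x)).fit()
z_pred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()
z_resid = model.resid / model.resid.std()

plt.scatter(z_pred, z_resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Standardized predicted values")
plt.ylabel("Standardized residuals")
plt.show()                                      # a funnel shape suggests heteroscedasticity
```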
Normally distributed errors (see Section 6.6): It can be helpful if the residuals in the model are random, normally distributed variables with a mean of 0.
Predictors are uncorrelated with ‘external variables’: External variables are variables that haven’t been included in the model and that influence the outcome variable. These variables are like the ‘third variable’ that we discussed in the correlation chapter.
Variable types: All predictor variables must be quantitative or categorical (with two categories), and the outcome variable must be quantitative, continuous and unbounded.
No perfect multicollinearity: If your model has more than one predictor then there should be no perfect linear relationship between two or more of the predictors.
Non-zero variance: The predictors should have some variation in value (i.e., they should not have variances of 0).
if confidence intervals are inaccurate (as they are when these assumptions are broken) we cannot accurately estimate the likely population value.
we can’t generalize our model to the population. When the assumptions are met then on average the regression model from the sample is the same as the population model.
Assessing the accuracy of a model across different samples is known as cross-validation.
we should collect enough data to obtain a reliable model
Adjusted R2: Whereas R2 tells us how much of the variance in Y overlaps with predicted values from the model in our sample, adjusted R2 tells us how much variance in Y would be accounted for if the model had been derived from the population from which the sample was taken.
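The adjusted R² that SPSS reports is, if I recall Field's text correctly, Wherry's formula:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1},$$

where n is the sample size and k the number of predictors.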
Stein’s formula (9.15) does tell us how well the model cross-validates.
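Equation (9.15) itself isn't reproduced in these highlights; the version of Stein's formula usually quoted (and, as far as I remember, the one Field gives) is

$$\text{adjusted } R^2 = 1 - \left[\frac{n-1}{n-k-1}\cdot\frac{n-2}{n-k-2}\cdot\frac{n+1}{n}\right](1 - R^2).$$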
Data splitting: This approach involves randomly splitting your sample data, estimating the model in both halves of the data and comparing the resulting models.
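A minimal sketch of data splitting in Python (not from the book): fit the same model in two random halves and compare the estimates; the data and effect sizes are hypothetical:

```python
# Minimal sketch: split the sample, fit the model in each half, compare.
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))                              # hypothetical predictors
y = 1 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(size=200)

X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=7)

model_a = sm.OLS(y_a, sm.add_constant(X_a)).fit()
model_b = sm.OLS(y_b, sm.add_constant(X_b)).fit()

# If the model cross-validates, the two sets of coefficients should be similar.
print(model_a.params)
print(model_b.params)
```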
The sample size required depends on the size of effect that we’re trying to detect (i.e., how strong the relationship is that we’re trying to measure) and how much power we want to detect these effects.
we should produce scatterplots to get some idea of whether the assumption of linearity is met, and to look for outliers or obvious unusual cases.
If we want to generalize our model beyond the sample or we are interested in interpreting significance tests and confidence intervals, then we examine these residuals to check for homoscedasticity, normality, independence and linearity (although this will likely be fine, given our earlier screening).
The column Sig. contains the exact probability that a value of t at least as big as the one in the table would occur if the value of b in the population were zero. If this probability is less than 0.05, then people interpret that as the predictor being a ‘significant’ predictor of the outcome.
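The t-value behind that probability is simply the estimate divided by its standard error (a standard result, not quoted in the highlights), referred to a t-distribution with n − k − 1 degrees of freedom:

$$t = \frac{b}{\mathrm{SE}_b}.$$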
When we build a model with several predictors, everything we have discussed so far applies. However, there are some additional things to think about. The first is what variables to enter into the model.
Do not enter hundreds of predictors, just because you’ve measured them, and expect the resulting model to make sense.
Select predictors based on a sound theoretical rationale or well-conducted past research that has demonstrated their importance.
If predictors are being added that have never been looked at before (in your research context) then select these variables based on their substantive theoretical importance. The key point is that the most important thing when building a model is to use your brain
When predictors are completely uncorrelated the order of variable entry has very little effect on the parameters estimated; however, we rarely have uncorrelated predictors, and so the method of variable entry has consequences and is, therefore, important.
Other things being equal, use hierarchical regression, in which you select predictors based on past work and decide in which order to enter them into the model. Generally speaking, you should enter known predictors (from other research) into the model first in order of their importance in predicting the outcome.
An alternative is forced entry (or Enter as it is known in SPSS), in which you force all predictors into the model simultaneously. Like hierarchical, this method relies on good theoretical reasons for including the chosen predictors, but unlike hierarchical, you make no decision about the order in which variables are entered.
Stepwise regression is generally frowned upon by statisticians. Nevertheless, SPSS Statistics makes it easy to do and actively encourages it in the Automatic Linear Modeling process.
The stepwise method bases decisions about the order in which predictors enter the model on a purely mathematical criterion.
The stepwise method in SPSS Statistics is the same as the forward method, except that each time a predictor is added to the equation, a removal test is made of the least useful predictor.
models built using stepwise methods are less likely to generalize across samples because the selection of variables in the model is affected by the sampling process.
because the criterion for retaining variables is based on statistical significance, your sample size affects the model you get: in large samples significance tests are highly powered, resulting in predictors being retained that make trivial contributions to predicting the outcome, and in small samples where power is low, predictors that make a large contribution may get overlooked.
suppressor effects, which occur when a predictor has a significant effect only when another variable is held constant.
Hierarchical and (although obviously you’d never use them) stepwise methods involve adding predictors to the model in stages, and it is useful to assess the improvement to the model at each stage.
Akaike information criterion (AIC) is a measure of fit that penalizes the model for having more variables.
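For reference (not quoted in the highlights), the general definition is

$$\mathrm{AIC} = 2k - 2\ln(\hat{L}),$$

where k is the number of estimated parameters and $\hat{L}$ the maximized likelihood; for an ordinary least-squares model this reduces, up to an additive constant, to $n\ln(\mathrm{SSE}/n) + 2k$. Smaller values indicate better fit, so an extra predictor is only 'worth it' if it cuts the error sum of squares enough to offset the penalty.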
Perfect collinearity exists when at least one predictor is a perfect linear combination of the others (the simplest example being two predictors that are perfectly correlated – they have a correlation coefficient of 1).
variance inflation factor (VIF), which indicates whether a predictor has a strong linear relationship with the other predictor(s), and the tolerance statistic, which is its reciprocal (1/VIF).
If the largest VIF is greater than 10 (or the tolerance is below 0.1) then this indicates a serious problem (Bowerman & O’Connell, 1990; Myers, 1990). If the average VIF is substantially greater than 1 then the regression may be biased (Bowerman & O’Connell, 1990). Tolerance below 0.2 indicates a potential problem (Menard, 1995).
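A minimal sketch (not from the book) of computing VIF and tolerance in Python with statsmodels; the predictors are hypothetical, with x2 deliberately constructed to correlate strongly with x1 so that its VIF is inflated:

```python
# Minimal sketch: VIF and tolerance for each predictor.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=100)   # strongly correlated with x1
x3 = rng.normal(size=100)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    if name == "const":
        continue                                   # skip the intercept column
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```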