Kindle Notes & Highlights
by Andy Field
Read between January 9 - February 23, 2021
think about what a correlation of zero represents: it is no effect whatsoever. A confidence interval gives the boundaries within which the population value falls in 95% of samples.
The correlation coefficient squared (known as the coefficient of determination, R²) is a measure of the amount of variability in one variable that is shared by the other.
You’ll often see people write things about R² that imply causality: they might write ‘the variance in y accounted for by x’, or ‘the variation in one variable explained by the other’.
Although R² is a useful measure of the substantive importance of an effect, it cannot be used to infer causal relationships.
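A quick illustration (not from the book; the anxiety and performance scores are invented): squaring Pearson's r gives the proportion of shared variability.

```python
import numpy as np

# invented scores: exam anxiety and exam performance
anxiety = np.array([85, 70, 90, 60, 75, 80, 65, 95])
performance = np.array([40, 55, 35, 70, 50, 45, 60, 30])

r = np.corrcoef(anxiety, performance)[0, 1]  # Pearson's r
r_squared = r ** 2                           # coefficient of determination, R²

# e.g. R² = 0.90 would mean 90% of the variability in performance is shared
# with anxiety: shared variability, which is not the same as causation
print(f"r = {r:.3f}, R² = {r_squared:.3f}")
```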
Spearman’s correlation coefficient, denoted by rs (Figure 8.9), is a non-parametric statistic that is useful to minimize the effects of extreme scores or the effects of violations of the assumptions
Remember that these confidence intervals are based on a random sampling procedure so the values you get will differ slightly from mine, and will change if you rerun the analysis.
Kendall’s tau, denoted by τ, is another non-parametric correlation and it should be used rather than Spearman’s coefficient when you have a small data set with a large number of tied ranks.
Although Spearman’s statistic is the more popular of the two coefficients, there is much to suggest that Kendall’s statistic is a better estimate of the correlation in the population
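A minimal SciPy sketch (not from the book; the ranks and ties are invented) computing both coefficients on the same small data set:

```python
import numpy as np
from scipy import stats

# invented rank-type data with some tied scores
judge_a = np.array([1, 2, 2, 4, 5, 6, 7, 8])
judge_b = np.array([2, 1, 3, 4, 6, 5, 8, 7])

rho, p_rho = stats.spearmanr(judge_a, judge_b)   # Spearman's rs
tau, p_tau = stats.kendalltau(judge_a, judge_b)  # Kendall's tau (preferred with many ties)

print(f"Spearman rs = {rho:.3f} (p = {p_rho:.3f})")
print(f"Kendall tau = {tau:.3f} (p = {p_tau:.3f})")
```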
Often it is necessary to investigate relationships between two variables when one of the variables is dichotomous (i.e., it is categorical with only two categories).
The difference between the use of biserial and point-biserial correlations depends on whether the dichotomous variable is discrete or continuous.
A discrete, or true, dichotomy is one for which there is no underlying continuum between the categories.
The point-biserial correlation coefficient (rpb) is used when one variable is a discrete dichotomy (e.g., pregnancy), whereas the biserial correlation coefficient (rb) is used when one variable is a continuous dichotomy (e.g., passing or failing an exam).
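A small sketch using SciPy's point-biserial routine; the discrete dichotomy (cat vs. dog) and the scores are invented for illustration, and the biserial coefficient would be obtained from r_pb via a further adjustment based on the normal distribution (not shown here).

```python
import numpy as np
from scipy import stats

# invented discrete dichotomy: 1 = cat, 0 = dog, plus hours spent away from home
species = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
time_away = np.array([35, 6, 40, 28, 4, 8, 50, 5, 31, 7])

r_pb, p = stats.pointbiserialr(species, time_away)  # point-biserial correlation
print(f"r_pb = {r_pb:.3f}, p = {p:.3f}")
```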
Another way to express the unique relationship between two variables (i.e., the relationship accounting for other variables) is the partial correlation.
the semi-partial correlation expresses the unique relationship between two variables, X and Y, as a function of the total variance in Y.
Partial correlations can be done when variables are dichotomous (including the ‘third’ variable).
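One way to see what a partial correlation does is the residual approach: regress both X and Y on the third variable and correlate what is left over. A rough sketch with invented data (dedicated routines in statistics packages do this, and more, directly):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after the third variable z has been
    regressed out of both (the residual approach)."""
    resid_x = x - np.polyval(np.polyfit(z, x, 1), z)
    resid_y = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(resid_x, resid_y)[0, 1]

# invented data: exam performance, exam anxiety, revision time
exam     = np.array([40, 55, 35, 70, 50, 45, 60, 30, 65, 52])
anxiety  = np.array([85, 70, 90, 60, 75, 80, 65, 95, 62, 72])
revision = np.array([10, 20,  5, 30, 15, 12, 25,  3, 28, 18])

# for the semi-partial correlation you would regress z out of only one of the two variables
print(f"partial r(exam, anxiety | revision) = {partial_corr(exam, anxiety, revision):.3f}")
```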
We can calculate a z-score of the difference between these correlations using equation (8.18).
If you want to compare correlation coefficients that come from the same entities then things are a little more complicated. You can use a t-statistic to test whether a difference between two dependent correlations is significant.
The t-statistic is computed using equation (8.20) from Chen & Popovich (2002).
This value can be checked against the appropriate critical value for t with N − 3 degrees of freedom
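Equations 8.18 and 8.20 themselves were not captured by these highlights; the sketch below implements what they appear to describe, namely the Fisher z test for correlations from two independent samples and the Chen and Popovich (2002) form of the t-test for two dependent (overlapping) correlations. Treat it as a reconstruction under those assumptions rather than the book's exact formulas.

```python
import numpy as np
from scipy import stats

def z_independent(r1, n1, r2, n2):
    """z for the difference between correlations from two different samples:
    Fisher z-transform each r, then divide by the SE of the difference."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    z = (z1 - z2) / np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return z, 2 * stats.norm.sf(abs(z))          # two-tailed p

def t_dependent(r_xy, r_zy, r_xz, n):
    """t for two overlapping correlations (r_xy vs r_zy, sharing y) measured on
    the same n entities; checked against t with n - 3 degrees of freedom."""
    num = (r_xy - r_zy) * np.sqrt((n - 3) * (1 + r_xz))
    den = np.sqrt(2 * (1 - r_xy**2 - r_xz**2 - r_zy**2 + 2 * r_xy * r_xz * r_zy))
    t = num / den
    return t, 2 * stats.t.sf(abs(t), df=n - 3)   # two-tailed p

print(z_independent(0.51, 52, 0.38, 51))   # invented correlations and sample sizes
print(t_dependent(0.45, 0.35, 0.50, 103))  # invented values
```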
A table is a good way to report lots of correlations.
This equation keeps the fundamental idea that an outcome for a person can be predicted from a model (the stuff in parentheses) and some error associated with that prediction (εi). We still predict an outcome variable (Yi) from a predictor variable (Xi) and a parameter, b1, associated with the predictor variable that quantifies the relationship it has with the outcome variable.
Any straight line can be defined by two things: (1) the slope (or gradient) of the line (usually denoted by b1); and (2) the point at which the line crosses the vertical axis of the graph (known as the intercept of the line, b0).
A model with a positive b1 describes a positive relationship, whereas a line with a negative b1 describes a negative relationship.
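Putting these highlights together, the equation being referred to (it was not captured, but it is fully determined by the description) is the linear model:

$$Y_i = (b_0 + b_1 X_i) + \varepsilon_i$$

where the part in parentheses is the model and $\varepsilon_i$ is the error in prediction for person i.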
regression analysis is a term for fitting a linear model to data and using it to predict values of an outcome variable (a.k.a. dependent variable) from one or more predictor variables (a.k.a. independent variables). With one predictor variable, the technique is sometimes referred to as simple regression, but with several predictors it is called multiple regression. Both are merely terms for the linear model.
If the model is a perfect fit to the data then for a given value of the predictor(s) the model will predict the same value of the outcome as was observed.
it overestimates the observed value of the outcome and sometimes it underestimates it. With the linear model the differences between what the model predicts and the observed data are usually called residuals (they are the same as deviations when we looked at the mean);
square them before we add them up (this idea should be familiar from Section 2.5.2). Therefore, to assess the error in a linear model, just like when we assessed the fit of the mean using the variance, we use a sum of squared errors, and because we call these errors residuals, this total is called the sum of squared residuals or residual sum of squares (SSR). The residual sum of squares is a gauge of how well a linear model fits the data: if the squared differences are large, the model is not representative of the data (there is a lot of error in prediction); if the squared differences are small, the model is representative of the data (there is little error in prediction).
when we estimate the mean, we use the method of least squares to estimate the parameters (b) that define the regression model for which the sum of squared errors is the minimum it can be (given the data). This method is known as ordinary least squares (OLS) regression.
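A minimal NumPy sketch of ordinary least squares; the advertising and sales numbers are invented (loosely in the spirit of the book's album-sales example):

```python
import numpy as np

# invented data: advertising budget (predictor) and album sales (outcome)
adverts = np.array([10, 25, 40, 55, 70, 85, 100, 115])
sales   = np.array([55, 80, 95, 140, 150, 185, 210, 230])

# ordinary least squares: choose b0 and b1 to minimise the sum of squared residuals
b1, b0 = np.polyfit(adverts, sales, 1)
predicted = b0 + b1 * adverts
residuals = sales - predicted

print(f"b0 (intercept) = {b0:.2f}, b1 (slope) = {b1:.2f}")
print(f"SSR (residual sum of squares) = {np.sum(residuals**2):.2f}")
```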
The mean of the outcome is a model of ‘no relationship’ between the variables: as one variable changes the prediction for the other remains constant
This sum of squared differences is known as the total sum of squares (denoted by SST) and it represents how good the mean is as a model of the observed outcome scores
We can use the values of SST and SSR to calculate how much better the linear model is than the baseline model of ‘no relationship’.
The improvement in prediction resulting from using the linear model rather than the mean is calculated as the difference between SST and SSR (Figure 9.5, bottom). This difference shows us the reduction in the inaccuracy of the model resulting from fitting the regression model to the data. This improvement is the model sum of squares (SSM).
If the value of SSM is large, the linear model is very different from using the mean to predict the outcome variable. This implies that the linear model has made a big improvement to predicting the outcome variable.
If SSM is small then using the linear model is little better than using the mean (i.e., the best model is no better than predicting from ‘no relationship’).
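Continuing the invented example above, the three sums of squares (and R² as their ratio) can be computed directly:

```python
import numpy as np

adverts = np.array([10, 25, 40, 55, 70, 85, 100, 115])
sales   = np.array([55, 80, 95, 140, 150, 185, 210, 230])

b1, b0 = np.polyfit(adverts, sales, 1)
predicted = b0 + b1 * adverts

sst = np.sum((sales - sales.mean()) ** 2)   # total SS: error of the 'no relationship' model
ssr = np.sum((sales - predicted) ** 2)      # residual SS: error of the linear model
ssm = sst - ssr                             # model SS: the improvement due to the model

r_squared = ssm / sst                       # proportion of variability 'explained'
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSM = {ssm:.1f}, R² = {r_squared:.3f}")
```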
variance or, put another way, the model compared to the error in the model. This is true here: F is based upon the ratio of the improvement due to the model (SSM) and the error in the model (SSR).
For SSM the degrees of freedom are the number of predictors in the model (k), and for SSR they are the number of observations (N) minus the number of parameters being estimated (i.e., the number of b coefficients including the constant). We estimate a b for each predictor and the intercept (b0), so the total number of bs estimated will be k + 1, giving us degrees of freedom of N - (k + 1) or, more simply, N - k - 1. This leads to equation (9.11).
The resulting F-statistic is a measure of how much the model has improved the prediction of the outcome compared to the level of inaccuracy of the model.
The F-statistic is also used to calculate the significance of R² using equation (9.13), in which N is the number of cases or participants, and k is the number of predictors in the model.
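The two numbered equations were not captured by the highlights; reconstructed from the surrounding definitions (so treat the notation as an assumption rather than a quotation), they should read:

$$F = \frac{MS_M}{MS_R} = \frac{SS_M / k}{SS_R / (N - k - 1)} \quad (9.11)$$

$$F = \frac{(N - k - 1)\,R^2}{k\,(1 - R^2)} \quad (9.13)$$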
This hypothesis is tested using a t-statistic that tests the null hypothesis that the value of b is 0. If the test is significant, we might interpret this information as supporting a hypothesis that the b-value is significantly different from 0 and that the predictor variable contributes significantly to our ability to estimate values of the outcome.
the t-statistic is based on the ratio of explained variance against unexplained variance or error.
If the standard error is very small, then most samples are likely to have a b-value similar to the one in our sample (because there is little variation across samples).
how the t-test is calculated (equation 9.14)
The statistic t has a probability distribution that differs according to the degrees of freedom for the test. In this context, the degrees of freedom are N − k − 1, where N is the total sample size and k is the number of predictors.
Using the appropriate t-distribution, it’s possible to calculate a p-value that indicates the probability of getting a t at least as large as the one we observed if the null hypothesis were true
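Equation 9.14 was likewise not captured; it is presumably the usual $t = (b_{observed} - b_{expected}) / SE_b$, and because the expected b under the null hypothesis is zero this reduces to $t = b / SE_b$. A small sketch with invented numbers:

```python
from scipy import stats

# invented output from a simple regression: k = 1 predictor, N = 8 cases
b1, se_b1, N, k = 1.52, 0.17, 8, 1

t = b1 / se_b1                            # b divided by its standard error
p = 2 * stats.t.sf(abs(t), df=N - k - 1)  # two-tailed p from the t-distribution
print(f"t({N - k - 1}) = {t:.2f}, p = {p:.4f}")
```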
Generalization (Section 9.4) is a critical additional step, and if we find that our model is not generalizable, then we must restrict any conclusions to the sample used.
we can use standardized residuals, which are the residuals converted to z-scores (see Section 1.8.6) and so are expressed in standard deviation units.
(1) standardized residuals with an absolute value greater than 3.29 (we can use 3 as an approximation) are cause for concern because in an average sample a value this high is unlikely to occur; (2) if more than 1% of our sample cases have standardized residuals with an absolute value greater than 2.58 (2.5 will do) there is evidence that the level of error within our model may be unacceptable; and (3) if more than 5% of cases have standardized residuals with an absolute value greater than 1.96 (2 for convenience) then the model may be a poor representation of the data.
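A rough sketch of these screening rules with invented data; it scales the residuals by their standard deviation, which is only an approximation to the standardized (and especially the studentized) residuals that statistics software reports.

```python
import numpy as np

adverts = np.array([10, 25, 40, 55, 70, 85, 100, 115])
sales   = np.array([55, 80, 95, 140, 150, 185, 210, 230])

b1, b0 = np.polyfit(adverts, sales, 1)
residuals = sales - (b0 + b1 * adverts)

# crude standardized residuals: scale by the residual standard deviation
# (ddof=2 because two parameters, b0 and b1, were estimated)
std_resid = residuals / residuals.std(ddof=2)

print("any |z| > 3.29 (likely outliers)?  ", np.any(np.abs(std_resid) > 3.29))
print("% of |z| > 2.58 (should be < 1%):  ", 100 * np.mean(np.abs(std_resid) > 2.58))
print("% of |z| > 1.96 (should be < 5%):  ", 100 * np.mean(np.abs(std_resid) > 1.96))
```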
A third form of residual is the studentized residual, which is the unstandardized residual divided by an estimate of its standard deviation that varies point by point.
It is also possible to look at whether certain cases exert undue influence over the parameters of the model.