Kindle Notes & Highlights
by Andy Field
Read between January 9 and February 23, 2021
We have discovered that when it comes to graphs, minimal is best: no pink, no 3-D effects, no pictures of Errol your pet ferret superimposed on the data
Graphs are a useful way to visualize life.
bias means that the summary information from the person (‘Jasmin’s singing was note perfect throughout’) is at odds with the objective truth
in statistics the summary statistics that we estimate can be at odds with the true values.
‘unbiased estimator’ is one that yields an expected value that is the same as the thing...
we predict an outcome variable from a model described by one or more predictor variables (the Xs in the equation) and parameters (the bs in the equation) that tell us about the relationship between the predictor and the outcome variable. The model will not predict the outcome perfectly, so for each observation there is some amount of error.
We often obtain values for the parameters in the model using the method of least squares
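A minimal sketch of that idea in Python (the data and variable names below are invented for illustration; the book works through its own examples): least squares picks the b values that minimise the sum of squared errors.

```python
import numpy as np

# Hypothetical data: one predictor (X) and one outcome (y)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)   # 'true' b0 = 2, b1 = 0.5

# Design matrix with a column of 1s for the intercept (b0)
design = np.column_stack([np.ones_like(X), X])

# Method of least squares: choose b to minimise sum((y - predicted)^2)
b, sse, *_ = np.linalg.lstsq(design, y, rcond=None)
print("estimated b0, b1:", b)            # parameter estimates
print("sum of squared errors:", sse)     # error the model could not explain
```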
each parameter in the model we also compute an estimate of how well it represents the population such as a standard error (Section 2.7) or confidence interval
Statistical bias enters the process I’ve just summarized in (broadly) three ways: things that bias the parameter estimates (including effect sizes); things that bias standard errors and confidence intervals; things that bias test statistics and p-values.
confidence intervals and test statistics are computed using the standard error, so if the standard error is biased then the corresponding confidence interval and test statistic (and associated p-value) will be biased too.
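A sketch of how a standard error feeds into a confidence interval and a test statistic, using scipy's linregress on made-up data (the critical value for the interval is taken from a t-distribution here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

res = stats.linregress(X, y)          # least-squares fit of y on X
se = res.stderr                       # standard error of the slope estimate

# 95% confidence interval for the slope, built from its standard error
t_crit = stats.t.ppf(0.975, df=len(X) - 2)
ci = (res.slope - t_crit * se, res.slope + t_crit * se)

print("slope estimate:", res.slope)
print("standard error:", se)
print("95% CI:", ci)
print("t =", res.slope / se, "p =", res.pvalue)
```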
outlier is a score very different from the rest of the data.
Outliers bias parameter estimates, but they have an even greater impact on the error associated with that estimate.
By comparing the horizontal to vertical shift in the curve you should see that the outlier affects the sum of squared error more dramatically than the parameter estimate itself.
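A quick numeric sketch of that point (the scores are invented; the book uses its own example): one extreme score nudges the mean but inflates the sum of squared error far more.

```python
import numpy as np

scores = np.array([3, 4, 4, 5, 5, 6, 6, 7])       # hypothetical ratings
with_outlier = np.append(scores, 25)              # add one extreme score

for label, data in [("without outlier", scores), ("with outlier", with_outlier)]:
    mean = data.mean()                            # the parameter estimate
    sse = np.sum((data - mean) ** 2)              # error associated with that estimate
    print(f"{label}: mean = {mean:.2f}, SSE = {sse:.2f}")
```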
An assumption is a condition that ensures that what you’re attempting to do works.
when we assess a model using a test statistic, we have usually made some assumptions, and if these assumptions are true then we know that we can take the test statistic (and associated p-value) at face value and interpret it accordingly.
if any of the assumptions are not true (usually referred to as a violation) then the test statistic and p-value will be inaccurate and ...
These assumptions relate to the quality of the model itself, and the test statistics used to assess it (which are usually parametric tests based on the normal distribution).
assumptions that we’ll look at are: additivity and linearity; normality of something or other; homoscedasticity/homogeneity of variance; independence.
The vast majority of statistical models in this book are based on the linear model,
assumption of additivity and linearity means that the relationship between the outcome variable and predictors is accurately described by equation (2.4).
assumption is the most important because if it is not true then, even if all other assumptions are met, your model is invalid because your description of the process you want to model is wrong.
if you describe your statistical model inaccurately it won’t behave itself and there’s no point in interpreting its parameter estimates or worrying about significance tests or confidence intervals: the model is wrong.
The second assumption relates to the normal distribution,
Many people wrongly take the ‘assumption of normality’ to mean that the data need to be normally distributed
Parameter estimates: The mean is a parameter, and we saw in Section 6.3 (the Amazon ratings) that extreme scores can bias it.
Confidence intervals: We use values of the standard normal distribution to compute the confidence interval (Section 2.8.1) around a parameter estimate (e.g., the mean or a b in equation (2.4)).
Null hypothesis significance testing: If we want to test a hypothesis about a model (and, therefore, the parameter estimates within it) using the framework described in Section 2.9 then we assume that the parameter estimates have a normal distribution.
This is the central limit theorem: regardless of the shape of the population, parameter estimates of that population will have a normal distribution provided the samples are ‘big enough’
For confidence intervals around a parameter estimate (e.g., the mean or a b in equation (2.4)) to be accurate, that estimate must come from a normal sampling distribution. The central limit theorem tells us that in large samples, the estimate will have come from a normal distribution regardless of what the sample or population data look like.
For significance tests of models to be accurate the sampling distribution of what’s being tested must be normal. Again, the central limit theorem tells us that in large samples this will be true no matter what the shape of the population.
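A simulation sketch of the central limit theorem (the population distribution and sample sizes are arbitrary choices): means of samples drawn from a heavily skewed population become more and more normal as the samples get bigger.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

for n in (5, 30, 200):                  # increasingly 'big enough' samples
    # 10,000 sample means from an exponential (strongly skewed) population
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:>3}: skew of the sampling distribution = {stats.skew(means):.3f}")
```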
the method of least squares) the residuals in the population must be normally distributed. The method of least squares will always give you an estimate of the model parameters that minimizes error, so in that sense you don’t need to assume normality of anything to fit a linear model and estimate the parameters that define it
if all you want to do is estimate the parameters of your model then normality matters mainly in deciding how best to estimate them.
want to construct confidence intervals around those parameters or compute significance tests relating to those parameters, then the assumption of normality matters in small samples, but because of the central limit theorem we don’t really need to worry about this assumption in larger samples
The third assumption relates to variance (Section 1.8.5) and is called homoscedasticity (also known as homogeneity of variance). It impacts two things:
Parameters: Using the method of least squares (Section 2.6) to estimate the parameters in the model, we get optimal estimates if the variance of the outcome variable is equal across different values of the predictor variable.
Null hypothesis significance testing: Test statistics often assume that the variance of the outcome variable is equal across different values of the predictor variable. If this is not the case then these test statistics ...
as you go through levels of the predictor variable, the variance of the outcome variable should not change.
bars are of similar lengths, indicating that the spread of scores around the mean was roughly the same at each concert. This is homogeneity of variance or homoscedasticity: the spread of scores for hearing damage is the same at each level of the concert variable
The uneven spread of scores is easiest to see if we look at the bars in the lower right-hand graph. This scenario illustrates heterogeneity of variance or heteroscedasticity: at some levels of the concert variable the variance of scores is different than that at other levels (graphically, the vertical distance from the lowest to highest score is different after different concerts).
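A sketch of checking this numerically (group labels and scores invented; Levene's test is one common formal check of equal variances):

```python
import numpy as np
from scipy import stats

# Hypothetical hearing-damage scores after three different concerts
concert_a = np.array([10, 12, 11, 13, 12, 11])
concert_b = np.array([10, 14, 9, 15, 12, 10])
concert_c = np.array([5, 20, 2, 25, 12, 8])       # far more spread out

for name, scores in [("A", concert_a), ("B", concert_b), ("C", concert_c)]:
    print(f"concert {name}: variance = {scores.var(ddof=1):.2f}")

# Levene's test: a significant result suggests the variances differ across groups
stat, p = stats.levene(concert_a, concert_b, concert_c)
print(f"Levene W = {stat:.2f}, p = {p:.3f}")
```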
The method of least squares will produce ‘unbiased’ estimates of parameters even when homogeneity of variance can’t be assumed, but they won’t be optimal.
unequal variances/heteroscedasticity creates a bias and inconsistency in the estimate of the standard error associated with the parameter estimates in your model
Confidence intervals can be ‘extremely inaccurate’ when homogeneity of variance/homoscedasticity cannot be assumed
The equation that we use to estimate the standard error (equation (2.14)) is valid only if observations are independent.
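For reference, the usual formula is the sample standard deviation divided by the square root of the sample size; the sketch below (made-up scores) simply applies it, and it is only trustworthy when the observations are independent.

```python
import numpy as np

scores = np.array([22, 40, 53, 57, 93, 98, 103, 108, 116, 121])  # made-up scores

# Standard error of the mean: s / sqrt(N), valid only for independent observations
se = scores.std(ddof=1) / np.sqrt(len(scores))
print(f"standard error of the mean = {se:.2f}")
```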
When they are isolated, extreme cases and outliers are fairly easy to spot using graphs such as histograms and boxplots; it is considerably trickier when outliers are more subtle
Frequency distributions are not only good for spotting outliers, they are the natural choice for looking at the shape of the distribution,
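A plotting sketch (matplotlib, invented scores) for spotting an isolated outlier with a histogram and a boxplot:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
scores = np.append(rng.normal(50, 10, size=100), 130)   # one extreme score added

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(scores, bins=20)         # frequency distribution: the outlier sits far to the right
ax1.set_title("Histogram")
ax2.boxplot(scores)               # the outlier appears as a point beyond the whiskers
ax2.set_title("Boxplot")
plt.show()
```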
An alternative is the P-P plot (probability–probability plot), which plots the cumulative probability of a variable against the cumulative probability of a particular distribution (in this case we would specify a normal distribution).
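A hand-rolled sketch of a P-P plot against a normal distribution (invented data; plotting libraries offer helpers, but the manual version makes the construction clear):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=200)                                  # hypothetical variable

x_sorted = np.sort(x)
empirical = (np.arange(1, len(x) + 1) - 0.5) / len(x)     # cumulative probability of the data
theoretical = stats.norm.cdf(x_sorted, loc=x.mean(), scale=x.std(ddof=1))  # normal cumulative probability

plt.plot(theoretical, empirical, "o", markersize=3)
plt.plot([0, 1], [0, 1])          # points should hug this diagonal if x is normal
plt.xlabel("Theoretical cumulative probability")
plt.ylabel("Observed cumulative probability")
plt.title("P-P plot")
plt.show()
```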
Positive values of skewness indicate a pile-up of scores on the left of the distribution, whereas negative values indicate a pile-up on the right.
In larger samples you should certainly not do them [significance tests of normality]; instead, look at the shape of the distribution visually, interpret the value of the skewness and kurtosis statistics, and possibly don’t even worry about normality at all.
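A sketch of getting those statistics in Python (scipy's skew and kurtosis on an invented, positively skewed variable; kurtosis is reported as excess kurtosis, i.e. 0 for a normal distribution):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.exponential(size=500)                  # a positively skewed variable

print(f"skewness: {stats.skew(x):.2f}")        # > 0: scores pile up on the left
print(f"kurtosis: {stats.kurtosis(x):.2f}")    # excess kurtosis; heavy tails give values > 0
```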
The Kolmogorov–Smirnov test and Shapiro–Wilk test compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation. If the test is non-significant (p > 0.05) it tells us that the distribution of the sample is not significantly different from a normal distribution (i.e., it is probably normal). If, however, the test is significant (p < 0.05) then the distribution in question is significantly different from a normal distribution (i.e., it is non-normal).
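A sketch of both tests with scipy (made-up scores; as the nearby highlight about larger samples suggests, these p-values should not be over-interpreted when N is big):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(loc=50, scale=10, size=100)     # hypothetical scores

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# Shapiro-Wilk test of normality
sw_stat, sw_p = stats.shapiro(x)

print(f"K-S: D = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")
```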
A Q-Q plot is like the P-P plot that we encountered in Section 6.10, except that it plots the quantiles (Section 1.8.5) of the data instead of every individual score.
The Q-Q plot can be interpreted in the same way as a P-P plot: kurtosis is shown up by the dots sagging above or below the line, whereas skew is shown up by the dots snaking around the line in an ‘S’ shape.
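A sketch of drawing a Q-Q plot with scipy's probplot helper (invented data, matplotlib for display):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=200)                       # hypothetical variable

# Q-Q plot: sample quantiles against the quantiles of a normal distribution
stats.probplot(x, dist="norm", plot=plt)
plt.title("Q-Q plot")
plt.show()
```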