Discovering Statistics Using IBM SPSS Statistics: North American Edition
We have discovered that when it comes to graphs, minimal is best: no pink, no 3-D effects, no pictures of Errol your pet ferret superimposed on the data
Graphs are a useful way to visualize life.
bias means that the summary information from the person (‘Jasmin’s singing was note perfect throughout’) is at odds with the objective truth
in statistics the summary statistics that we estimate can be at odds with the true values.
‘unbiased estimator’ is one that yields an expected value that is the same as the thing...
we predict an outcome variable from a model described by one or more predictor variables (the Xs in the equation) and parameters (the bs in the equation) that tell us about the relationship between the predictor and the outcome variable. The model will not predict the outcome perfectly, so for each observation there is some amount of error.
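Equation (2.4) is not reproduced in these highlights; assuming the book's simplest one-predictor form, the model being described looks like:

```latex
\text{outcome}_i = b_0 + b_1 X_i + \text{error}_i
```

With several predictors the terms are simply added (b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots), which is what the additivity assumption later in the section refers to.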
We often obtain values for the parameters in the model using the method of least squares
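The book itself works in SPSS, but as a sketch of what least squares estimation does, here is the closed-form solution for a one-predictor linear model, using hypothetical data:

```python
import numpy as np

# Hypothetical predictor (x) and outcome (y) scores
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares chooses b0 and b1 to minimize the sum of squared errors:
#   SSE = sum((y_i - (b0 + b1 * x_i)) ** 2)
# For one predictor the minimizing values have a closed form:
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print(b0, b1, np.sum(residuals ** 2))
```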
For each parameter in the model we also compute an estimate of how well it represents the population, such as a standard error (Section 2.7) or confidence interval
Statistical bias enters the process I’ve just summarized in (broadly) three ways: things that bias the parameter estimates (including effect sizes); things that bias standard errors and confidence intervals; things that bias test statistics and p-values.
confidence intervals and test statistics are computed using the standard error, so if the standard error is biased then the corresponding confidence interval and test statistic (and associated p-value) will be biased too.
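To make that dependence concrete: for a parameter estimate \hat{b} with standard error SE(\hat{b}), the usual constructions are (a sketch, assuming a normal sampling distribution, which is where the 1.96 comes from):

```latex
t = \frac{\hat{b}}{SE(\hat{b})}, \qquad 95\%\ \text{CI} = \hat{b} \pm 1.96 \times SE(\hat{b})
```

Any bias in the standard error therefore feeds straight into the confidence interval, the test statistic, and its p-value.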
An outlier is a score very different from the rest of the data.
Outliers bias parameter estimates, but they have an even greater impact on the error associated with that estimate.
By comparing the horizontal to vertical shift in the curve you should see that the outlier affects the sum of squared error more dramatically than the parameter estimate itself.
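A quick numerical illustration of this asymmetry, using the mean of a hypothetical set of scores as the parameter estimate:

```python
import numpy as np

scores = np.array([5.0, 6.0, 5.5, 6.5, 5.0, 6.0, 5.5])
with_outlier = np.append(scores, 20.0)  # add one extreme score

for data in (scores, with_outlier):
    m = data.mean()                  # parameter estimate
    sse = np.sum((data - m) ** 2)    # sum of squared error
    print(f"mean = {m:.2f}, SSE = {sse:.2f}")

# The mean shifts modestly (about 5.64 to 7.44), but the SSE explodes
# (about 1.9 to 182): squaring amplifies the outlier's distance from
# the estimate far more than the estimate itself moves.
```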
An assumption is a condition that ensures that what you’re attempting to do works.
when we assess a model using a test statistic, we have usually made some assumptions, and if these assumptions are true then we know that we can take the test statistic (and associated p-value) at face value and interpret it accordingly.
if any of the assumptions are not true (usually referred to as a violation) then the test statistic and p-value will be inaccurate and ...
These assumptions relate to the quality of the model itself, and the test statistics used to assess it (which are usually parametric tests based on the normal distribution).
The assumptions that we'll look at are: additivity and linearity; normality of something or other; homoscedasticity/homogeneity of variance; independence.
The vast majority of statistical models in this book are based on the linear model,
The assumption of additivity and linearity means that the relationship between the outcome variable and predictors is accurately described by equation (2.4).
This assumption is the most important because if it is not true then, even if all other assumptions are met, your model is invalid because your description of the process you want to model is wrong.
if you describe your statistical model inaccurately it won’t behave itself and there’s no point in interpreting its parameter estimates or worrying about significance tests or confidence intervals: the model is wrong.
The second assumption relates to the normal distribution,
Many people wrongly take the ‘assumption of normality’ to mean that the data need to be normally distributed
Parameter estimates: The mean is a parameter, and we saw in Section 6.3 (the Amazon ratings) that extreme scores can bias it.
Confidence intervals: We use values of the standard normal distribution to compute the confidence interval (Section 2.8.1) around a parameter estimate (e.g., the mean or a b in equation (2.4)).
Null hypothesis significance testing: If we want to test a hypothesis about a model (and, therefore, the parameter estimates within it) using the framework described in Section 2.9 then we assume that the parameter estimates have a normal distribution.
This is the central limit theorem: regardless of the shape of the population, parameter estimates of that population will have a normal distribution provided the samples are ‘big enough’
For confidence intervals around a parameter estimate (e.g., the mean or a b in equation (2.4)) to be accurate, that estimate must come from a normal sampling distribution. The central limit theorem tells us that in large samples, the estimate will have come from a normal distribution regardless of what the sample or population data look like.
For significance tests of models to be accurate the sampling distribution of what’s being tested must be normal. Again, the central limit theorem tells us that in large samples this will be true no matter what the shape of the population.
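A small simulation sketch of the central limit theorem at work, drawing repeated samples from a heavily skewed (exponential) population and checking how normal the sample means look:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Exponential population: skewness = 2, far from normal
for n in (5, 30, 200):  # increasing sample sizes
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:3d}: skewness of sample means = {stats.skew(means):.3f}")

# The skewness of the 10,000 sample means shrinks toward 0 (i.e.,
# toward normality) as n grows, whatever the population looks like.
```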
For the parameter estimates to be optimal (using the method of least squares) the residuals in the population must be normally distributed. The method of least squares will always give you an estimate of the model parameters that minimizes error, so in that sense you don’t need to assume normality of anything to fit a linear model and estimate the parameters that define it
if all you want to do is estimate the parameters of your model then normality matters mainly in deciding how best to estimate them.
If you want to construct confidence intervals around those parameters or compute significance tests relating to those parameters, then the assumption of normality matters in small samples, but because of the central limit theorem we don’t really need to worry about this assumption in larger samples
The second assumption relates to variance (Section 1.8.5) and is called homoscedasticity (also known as homogeneity of variance). It impacts two things:
Parameters: Using the method of least squares (Section 2.6) to estimate the parameters in the model, we get optimal estimates if the variance of the outcome variable is equal across different values of the predictor variable.
Null hypothesis significance testing: Test statistics often assume that the variance of the outcome variable is equal across different values of the predictor variable. If this is not the case then these test statistics ...
as you go through levels of the predictor variable, the variance of the outcome variable should not change.
The bars are of similar lengths, indicating that the spread of scores around the mean was roughly the same at each concert. This is homogeneity of variance or homoscedasticity: the spread of scores for hearing damage is the same at each level of the concert variable
The uneven spread of scores is easiest to see if we look at the bars in the lower right-hand graph. This scenario illustrates heterogeneity of variance or heteroscedasticity: at some levels of the concert variable the variance of scores is different than that at other levels (graphically, the vertical distance from the lowest to highest score is different after different concerts).
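As a sketch of how this could be checked numerically (outside SPSS), here are hypothetical hearing-damage scores whose spread grows across three levels of the concert variable, plus Levene’s test, one common test of homogeneity of variance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical hearing-damage scores after three concerts; the means
# match but the spread grows across levels (heteroscedasticity)
concert1 = rng.normal(loc=20, scale=2, size=30)
concert2 = rng.normal(loc=20, scale=5, size=30)
concert3 = rng.normal(loc=20, scale=12, size=30)

for i, g in enumerate((concert1, concert2, concert3), start=1):
    print(f"concert {i}: variance = {g.var(ddof=1):.1f}")

# Levene's test: a significant p-value suggests the variances differ
stat, p = stats.levene(concert1, concert2, concert3)
print(f"Levene W = {stat:.2f}, p = {p:.4f}")
```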
The method of least squares will produce ‘unbiased’ estimates of parameters even when homogeneity of variance can’t be assumed, but they won’t be optimal.
unequal variances/heteroscedasticity creates a bias and inconsistency in the estimate of the standard error associated with the parameter estimates in your model
Confidence intervals can be ‘extremely inaccurate’ when homogeneity of variance/homoscedasticity cannot be assumed
The equation that we use to estimate the standard error (equation (2.14)) is valid only if observations are independent.
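Equation (2.14) is not shown in these highlights; assuming it is the usual estimate of the standard error of the mean,

```latex
\widehat{SE}(\bar{X}) = \frac{s}{\sqrt{N}}
```

the division by \sqrt{N} is justified only because the N observations are assumed to be independent draws; with positively correlated observations this formula typically underestimates the true standard error.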
When they are isolated, extreme cases and outliers are fairly easy to spot using graphs such as histograms and boxplots; it is considerably trickier when outliers are more subtle
Frequency distributions are not only good for spotting outliers, they are the natural choice for looking at the shape of the distribution,
An alternative is the P-P plot (probability–probability plot), which plots the cumulative probability of a variable against the cumulative probability of a particular distribution (in this case we would specify a normal distribution).
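A sketch of building such a P-P plot by hand against a normal distribution, using hypothetical scores (matplotlib for the plotting; the pairing of cumulative probabilities is the point):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = np.sort(rng.normal(loc=50, scale=10, size=200))  # hypothetical scores

n = len(x)
empirical = (np.arange(1, n + 1) - 0.5) / n  # empirical cumulative probabilities
theoretical = stats.norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))

plt.plot(theoretical, empirical, "o", markersize=3)
plt.plot([0, 1], [0, 1])  # reference line: points fall on it if normal
plt.xlabel("theoretical cumulative probability")
plt.ylabel("empirical cumulative probability")
plt.show()
```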
Positive values of skewness indicate a pile-up of scores on the left of the distribution, whereas negative values indicate a pile-up on the right.
In larger samples you should certainly not do them [significance tests of normality]; instead, look at the shape of the distribution visually, interpret the value of the skewness and kurtosis statistics, and possibly don’t even worry about normality at all.
The Kolmogorov–Smirnov test and Shapiro–Wilk test compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation. If the test is non-significant (p > 0.05) it tells us that the distribution of the sample is not significantly different from a normal distribution (i.e., it is probably normal). If, however, the test is significant (p < 0.05) then the distribution in question is significantly different from a normal distribution (i.e., it is non-normal).
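Both tests are available in SciPy; a minimal sketch on hypothetical scores (note that plugging the sample mean and SD into the K-S test, as described here, strictly calls for the Lilliefors correction to the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=100, scale=15, size=100)  # hypothetical scores

# Kolmogorov-Smirnov: compare to a normal with the same mean and SD
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# Shapiro-Wilk
sw_stat, sw_p = stats.shapiro(x)

print(f"K-S D = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"S-W W = {sw_stat:.3f}, p = {sw_p:.3f}")
# p > 0.05 on both: not significantly different from normal
```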
A Q-Q plot is like the P-P plot that we encountered in Section 6.10, except that it plots the quantiles (Section 1.8.5) of the data instead of every individual score.
The Q-Q plot can be interpreted in the same way as a P-P plot: kurtosis is shown up by the dots sagging above or below the line, whereas skew is shown up by the dots snaking around the line in an ‘S’ shape.
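SciPy can draw a basic normal Q-Q plot directly; a sketch using deliberately skewed hypothetical scores, so the dots bend away from the line:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=100)  # skewed hypothetical scores

# Quantiles of the data against quantiles of a normal distribution;
# points hugging the line would indicate normality
stats.probplot(x, dist="norm", plot=plt)
plt.show()
```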