Kindle Notes & Highlights
by Andy Field
Read between January 9 and February 23, 2021
In a scientific context, parsimony refers to the idea that simpler explanations of a phenomenon are preferable to complex ones.
Linearity: In the linear model we assume that the outcome has linear relationships with the predictors; in logistic regression this translates into assuming that any continuous predictors have a linear relationship with the logit of the outcome.
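The book works in SPSS, but as a rough illustration of one common way to probe this assumption, here is a minimal Python/statsmodels sketch of a Box-Tidwell-style check: add each continuous predictor multiplied by its own natural log and see whether that term is significant. The data and variable names (cured, duration) are hypothetical.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: 'cured' (0/1 outcome) and 'duration' (continuous predictor)
    rng = np.random.default_rng(42)
    duration = rng.uniform(1, 10, size=200)
    cured = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * duration - 3))))
    df = pd.DataFrame({"cured": cured, "duration": duration})

    # Box-Tidwell-style check: add the predictor multiplied by its own log.
    # A significant interaction term suggests that the linearity-of-the-logit
    # assumption is violated for that predictor.
    df["duration_ln"] = df["duration"] * np.log(df["duration"])
    model = smf.logit("cured ~ duration + duration_ln", data=df).fit(disp=False)
    print(model.summary())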
Independence of errors: In logistic regression, violating this assumption produces overdispersion.
Usually this is revealed by implausibly large standard errors. Two situations can provoke it, both of which are related to the ratio of cases to variables: incomplete information and complete separation.
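A sketch of how one might gauge overdispersion, assuming grouped binomial data: the dispersion ratio (the Pearson chi-square divided by the residual degrees of freedom) should sit close to 1, and values much larger than 1 suggest overdispersion. The data below are invented, and Python/statsmodels is used in place of SPSS.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical grouped binomial data: successes out of a fixed number of trials
    df = pd.DataFrame({
        "x": np.arange(1, 11),
        "successes": [2, 3, 5, 4, 8, 9, 12, 11, 15, 17],
        "trials": [20] * 10,
    })
    y = df[["successes"]].assign(failures=df["trials"] - df["successes"])
    X = sm.add_constant(df[["x"]])

    fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

    # Dispersion ratio: values well above 1 indicate overdispersion,
    # which makes the reported standard errors too small to trust.
    dispersion = fit.pearson_chi2 / fit.df_resid
    print(f"Dispersion ratio: {dispersion:.2f}")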
Conscientious researchers produce and check multi-way crosstabulations of all categorical independent variables.
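As an illustration of that check, a small sketch using pandas.crosstab with made-up categorical predictors; the thing to look for is empty (zero-count) cells, which signal incomplete information.

    import pandas as pd

    # Hypothetical categorical predictors; empty cells in the crosstabulation
    # mean some combinations of categories were never observed.
    df = pd.DataFrame({
        "smoker":   ["yes", "yes", "no", "no", "no", "yes", "no", "no"],
        "owns_cat": ["yes", "no", "yes", "yes", "no", "no", "yes", "no"],
    })

    crosstab = pd.crosstab(df["smoker"], df["owns_cat"])
    print(crosstab)
    print("Empty cells present:", (crosstab == 0).any().any())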
The second situation in which logistic regression collapses might surprise you: it’s when the outcome variable can be perfectly predicted by one variable or a combination of variables. This situation is known as complete separation.
Complete separation often arises when too many variables are fitted to too few cases.
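A toy demonstration of complete separation, using invented data in which every case above a cut-point has the outcome and every case below it does not. Depending on the statsmodels version, the fit either fails with a perfect-separation error or returns the tell-tale enormous coefficients and standard errors.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical perfectly separated data: every case with x > 5 is cured,
    # every case with x <= 5 is not, so the outcome is perfectly predictable.
    x = np.arange(1, 11, dtype=float)
    cured = (x > 5).astype(int)
    X = sm.add_constant(x)

    try:
        fit = sm.Logit(cured, X).fit(disp=False)
        # If the fit goes through, the warning signs are enormous
        # coefficients and implausibly large standard errors.
        print(fit.params)
        print(fit.bse)
    except Exception as err:  # statsmodels flags perfect separation
        print("Model could not be estimated:", err)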
Logistic regression is not only used to predict a two-category outcome (coded 0 and 1); it can also be used, for example, to predict proportions or outcomes with several categories.
The Wald statistic indicates that having the intervention (or not) is a significant predictor of whether the patient was cured because the p-value is 0.002, which is less than the conventional threshold of 0.05.
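To make the arithmetic of the Wald test concrete, a tiny sketch using made-up values for b and its standard error (not the figures from the book's example): the Wald z is the estimate divided by its standard error, and squaring it gives the equivalent chi-square with 1 degree of freedom.

    from scipy import stats

    # Hypothetical coefficient and standard error (not the book's values)
    b, se = 1.23, 0.40

    z = b / se                              # Wald z-statistic
    wald_chi2 = z ** 2                      # equivalent chi-square with 1 df
    p_value = 2 * stats.norm.sf(abs(z))     # two-tailed p-value

    print(f"z = {z:.2f}, Wald chi-square = {wald_chi2:.2f}, p = {p_value:.4f}")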
If the model perfectly fits the data, then this histogram should show all the cases for which the event has occurred on the right-hand side, and all the cases for which the event hasn’t occurred on the left-hand side.
If the predictor is a continuous variable, the cases will be spread across many columns.
The more the cases cluster at each end of the graph, the better; such a plot would show that when the outcome did occur (i.e., the patient was cured) the predicted probability of the event occurring is also high (i.e., close to 1).
This situation represents a model that correctly predicts the observed outcome data.
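A small sketch of such a classification plot, using invented predicted probabilities and outcomes and matplotlib rather than SPSS: a well-fitting model piles the cases where the event occurred near 1 and the cases where it did not near 0.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical predicted probabilities and observed outcomes
    rng = np.random.default_rng(1)
    observed = rng.binomial(1, 0.5, size=200)
    predicted = np.clip(observed * 0.7 + rng.normal(0.15, 0.2, size=200), 0, 1)

    # A well-fitting model shows little overlap between the two histograms.
    plt.hist(predicted[observed == 1], bins=20, alpha=0.6, label="cured")
    plt.hist(predicted[observed == 0], bins=20, alpha=0.6, label="not cured")
    plt.xlabel("Predicted probability of being cured")
    plt.ylabel("Number of cases")
    plt.legend()
    plt.show()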
A good model will ensure that few cases are misclassified.
Fitting a model without checking how well it fits the data is like buying a new pair of trousers without trying them on: a model does its job regardless of the data, but its real-life value may be limited. So, our conclusions so far are fine in themselves, but to be sure that the model is a good one, it is important to examine the residuals.
As a bare minimum, report the b-values (and their standard errors and significance values), the odds ratios (and their confidence intervals) and some general statistics about the model (such as the R² and goodness-of-fit statistics).
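A sketch of how those quantities might be assembled outside SPSS, using Python/statsmodels on invented data: the odds ratio is the exponential of b, and exponentiating the ends of the confidence interval for b gives the confidence interval for the odds ratio.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: intervention (0/1) predicting cured (0/1)
    rng = np.random.default_rng(7)
    intervention = rng.binomial(1, 0.5, size=300)
    cured = rng.binomial(1, np.where(intervention == 1, 0.7, 0.4))
    df = pd.DataFrame({"cured": cured, "intervention": intervention})

    fit = smf.logit("cured ~ intervention", data=df).fit(disp=False)

    # Assemble the quantities typically reported: b, SE, p, odds ratio and its CI
    report = pd.DataFrame({
        "b": fit.params,
        "SE": fit.bse,
        "p": fit.pvalues,
        "odds ratio": np.exp(fit.params),
        "OR 95% CI lower": np.exp(fit.conf_int()[0]),
        "OR 95% CI upper": np.exp(fit.conf_int()[1]),
    })
    print(report.round(3))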
SPSS does not produce collinearity diagnostics in logistic regression (which creates the illusion that multicollinearity doesn’t matter).
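Since collinearity is a property of the predictors rather than the outcome, one workaround (sketched here with hypothetical predictors in Python/statsmodels) is to compute variance inflation factors on the predictors directly.

    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools.tools import add_constant

    # Hypothetical predictors; x2 is deliberately correlated with x1
    rng = np.random.default_rng(3)
    x1 = rng.normal(size=200)
    x2 = x1 * 0.9 + rng.normal(scale=0.3, size=200)
    x3 = rng.normal(size=200)
    X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

    # VIFs above roughly 10 are a common (if crude) cause for concern.
    for i, name in enumerate(X.columns):
        if name == "const":
            continue
        print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")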
If you want to predict membership of more than two categories, the logistic regression model extends to multinomial logistic regression.
The model breaks the outcome variable into a series of comparisons between two categories.
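A minimal sketch of a multinomial model, using statsmodels' MNLogit on invented data with a three-category outcome: the first category is taken as the reference and a separate set of coefficients is estimated for each comparison against it.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical three-category outcome (0, 1, 2) and one continuous predictor
    rng = np.random.default_rng(11)
    x = rng.normal(size=300)
    outcome = rng.choice([0, 1, 2], size=300, p=[0.4, 0.35, 0.25])
    X = sm.add_constant(x)

    # MNLogit uses the first category as the reference and estimates one set
    # of coefficients per comparison against it, i.e. the series of
    # two-category comparisons described above.
    fit = sm.MNLogit(outcome, X).fit(disp=False)
    print(fit.summary())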
Pseudo R-square: This option produces the Cox and Snell and Nagelkerke R² statistics, which can be used as effect sizes.
Step summary: This option produces a table that summarizes the predictors entered or removed at each step.
Model fitting information: This option produces a table that compares the model (or models in a stepwise analysis) to the baseline (the model with only the intercept in it). This table is useful for checking whether the model has improved on the baseline.
Information Criteria: This option produces Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC), which can be used to compare models.
Cell probabilities: This option produces a table of the observed and expected frequencies, which is basically the same as the classification table produced in binary logistic regression and is probably worth inspecting.
Classification table: This option produces a contingency table of observed versus predicted responses for all combinations of predictor variables.
Goodness-of-fit: This option is important because it produces Pearson and likelihood ratio chi-square statistics for the model.
Monotonicity measures: This option is worth selecting only if your outcome variable has two outcomes.
Estimates: This option produces the b-values, test statistics and confidence intervals for predictors in the model and is essential.
Likelihood ratio tests: The model overall is tested using likelihood ratio statistics, but this option will compute the same test for individual effects in the model.
Asymptotic correlations and Asymptotic covariances: This option produces a table of correlations (or covariances) of the parameter estimates.
Logistic regression works through an iterative process, in which the parameter estimates are repeatedly refined until they converge on the maximum-likelihood solution.
Remember that the log-likelihood is a measure of how much unexplained variability there is in the outcome, and the change in the log-likelihood indicates how much new variance has been explained by a model relative to an earlier model.
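A small sketch of that arithmetic with hypothetical log-likelihoods (not figures from the book): the likelihood ratio chi-square is -2 times the change in log-likelihood, and the Cox and Snell and Nagelkerke R² mentioned earlier are simple functions of the same quantities.

    import numpy as np
    from scipy import stats

    # Hypothetical log-likelihoods (not the book's figures)
    ll_baseline = -100.0   # intercept-only model
    ll_model = -85.0       # model containing the predictors
    n = 150                # sample size
    df = 2                 # number of predictors added

    # Likelihood ratio chi-square: -2 times the change in log-likelihood
    chi_square = -2 * (ll_baseline - ll_model)
    p_value = stats.chi2.sf(chi_square, df)

    # Cox and Snell and Nagelkerke pseudo R-squares from the same quantities
    r2_cox_snell = 1 - np.exp(2 * (ll_baseline - ll_model) / n)
    r2_nagelkerke = r2_cox_snell / (1 - np.exp(2 * ll_baseline / n))

    print(f"chi-square({df}) = {chi_square:.2f}, p = {p_value:.4f}")
    print(f"Cox & Snell R2 = {r2_cox_snell:.3f}, Nagelkerke R2 = {r2_nagelkerke:.3f}")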