Kindle Notes & Highlights
by
Ron Klimberg
Read between
January 2 - March 20, 2018
Logistic regression is one of the dependence techniques in which the dependent variable is discrete and, more specifically, binary. That is, it takes on only two possible values.
Combining these two results, you have P[Yi = 1] = a + b * Xi. You can see that, in the case of a binary dependent variable, the regression might be interpreted as a probability.
If the estimated probability is high enough (for example, above .5), then you predict 1. Conversely, if the estimated probability of a 1 is low enough (for example, below .5), then you predict 0.
When linear regression is applied to a binary dependent variable, it is commonly called the linear probability model (LPM).
Three primary difficulties arise in the LPM. First, the predictions from a linear regression do not necessarily fall between zero and one.
Second, for any given predicted value of y (denoted ŷ), the residual (resid = y − ŷ) can take only two values: 1 − ŷ when y = 1, or −ŷ when y = 0.
A further implication of the fact that the residual can take on only two values for any ŷ is that the residuals are heteroscedastic. This violates the linear regression assumption of homoscedasticity (constant variance). The estimates of the standard errors of the regression coefficients will not be stable, and inference will be unreliable.
Third, the linearity assumption is likely to be invalid, especially at the extremes of the independent variable.
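The book works in JMP; as a minimal illustration of these difficulties outside JMP, here is a sketch in Python with made-up data (all names and values are hypothetical):

    import numpy as np

    # Hypothetical data: x is a continuous predictor, y is a 0/1 outcome whose
    # probability of being 1 rises with x.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 100, size=200)
    y = (rng.uniform(size=200) < x / 100).astype(float)

    # Linear probability model: fit y = a + b*x by ordinary least squares.
    X = np.column_stack([np.ones_like(x), x])
    a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = a + b * x

    # Difficulty 1: nothing constrains the fitted values to [0, 1].
    print("fitted values range from", y_hat.min(), "to", y_hat.max())

    # Difficulty 2: each residual is either 1 - y_hat (when y = 1) or -y_hat
    # (when y = 0), so its spread depends on y_hat -- heteroscedasticity.
    resid = y - y_hat
    assert np.allclose(resid, np.where(y == 1, 1 - y_hat, -y_hat))

    # Classify as 1 when the fitted "probability" exceeds 0.5, else 0.
    pred = (y_hat > 0.5).astype(int)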
Another useful representation of the logistic function is the following: Recognize that the y-axis, G(z), is a probability, and let G(z) = π, the probability of the event’s occurring.
Consider taking the natural logarithm of both sides of the odds form of this equation, π/(1 − π) = e^z. The left side becomes log[π/(1 − π)], which equals z. The log of the odds is called the logit.
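Written out, the derivation the passage compresses is the standard one: start from the logistic function and solve for z.

    \[
    G(z) = \pi = \frac{1}{1 + e^{-z}}
    \quad\Longrightarrow\quad
    \frac{\pi}{1 - \pi} = e^{z}
    \quad\Longrightarrow\quad
    \ln\!\left(\frac{\pi}{1 - \pi}\right) = z .
    \]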
When a logistic regression has been run, simply clicking the red triangle and selecting Odds Ratios will do the trick.
Compared to a person who passes the midterm, a person who fails the midterm has odds of passing the class that are only 12% of the passer's odds. Or equivalently, the odds of passing the class are 88% lower for a person who fails the midterm: (OR − 1) * 100% = (0.12 − 1) * 100% = −88%.
The relationships between probabilities, odds, and log-odds are straightforward and monotonic. An event with a small probability has small odds and very negative log-odds. An event with a large probability has large odds and large log-odds. Probabilities are always between zero and one. Odds are bounded below by zero but can be arbitrarily large; log-odds can be any real number, negative or positive.
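A quick numerical check of these relationships, in plain Python (nothing here is book-specific):

    import math
    for p in (0.1, 0.5, 0.9):
        odds = p / (1 - p)
        print(p, round(odds, 3), round(math.log(odds), 3))
    # prints: 0.1 0.111 -2.197   (small probability: odds near 0, very negative log-odds)
    #         0.5 1.0    0.0     (even odds)
    #         0.9 9.0    2.197   (large probability: large odds, positive log-odds)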
Unit Odds Ratios refers to the expected change in the odds ratio for a one-unit change in the independent variable. Range Odds Ratios refers to the expected change in the odds ratio when the independent variable changes from its minimum to its maximum.
Since the present independent variable is a binary 0/1 variable, these two definitions are the same.
For a one-unit increase in the midterm score, the new odds ratio will be 69.51% of the old odds ratio. Or, equivalently, you expect to see a 30.5% reduction in the odds ratio: (0.695057 − 1) * 100% = −30.5%. For example, suppose a hypothetical student has a midterm score of 75%.
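To see the arithmetic behind the unit odds ratio, here is a small sketch; the slope is the one implied by the reported 0.695057, and the intercept is a placeholder that cancels out of the ratio:

    import math

    unit_or = 0.695057               # unit odds ratio reported for the midterm score
    b = math.log(unit_or)            # logistic slope implied by that odds ratio
    a = 0.0                          # hypothetical intercept; it cancels below

    def odds(score):
        # Odds implied by the logistic model: exp(a + b * score).
        return math.exp(a + b * score)

    # Moving from a midterm score of 75 to 76 multiplies the odds by exp(b):
    # the new odds are 69.51% of the old, a 30.5% reduction.
    print(odds(76) / odds(75))       # ~0.695057, regardless of a or the starting score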
The fourth column (Most Likely PassClass) classifies the observation as either 1 or 0, depending on whether the probability is greater than or less than 50%. You can observe how well your model classifies all the observations (using this cut-off point of 50%) by producing a confusion matrix. Click the red triangle and click Confusion matrix.
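Outside JMP, the same confusion matrix can be tallied by hand; a minimal sketch with made-up probabilities and outcomes:

    import numpy as np

    # Hypothetical predicted probabilities and actual 0/1 outcomes.
    prob   = np.array([0.91, 0.42, 0.77, 0.08, 0.55, 0.30])
    actual = np.array([1,    1,    1,    0,    0,    0])

    # Classify as 1 when the predicted probability exceeds the 50% cutoff.
    pred = (prob > 0.5).astype(int)

    # 2x2 confusion matrix: rows = actual class, columns = predicted class.
    matrix = np.zeros((2, 2), dtype=int)
    for a, p in zip(actual, pred):
        matrix[a, p] += 1
    print(matrix)    # [[2 1]  actual 0: two classified correctly, one as 1
                     #  [1 2]] actual 1: one classified as 0, two correctly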
Before you can begin constructing a model for customer churn, you need to understand model building for logistic regression.
The first thing to do is make sure that the data are loaded correctly.
Make sure that all binary variables are classified as Nominal.
Another very useful device is the scatterplot/correlation matrix, which can, at a glance, suggest potentially useful independent variables that are correlated with the dependent variable.
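A rough equivalent outside JMP, assuming the churn data sit in a hypothetical churn.csv with a numeric 0/1 Churn column:

    import pandas as pd

    df = pd.read_csv("churn.csv")            # hypothetical file name

    # Correlations of every numeric variable with Churn; large magnitudes
    # suggest candidate predictors worth a closer look.
    print(df.corr(numeric_only=True)["Churn"].sort_values())

    # A scatterplot matrix of a few variables shows the same thing visually.
    pd.plotting.scatter_matrix(df.select_dtypes("number").iloc[:, :5])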
the principle of parsimony—that is, a model that explains as much as possible of the variation in Y and uses as few significant independent variables as possible.
four approaches that you could take.
Inclusion of all the variables. In this approach, you just enter all the independent variables into the model. An obvious advantage of this approach is that it is fast and easy. However, depending on the data set, most likely several independent variables will be insignificantly related to the dependent variable.
Bivariate method. In this approach, you search for independent variables that might have predictive value for the dependent variable by running a series of bivariate logistic regressions. That is, you run a logistic regression for each of the independent variables, searching for “significant” relationships. (A sketch of this screening appears after this list of approaches.)
An advantage of this approach is that it is the one most agreed upon by statisticians.
Stepwise approach. In this approach, you would use the Fit Model platform, change the Personality to Stepwise and Direction to Mixed. The Mixed option is like Forward Stepwise, but variables can be dropped after they have been added. An advantage of this approach is that it is automated; so, it is fast and easy.
Decision trees. A decision tree is a data mining technique that can be used for variable selection
The advantage of using the decision tree technique is that it is automated, fast, and easy to run. Further, it is a popular variable reduction approach taken by many data mining analysts
No one approach is a clear winner. Nevertheless, it is recommended that you use the “Include all the variables” approach.
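As a sketch of the bivariate screening described above (Python with statsmodels rather than JMP; the file name and the numeric-only assumption are mine), run one bivariate logistic regression per candidate and record the whole-model likelihood-ratio p-value:

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("churn.csv")             # hypothetical file; 0/1 Churn column
    y = df["Churn"]
    candidates = [c for c in df.columns if c != "Churn"]

    # One bivariate logistic regression per candidate. This sketch assumes the
    # candidates are numeric; categorical ones (e.g., State) would need dummy
    # coding or a contingency-table test instead.
    rows = []
    for col in candidates:
        X = sm.add_constant(df[[col]].astype(float))
        fit = sm.Logit(y, X).fit(disp=0)
        rows.append((col, fit.llr_pvalue,
                     "include" if fit.llr_pvalue < 0.05 else "exclude"))

    print(pd.DataFrame(rows, columns=["variable", "p-value", "conclusion"]))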
Three important ways to improve a model are as follows: ● If the logit appears to be nonlinear when plotted against some continuous variable, one resolution is to convert the continuous variable to a few dummy variables
● If a histogram shows that a continuous variable has an excess of observations at zero (which can lead to nonlinearity in the logit), add a dummy variable that equals one if the continuous variable is zero and equals zero otherwise.
● Finally, a seemingly numeric variable that is actually discrete can be broken up into a handful of dummy variables (for example, ZIP codes).
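A pandas sketch of all three fixes; the column names and cut points are made up for illustration:

    import pandas as pd

    # Hypothetical frame: a continuous variable with excess zeros and a
    # numeric-looking ZIP code that is really categorical.
    df = pd.DataFrame({"Intl_Mins": [0.0, 0.0, 7.2, 11.5],
                       "ZIP": [19104, 19104, 10001, 60601]})

    # Nonlinear logit: bin the continuous variable so it can enter as dummies.
    df["Intl_Mins_bin"] = pd.cut(df["Intl_Mins"], bins=[-0.1, 0.0, 8.0, 20.0])

    # Excess zeros: flag them with a 0/1 dummy.
    df["Intl_Mins_is_zero"] = (df["Intl_Mins"] == 0).astype(int)

    # Discrete "numeric" variable: expand it (and the bins) into dummy columns.
    df = pd.get_dummies(df, columns=["Intl_Mins_bin", "ZIP"])
    print(df)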
Before you can begin modeling, you must first e...
To summarize, you have dropped 7 of the original 23 variables from the data set (Phone, Day_Charge, Eve_Charge, Night_Charge, Intl_Charge, E_VMAIL_PLAN, and VMail_Plan). So there are now 16 variables left, one of which is the dependent variable, Churn. You have 15 possible independent variables to consider.
bivariate (two variables, one dependent and one independent) analyses: logistic regressions (when the independent variable is continuous) and contingency tables (when the independent variable is categorical).
Make a list of all 15 variables that need to be tested, and write down the test result (for example, the relevant p-value) and your conclusion (for example, “include” or “exclude”). This not only prevents simple errors; it is a useful record of your work should you have to come back to it later.
At the bottom of the table of results are the Likelihood Ratio and Pearson tests, both of which test the null hypothesis that State does not affect Churn, and both of which reject the null. The conclusion is that the variable State matters.
Under “Whole Model Test,” Prob>ChiSq (the p-value) is less than 0.0001, so you conclude that VMail_message affects Churn.
linear regression minimizes the sum of squared residuals.
when you compare two linear regressions, the preferred one has the smaller sum of squared residuals.
So the model with the smaller −LogLikelihood is preferred to a model with a larger −LogLikelihood.
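For reference, the quantity being minimized is the negative of the standard binomial log-likelihood (shown here for a single predictor); a smaller −log L means a better fit, just as a smaller sum of squared residuals does in linear regression:

    \[
    -\log L \;=\; -\sum_{i=1}^{n}\Bigl[\, y_i \ln \hat{\pi}_i + (1 - y_i)\ln(1 - \hat{\pi}_i) \Bigr],
    \qquad
    \hat{\pi}_i = \frac{1}{1 + e^{-(a + b x_i)}} .
    \]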
Examining the p-values of the independent variables in the Parameter Estimates, you find that a variable for which Prob>ChiSq is less than 0.05 is said to be significant.
“Effect Likelihood Ratio Tests.”
the Effect Likelihood Ratio Tests tell you that the effect of all the state dummy variables is significant, with a Prob>ChiSq of 0.0010.
Histogram Options ▶ Show Counts.