Fundamentals of Predictive Analytics with JMP
Read between January 2 - March 20, 2018
Logistic regression is one of the dependence techniques in which the dependent variable is discrete and, more specifically, binary. That is, it takes on only two possible values.
Combining these two results, you have P[Yᵢ = 1] = a + b·Xᵢ. You can see that, in the case of a binary dependent variable, the regression prediction can be interpreted as a probability.
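To spell out the combination (a standard derivation, not the book's exact wording): because Yᵢ is binary, its expected value equals the probability that it equals one, and linear regression models that expected value:

\[
E[Y_i] = 1 \cdot P[Y_i = 1] + 0 \cdot P[Y_i = 0] = P[Y_i = 1]
\qquad\text{and}\qquad
E[Y_i] = a + b X_i ,
\]

so the fitted value a + b·Xᵢ can be read as an estimate of P[Yᵢ = 1].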
If the estimated probability is high enough (for example, above .5), then you predict 1. Conversely, if the estimated probability of a 1 is low enough (for example, below .5), then you predict 0.
When linear regression is applied to a binary dependent variable, it is commonly called the linear probability model (LPM).
Three primary difficulties arise in the LPM. First, the predictions from a linear regression do not necessarily fall between zero and one.
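A minimal sketch of this first difficulty, with simulated data (the variable names and numbers are illustrative, not from the book):

```python
# Fit an ordinary linear regression to a 0/1 outcome and show that
# some fitted values fall outside [0, 1]. All data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 200)                   # e.g., a midterm score
p = 1 / (1 + np.exp(-(x - 60) / 8))            # true logistic probability
y = (rng.uniform(size=200) < p).astype(float)  # binary 0/1 outcome

b, a = np.polyfit(x, y, 1)                     # least squares: y = a + b*x
y_hat = a + b * x

print("min prediction:", y_hat.min())          # typically below 0
print("max prediction:", y_hat.max())          # typically above 1
```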
Second, for any given predicted value of y (denoted ŷ), the residual (resid = y − ŷ) can take only two values: since y is either 0 or 1, the residual is either −ŷ or 1 − ŷ.
A further implication of the fact that the residual can take on only two values for any ŷ is that the residuals are heteroscedastic. This violates the linear regression assumption of homoscedasticity (constant variance). The estimates of the standard errors of the regression coefficients will not be stable, and inference will be unreliable.
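Concretely (a standard result, supplied here as a supplement to the passage): if pᵢ = a + b·Xᵢ is the success probability, the error εᵢ equals 1 − pᵢ with probability pᵢ and −pᵢ with probability 1 − pᵢ, so

\[
\operatorname{Var}(\varepsilon_i) = p_i (1 - p_i) = (a + b X_i)\bigl(1 - a - b X_i\bigr),
\]

which varies with Xᵢ instead of staying constant; that is the heteroscedasticity.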
Third, the linearity assumption is likely to be invalid, especially at the extremes of the independent variable.
Another useful representation of the logistic function is the following: Recognize that the y-axis, G(z), is a probability, and let G(z) = π, the probability of the event’s occurring.
Consider taking the natural logarithm of both sides. The left side will become log[π/(1 − π)]. The log of the odds is called the logit.
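Filling in the algebra (assuming the standard logistic function G(z) = 1/(1 + e⁻ᶻ)):

\[
\pi = \frac{1}{1 + e^{-z}}
\;\Longrightarrow\;
\frac{\pi}{1 - \pi} = e^{z}
\;\Longrightarrow\;
\log\!\left(\frac{\pi}{1 - \pi}\right) = z .
\]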
When a logistic regression has been run, simply clicking the red triangle and selecting Odds Ratios will do the trick.
Compared to a person who passes the midterm, a person who fails the midterm is 12% as likely to pass the class. Or, equivalently, a person who fails the midterm is 88% less likely to pass the class than someone who passed the midterm: (OR − 1) × 100% = (0.12 − 1) × 100% = −88%.
The relationships between probabilities, odds (ratios), and log-odds (ratios) are straightforward. An event with a small probability has small odds and a low (negative) log-odds; an event with a large probability has large odds and a high log-odds. Probabilities are always between zero and one. Odds are bounded below by zero but can be arbitrarily large; log-odds are unbounded in both directions.
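A small sketch (not from the book) that makes these bounds concrete by converting a few probabilities to odds and log-odds:

```python
# Convert probabilities to odds and log-odds to illustrate the bounds:
# probabilities stay in (0, 1), odds in (0, inf), log-odds in (-inf, inf).
import math

for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    odds = p / (1 - p)
    log_odds = math.log(odds)
    print(f"p = {p:4.2f}   odds = {odds:8.3f}   log-odds = {log_odds:7.3f}")
```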
Unit Odds Ratios refers to the expected change in the odds ratio for a one-unit change in the independent variable. Range Odds Ratios refers to the expected change in the odds ratio when the independent variable changes from its minimum to its maximum.
Since the present independent variable is a binary 0/1 variable, its range (from its minimum of 0 to its maximum of 1) is exactly one unit, so these two definitions are the same.
For a one-unit increase in the midterm score, the new odds ratio will be 69.51% of the old odds ratio. Or, equivalently, you expect to see a 30.5% reduction in the odds ratio: (0.695057 − 1) × 100% = −30.5%. For example, suppose a hypothetical student has a midterm score of 75%.
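A quick check of this arithmetic, plus how the per-unit multiplier compounds over several points (the 10-point change is a hypothetical illustration, not from the book):

```python
# 0.695057 is the unit odds ratio quoted in the passage.
unit_or = 0.695057

print((unit_or - 1) * 100)   # about -30.5: percent change in odds per point
print(unit_or ** 10)         # multiplier for a 10-point increase (OR**10)
```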
The fourth column (Most Likely PassClass) classifies the observation as either 1 or 0, depending on whether the probability is greater than or less than 50%. You can observe how well your model classifies all the observations (using this cut-off point of 50%) by producing a confusion matrix. Click the red triangle and click Confusion matrix.
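For readers outside JMP, a hand-rolled sketch of the same 2x2 confusion matrix at a 50% cutoff (the probabilities and outcomes below are made up):

```python
# Cross-tabulate actual 0/1 outcomes against classifications made by
# thresholding predicted probabilities at 0.5.
import numpy as np

actual = np.array([1, 0, 1, 1, 0, 0, 1, 0])
predicted_prob = np.array([0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1])

predicted = (predicted_prob >= 0.5).astype(int)

matrix = np.zeros((2, 2), dtype=int)   # rows: actual; columns: predicted
for a, p in zip(actual, predicted):
    matrix[a, p] += 1
print(matrix)
```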
Before you can begin constructing a model for customer churn, you need to understand model building for logistic regression.
The first thing to do is make sure that the data are loaded correctly.
Make sure that all binary variables are classified as Nominal.
Another very useful device is the scatterplot/correlation matrix, which can, at a glance, suggest potentially useful independent variables that are correlated with the dependent variable.
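Outside JMP, the same at-a-glance check can be done in pandas; the file name and column names below are hypothetical placeholders:

```python
# Correlations with the target plus a scatterplot matrix of candidates.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

df = pd.read_csv("churn.csv")                   # hypothetical data set
print(df.corr(numeric_only=True)["Churn"].sort_values())

scatter_matrix(df[["Churn", "Day_Mins", "CustServ_Calls"]], figsize=(8, 8))
plt.show()
```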
the principle of parsimony—that is, a model that explains as much as possible of the variation in Y and uses as few significant independent variables as possible.
There are four approaches that you could take.
Inclusion of all the variables. In this approach, you just enter all the independent variables into the model. An obvious advantage of this approach is that it is fast and easy. However, depending on the data set, most likely several independent variables will be insignificantly related to the dependent variable.
Bivariate method. In this approach, you search for independent variables that might have predictive value for the dependent variable by running a series of bivariate logistic regressions. That is, you run a logistic regression for each of the independent variables, searching for “significant” relationships.
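A sketch of this screening loop with statsmodels (the file and column names are hypothetical, and the predictors are assumed numeric):

```python
# Run one bivariate logistic regression per candidate predictor and
# record the p-value on that predictor.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("churn.csv")          # hypothetical data set
y = df["Churn"]

for col in df.columns.drop("Churn"):
    X = sm.add_constant(df[[col]])
    fit = sm.Logit(y, X).fit(disp=0)   # disp=0 silences the fit output
    print(f"{col:20s} p-value = {fit.pvalues[col]:.4f}")
```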
An advantage of this approach is that it is the one most agreed upon by statisticians.
Stepwise approach. In this approach, you would use the Fit Model platform, change the Personality to Stepwise and the Direction to Mixed. The Mixed option is like Forward Stepwise, but variables can be dropped after they have been added. An advantage of this approach is that it is automated, so it is fast and easy.
Decision trees. A decision tree is a data mining technique that can be used for variable selection.
The advantage of using the decision tree technique is that it is automated, fast, and easy to run. Further, it is a popular variable reduction approach taken by many data mining analysts.
No one approach is a clear winner. Nevertheless, it is recommended that you use the “Include all the variables” approach.
Three important ways to improve a model are as follows (a sketch of these constructions in code follows the list):
●   If the logit appears to be nonlinear when plotted against some continuous variable, one resolution is to convert the continuous variable to a few dummy variables.
●   If a histogram shows that a continuous variable has an excess of observations at zero (which can lead to nonlinearity in the logit), add a dummy variable that equals one if the continuous variable is zero and equals zero otherwise.
●   Finally, a seemingly numeric variable that is actually discrete can be broken up into a handful of dummy variables (for example, ZIP codes).
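Here is a sketch of all three constructions in pandas (the file name, column names, and cut points are hypothetical):

```python
import pandas as pd

df = pd.read_csv("churn.csv")   # hypothetical data set

# 1. Bin a continuous variable whose logit looks nonlinear.
df["day_mins_band"] = pd.cut(df["Day_Mins"], bins=[0, 100, 200, 400],
                             labels=["low", "mid", "high"])

# 2. Flag an excess of observations at zero.
df["intl_mins_is_zero"] = (df["Intl_Mins"] == 0).astype(int)

# 3. Expand a numeric-looking discrete code into dummy variables.
zip_dummies = pd.get_dummies(df["ZIP"], prefix="zip")
df = df.join(zip_dummies)
```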
Before you can begin modeling, you must first e...
To summarize, you have dropped 7 of the original 23 variables from the data set (Phone, Day_Charge, Eve_Charge, Night_Charge, Intl_Charge, E_VMAIL_PLAN, and VMail_Plan). So there are now 16 variables left, one of which is the dependent variable, Churn. You have 15 possible independent variables to consider.
bivariate (two variables, one dependent and one independent) analyses: logistic regressions (when the independent variable is continuous) and contingency tables (when the independent variable...
Make a list of all 15 variables that need to be tested, and write down the test result (for example, the relevant p-value) and your conclusion (for example, “include” or “exclude”). This not only prevents simple errors; it is a useful record of your work should you have to come back to it later.
At the bottom of the table of results are the Likelihood Ratio and Pearson tests, both of which test the null hypothesis that State does not affect Churn, and both of which reject the null. The conclusion is that the variable State matters.
Under “Whole Model Test”, the p-value (Prob>ChiSq) is less than 0.0001, so you conclude that VMail_message affects Churn.
Linear regression minimizes the sum of squared residuals. When you compare two linear regressions, the preferred one has the smaller sum of squared residuals. Logistic regression, by analogy, is fit by maximizing the likelihood, so the model with the smaller −LogLikelihood is preferred to a model with a larger −LogLikelihood.
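In symbols (standard definition, supplied here for reference): for outcomes yᵢ and predicted probabilities p̂ᵢ,

\[
-\log L \;=\; -\sum_{i=1}^{n} \Bigl[ y_i \log \hat{p}_i + (1 - y_i) \log\bigl(1 - \hat{p}_i\bigr) \Bigr] ,
\]

so a smaller −LogLikelihood means the predicted probabilities track the observed outcomes more closely, playing the role that a smaller sum of squared residuals plays in linear regression.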
Examining the p-values of the independent variables in the Parameter Estimates, you find that a variable for which Prob>ChiSq is less than 0.05 is said to be significant.
“Effect Likelihood Ratio Tests.”
The Effect Likelihood Ratio Tests table tells you that the effect of all the state dummy variables is significant, with a Prob>ChiSq of 0.0010.
Histogram Options ▶ Show Counts.