Fundamentals of Predictive Analytics with JMP
Read between January 2 and March 20, 2018
Click the red triangle next to Summary Statistics (note that summary statistics are listed for continuous variables only) and click Customize Summary Statistics. Select the check box for each summary statistic that you want displayed, such as Median, Minimum, or Maximum, and then click OK.
A positive value implies that as Years increases, Salary also increases; that is, the slope is positive. In contrast, a negative relationship has a negative slope: as the X variable increases, the Y variable decreases.
RSquare values can range from 0 (no linear relationship) to 1 (an exact/perfect linear relationship).
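For reference, RSquare is the share of the total variation in Y that the model explains; with SSE the sum of squared errors and SST the total sum of squares, the standard definition is

$$R^2 = 1 - \frac{SSE}{SST} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$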
(A negative correlation implies a negative linear relationship, and a positive correlation implies a positive linear relationship.)
With a p-value of 0.1993, you would fail to reject H0 and conclude that there is not a significant relationship between Major and Gender.
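As an aside, the same kind of test of independence can be sketched outside JMP; the contingency counts below are invented for illustration, not the book's Major-by-Gender data.

```python
# Chi-square test of independence on a hypothetical Major-by-Gender
# table; the counts are invented for illustration.
import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([[20, 15],    # hypothetical Major A: female, male
                   [12, 18],    # hypothetical Major B: female, male
                   [ 9, 11]])   # hypothetical Major C: female, male

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.3f}, p-value = {p:.4f}, df = {dof}")
# A p-value above 0.05 (like the 0.1993 above) means failing to reject H0.
```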
In the bivariate analysis diagram of the Fit Y by X dialog box, JMP helps the analyst select the proper statistical method to use. The Y variable is usually considered the dependent variable.
Depending on the type of data, some techniques are appropriate and some are not.
Just because an approach or technique appears appropriate, before running it you need to step back and ask yourself what the results could provide. Part of that answer requires understanding of the actual problem situation being solved or examined.
When using a certain technique, three outcomes are possible:
●   The technique is not appropriate to use with the data and should not be used.
●   The technique is appropriate to use with the data; however, the results are not meaningful.
●   The technique is appropriate to use with the data, and the results are meaningful.
Regression analysis typically has one of two main purposes. First, it might be used to understand the cause-and-effect relationship between one dependent variable and one or more independent variables; for example, it might answer the question, how does the amount of advertising affect sales? Second, regression might be applied for prediction, in particular for forecasting a dependent variable based on one or more independent variables.
Regression analyses can handle linear or nonlinear relationships, although linear models are the ones most often emphasized.
The Fit Line option performs a simple linear regression and creates the tables that contain the regression results.
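For readers working outside JMP, here is a minimal sketch of the same simple linear regression using Python's statsmodels; the advertising and sales numbers are invented for illustration.

```python
# Simple linear regression sketch; the Adver and Sales values are invented.
import numpy as np
import statsmodels.api as sm

adver = np.array([10.0, 12, 15, 17, 20, 22, 25, 28])
sales = np.array([105.0, 118, 132, 141, 160, 168, 185, 199])

X = sm.add_constant(adver)       # add the intercept column
model = sm.OLS(sales, X).fit()   # fit by ordinary least squares

print(model.params)              # intercept and slope of the fit line
print(model.rsquared)            # RSquare of the fit
```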
The F test and t test, in simple linear regression, are equivalent because they both test whether the independent variable (Adver) is significantly related to the dependent variable (Sales).
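Continuing the hypothetical fit above, the equivalence is easy to verify numerically: the overall F statistic equals the square of the slope's t statistic.

```python
# In simple linear regression, F = t^2 for the slope (same p-value).
t_slope = model.tvalues[1]
print(model.fvalue)        # overall F statistic
print(t_slope ** 2)        # squared slope t statistic; matches F
```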
The oval shape in each scatterplot is the corresponding bivariate normal density ellipse of the two variables. If the two variables are bivariate normally distributed, then about 95% of the points would be within the ellipse. If the ellipse is rather wide (round) and does not follow either of the diagonals, then the two variables do not have a strong correlation.
The more significant the correlation, the narrower the ellipse and the more closely it follows one of the diagonals.
The Effect Summary report lists the LogWorth or False Discovery Rate (FDR) LogWorth values in ascending p-value order. These statistical values measure the effects of the independent variables in the model.
A LogWorth value greater than 2 corresponds to a p-value of less than 0.01. The FDR LogWorth is a better statistic for assessing significance because it adjusts the p-values to account for multiple testing.
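Since LogWorth is -log10(p-value), the correspondence between LogWorth 2 and p = 0.01 can be checked directly:

```python
# LogWorth = -log10(p-value); LogWorth > 2 is equivalent to p < 0.01.
import math

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: LogWorth = {-math.log10(p):.2f}")   # 0.01 -> 2.00
```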
As listed under the Parameter Estimates, the multiple linear regression equation is as follows: Sales = −1485.88 + 1.97 * Time + 0.04 * MktPoten + 0.15 * Adver + 198.31 * MktShare + 295.87 * Change + 5.61 * Accts + 19.90 * WkLoad
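As a sketch, the fitted equation can be turned into a prediction function; the predictor values in the example call are invented, not taken from the book's sales performance data.

```python
# The book's fitted multiple regression equation as a function;
# the example inputs below are hypothetical.
def predict_sales(time, mkt_poten, adver, mkt_share, change, accts, wk_load):
    return (-1485.88 + 1.97 * time + 0.04 * mkt_poten + 0.15 * adver
            + 198.31 * mkt_share + 295.87 * change
            + 5.61 * accts + 19.90 * wk_load)

print(predict_sales(time=80, mkt_poten=40_000, adver=5_000,
                    mkt_share=7.5, change=0.3, accts=60, wk_load=15))
```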
Each independent variable's regression coefficient represents an estimate of the change in the dependent variable for a unit increase in that independent variable while all the other independent variables are held constant.
The larger the absolute value of the standardized beta coefficient, the more important the variable.
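One common way to obtain standardized betas outside JMP is to z-score every variable and refit; below is a sketch on an invented data frame (the column names Adver, Accts, and Sales are hypothetical).

```python
# Standardized betas: z-score all variables, refit, compare |slopes|.
# The data frame is invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({"Adver": rng.normal(50, 10, 100),
                   "Accts": rng.normal(60, 15, 100)})
df["Sales"] = 3 * df["Adver"] + 0.5 * df["Accts"] + rng.normal(0, 5, 100)

z = (df - df.mean()) / df.std()                # z-score every column
zfit = sm.OLS(z["Sales"], sm.add_constant(z[["Adver", "Accts"]])).fit()
print(zfit.params)   # the slopes are the standardized betas
```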
The Prediction Profiler displays a graph for each independent variable X against the dependent Y variable.
These transformed residuals are called X leverage residuals and Y leverage residuals. The black line represents the predicted values for individual X values, and the blue dotted line is the corresponding 95% confidence interval. If the confidence region between the upper and lower confidence interval crosses the horizontal line, then the effect of X is significant.
The process for evaluating the statistical significance of a regression model is as follows (a compact code sketch of these checks appears after the list):
1.   Determine whether the model is good or bad:
     a.   Conduct an F test.
     b.   Conduct a t test for each independent variable.
     c.   Examine the residual plot (if time series data, conduct a Durbin-Watson test).
     d.   Assess the degree of multicollinearity (variance inflation factor, VIF).
2.   Determine the goodness of fit:
     a.   Compute the Adjusted R2.
     b.   Compute the RMSE (or se).
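A compact sketch of this checklist with statsmodels, reusing the invented data frame df from the standardized-beta sketch above:

```python
# Walking the checklist above on the invented data frame df.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["Adver", "Accts"]])
fit = sm.OLS(df["Sales"], X).fit()

print(fit.fvalue, fit.f_pvalue)    # 1a. overall F test
print(fit.pvalues)                 # 1b. t test p-value per coefficient
resid = fit.resid                  # 1c. residuals, to be plotted/inspected
for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))   # 1d. VIF
print(fit.rsquared_adj)            # 2a. Adjusted R^2
print(np.sqrt(fit.mse_resid))      # 2b. RMSE (root mean square error)
```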
The F test, in multiple regression, is known as an overall test.
The hypotheses for the F test are as follows, where k is the number of independent variables:
H0: β1 = β2 = … = βk = 0
H1: not all are equal to 0
If you fail to reject H0 in the F test, then the overall model is not statistically significant: it shows no linear relationship between the dependent variable and the set of independent variables.
On the other hand, if you reject H0, then you can conclude that one or more of the independent variables are linearly related to the dependent variable.
If you reject the F test, you can conclude that one or more of the independent variables are significantly related to the dependent variable.
You are not testing whether independent variable k, xk, is significantly related to the dependent variable. What you are testing is whether xk is significantly related to the dependent variable above and beyond all the other independent variables that are currently in the model.
If βk cannot be determined to be significantly different from 0, then you cannot conclude that xk has an effect on Y. Again, in this situation, you want to reject H0 (the hypothesis that βk = 0).
Examine the plot for any patterns (oscillating too often, or steadily increasing or decreasing values) and for outliers. Here, the data appear to be random.
If the observations were taken in some time sequence, called a time series (not applicable to the sales performance data set), the Durbin-Watson test should be performed.
In general, you want high p-values. High p-values of the Durbin-Watson test indicate that there is no problem with first-order autocorrelation.
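Outside JMP, statsmodels reports the Durbin-Watson statistic itself rather than a p-value; values near 2 suggest no first-order autocorrelation. A sketch on the residuals of the hypothetical fit above:

```python
# Durbin-Watson statistic on the residuals of the earlier hypothetical fit;
# values near 2 indicate no first-order autocorrelation.
from statsmodels.stats.stattools import durbin_watson

print(durbin_watson(fit.resid))
```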
Multicollinearity occurs when two or more independent variables explain the same variability of Y. Multicollinearity does not violate any of the statistical assumptions of regression.
Significant multicollinearity is likely to make it difficult to interpret the meaning of the regression coefficients of the independent variables.
A measure of multicollinearity is the variance inflation factor (VIF). By definition, it must be greater than or equal to 1.
The basic guidelines (Marquardt 1980; Snee 1973) for identifying whether significant multicollinearity exists are as follows:
●   1 ≤ VIFk ≤ 5 means no significant multicollinearity.
●   5 < VIFk ≤ 10 means that you should be concerned that some multicollinearity might exist.
●   VIFk > 10 means significant multicollinearity.
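For reference, the VIF for the kth independent variable is computed from the RSquare obtained by regressing xk on the remaining independent variables:

$$VIF_k = \frac{1}{1 - R_k^2}$$

which is why it can never fall below 1.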
One perspective is to include those nonsignificant or high-VIF variables because they explain some, although not much, of the variation in the dependent variable.
The other point of view follows the principle of parsimony, which states that the smaller the number of variables in the model, the better.
There are numerous approaches and several statistical variable selection techniques to achieve this goal of keeping only significant independent variables in the model. Stepwise regression is one of the simplest approaches.
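A minimal forward-selection sketch (entry threshold p < 0.05), run on the invented data frame df from earlier; this illustrates the idea, not JMP's stepwise implementation.

```python
# Forward stepwise selection: repeatedly add the candidate with the
# smallest p-value while that p-value stays below the entry threshold.
import statsmodels.api as sm

def forward_stepwise(data, response, threshold=0.05):
    remaining = [c for c in data.columns if c != response]
    selected = []
    while remaining:
        # p-value each candidate would have if added to the model
        pvals = {}
        for cand in remaining:
            X = sm.add_constant(data[selected + [cand]])
            pvals[cand] = sm.OLS(data[response], X).fit().pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:
            break                  # nothing left that qualifies
        selected.append(best)
        remaining.remove(best)
    return selected

print(forward_stepwise(df, "Sales"))   # e.g., ['Adver', 'Accts']
```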
The goodness of fit of the regression model is measured by the Adjusted R2 and the se (or RMSE).
The Adjusted R2 measures the percentage of the variability in the dependent variable that is explained by the set of independent variables and is adjusted for the number of independent variables (Xs) in the model. If the purpose of performing the regression is to understand the relationships of the independent variables with the dependent variable, the Adjusted R2 value is a major assessment of the goodness of fit.
The higher the Adjusted R2, the better the fit.
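The adjustment penalizes model size; with n observations and k independent variables, the standard formula is

$$\text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$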
On the other hand, if the regression model is for prediction/forecasting, the value of the se is of more concern. A smaller se generally means a smaller forecast error.
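For reference, se (the RMSE) estimates the standard deviation of the errors; for a model with k independent variables fit to n observations,

$$s_e = \text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}}$$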