Matt Mitchell’s Kindle Notes & Highlights for Fundamentals of Predictive Analytics with JMP

Bayesian information criterion (BIC)

in order to use a categorical variable in a regression model, you must transform the categorical variables into continuous variables or integer (binary) variables. The resulting variables from this transformation are called indicator or dummy variables.

26%

in JMP this is not the case. If a categorical variable has two categories (or levels) such as gender, then a single dummy variable is used with values of +1 and -1 (with +1 assigned to the alphabetically first category). If a categorical variable has more than two categories (or levels), the dummy variables are assigned values +1, 0, and −1.

27%
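
The two coding schemes in the highlights above can be compared with a minimal sketch. This is not JMP's implementation; the function names are hypothetical, and the choice of the alphabetically last level as the one coded -1 in every column is an assumption made for illustration.

```python
def indicator_coding(values):
    """0/1 dummy columns, one per level except the last (the reference level)."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels[:-1]] for v in values]


def effect_coding(values):
    """+1/0/-1 columns, one per level except the last.

    Assumption: the alphabetically last level is coded -1 in every column.
    """
    levels = sorted(set(values))
    last = levels[-1]
    coded = []
    for v in values:
        if v == last:
            coded.append([-1] * (len(levels) - 1))
        else:
            coded.append([1 if v == level else 0 for level in levels[:-1]])
    return coded


gender = ["M", "F", "F", "M"]            # two levels: one column
size = ["small", "large", "medium"]      # three levels: two columns
print(indicator_coding(gender))
print(effect_coding(gender))
print(effect_coding(size))
```

With two levels the effect-coded column holds only +1 and -1 (the alphabetically first level gets +1); with three or more levels the columns hold +1, 0, and -1.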

Analysis of variance (more commonly called ANOVA),

27%

is a dependence multivariate technique. There are several variations of ANOVA, such as one-factor (or one-way) ANOVA, two-factor (or two-way) ANOVA, and so on, and also repeated measures ANOVA.

27%

The factors are the independent variables, each of which must be a categorical variable. The dependent variable is one continuous variable.

28%

If H0 is true, you would expect all the sample means to be close to each other and relatively close to the grand mean. If H1 is true, then at least one of the sample means would be significantly different.

28%

This within-sample variability is measured by the sum of squares within groups (or error) (SSE).

28%

(In JMP: TSS, SSBG, and SSE are identified as C.Total, Model SS, and Error SS, respectively.)

28%
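
As a rough illustration of how TSS, SSBG, and SSE relate, the sketch below computes them for a one-way layout along with the F statistic. The function name and the sample data are made up for this example; the p-value is left out because it needs an F distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """Return the sums of squares and F statistic for a one-way ANOVA.

    groups: list of lists, one list of observations per factor level.
    """
    all_values = [x for g in groups for x in g]
    n = len(all_values)          # total number of observations
    k = len(groups)              # number of factor levels
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares (JMP label: Model SS)
    ssbg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups (error) sum of squares (JMP label: Error SS)
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total sum of squares (JMP label: C. Total); equals ssbg + sse
    tss = sum((x - grand_mean) ** 2 for x in all_values)

    f_stat = (ssbg / (k - 1)) / (sse / (n - k))
    return {"TSS": tss, "SSBG": ssbg, "SSE": sse, "F": f_stat}


# Made-up data: three factor levels with three observations each
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

A large F means the between-groups variability is large relative to the within-groups variability, which is the evidence against H0.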

If H0 of the F test is rejected, which implies that one or more of the population means are significantly different, you then proceed to the second part of an ANOVA study and identify which factor level means are significantly different.

28%

An additional plus of ANOVA is, if we are examining the relationship of two or more factors, ANOVA is good at uncovering any significant interactions or relationships among these factors.

28%

One-way ANOVA has one dependent variable and one X factor.

28%

The horizontal line across the entire plot represents the overall mean. Each factor level has its own mean diamond. The horizontal line in the center of the diamond is the mean for that level. The upper and lower vertices of the diamond represent the upper and lower 95% confidence limit on the mean, respectively.

28%

the horizontal width of the diamond is relative to that level’s (group’s) sample size. That is, the wider the diamond, the larger the sample size for that level relative to the other levels.

28%

The overall steps to evaluate an ANOVA model are as follows:

1. Conduct an F test.
   a. If you do not reject H0 (the p-value is not small), then stop because there is no difference in means.
   b. If you do reject H0, then go to Step 2.
2. Consider unequal variances; look at the Levene test.
   a. If you reject H0, then go to Step 3 because the variances are unequal.
   b. If you do not reject H0, then go to Step 4.
3. Conduct Welch’s test, which tests differences in means, assuming unequal variances.
   a. If you reject H0, because the means are significantly different, then go to Step 4.
...

28%

ANOVA has three statistical assumptions to check and address. The first assumption is that the residuals should be independent.

28%

unless there is strong concern about the dependence of the residuals, this assumption does not have to be checked.

28%

The second statistical assumption is that the variances for each level are equal. Violation of this assumption is of more concern because it could lead to erroneous p-values and hence incorrect statistical conclusions.

28%

However, if, as it is in this case, there are only two groups tested, then an F test for unequal variance is also performed.

28%

If you fail to reject H0 (that is, you have a large p-value), you have insufficient evidence to say tha...

This highlight has been truncated due to consecutive passage length restrictions.

28%

On the other hand, if you reject H0, the variances can be assumed to be unequal, and the ANOV...

This highlight has been truncated due to consecutive passage length restrictions.

28%

The third statistical assumption is that the residuals should be normally distributed.

29%

if slight departures from normality are detected, they will have no real effect on the F statistic. A normal quantile plot can confirm whether the residuals are normally distributed or not.

29%

If all the residuals fall on or near the straight line or within the confidence bounds, the residuals should be considered normally distributed.

29%
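
The points on a normal quantile plot can be sketched as follows. The plotting positions (i + 0.5) / n are an assumption chosen for simplicity; JMP may use a different convention, and the confidence bounds mentioned in the highlight are not computed here. The function name and residual values are made up.

```python
from statistics import NormalDist


def normal_quantile_pairs(residuals):
    """Pair each sorted residual with a standard normal quantile.

    Assumption: plotting positions are (i + 0.5) / n for i = 0..n-1.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(ordered)]


# Made-up residuals; roughly normal residuals give points near a straight line
for quantile, residual in normal_quantile_pairs([0.3, -1.2, 0.9, -0.4, 0.1]):
    print(round(quantile, 3), residual)
```

Plotting the residuals against the quantiles and checking how closely the points follow a straight line is the visual check the highlight describes.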

The p-value for the F test is <.0001, so one or more of the Process means differ from each other.

29%

In general, the Levene test is more widely used and more comprehensive, so you focus only on the Levene test.

29%
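
For reference, here is a minimal sketch of the Levene statistic in its original form: a one-way ANOVA F statistic computed on the absolute deviations of each observation from its group mean. Whether JMP centers on the mean or uses another variant is not stated in these notes, so treat the centering choice as an assumption. The function name and data are hypothetical, and no p-value is computed.

```python
def levene_statistic(groups):
    """Levene statistic: ANOVA F on absolute deviations from group means."""
    # Step 1: replace each observation by |x - group mean|
    deviations = []
    for g in groups:
        mean = sum(g) / len(g)
        deviations.append([abs(x - mean) for x in g])

    # Step 2: one-way ANOVA F statistic on the deviations
    all_z = [z for d in deviations for z in d]
    n, k = len(all_z), len(deviations)
    grand_mean = sum(all_z) / n
    z_means = [sum(d) / len(d) for d in deviations]
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, z_means))
    within = sum((z - m) ** 2 for d, m in zip(deviations, z_means) for z in d)
    return (between / (k - 1)) / (within / (n - k))


# Made-up data: the second group is visibly more spread out than the first
print(levene_statistic([[1, 2, 3], [2, 4, 6]]))
```

A large statistic (small p-value) is evidence that the group variances differ, which is the case where Welch’s test is used instead of the ordinary F test.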

Because the p-value for the Welch’s Test is small, you can reject the null hypothesis; the pairs of means are different from one another.

29%

If you had not rejected the null hypothesis of Welch’s test, then it would not be recommended that you perform these second-stage tests.

29%

Since the p-value for the Welch’s Test is small, you can reject the null hypothesis; the means are significantly different from one another.

29%
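
A sketch of the Welch F statistic, which weights each group by n / s^2 so that groups with noisier means count for less. This follows the standard textbook formula, not output from JMP; the function name and data are made up, and the p-value (from an F distribution with k - 1 and df2 degrees of freedom) is not computed.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return Welch's F statistic and its denominator degrees of freedom.

    Each group needs at least two observations and a nonzero sample variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    weights = [n / variance(g) for n, g in zip(sizes, groups)]  # n_i / s_i^2
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = numerator / denominator
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df2


# Made-up data: three factor levels with three observations each
f_stat, df2 = welch_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df2)
```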

the likelihood of a Type I error increases with the number of pairwise comparisons.

29%
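
To see how quickly the Type I error risk grows, the sketch below counts the pairwise comparisons among k factor levels and computes the chance of at least one false positive. It assumes the comparisons are independent, which is a simplification, and a per-comparison alpha of 0.05; the function names are hypothetical.

```python
def pairwise_count(k):
    """Number of distinct pairs among k factor levels: k(k - 1) / 2."""
    return k * (k - 1) // 2


def familywise_error_rate(alpha, comparisons):
    """Chance of at least one Type I error, assuming independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


for k in (2, 3, 4, 6, 10):
    m = pairwise_count(k)
    print(k, m, round(familywise_error_rate(0.05, m), 3))
```

With 4 levels there are already 6 comparisons, and the chance of at least one false positive is roughly 26% under these assumptions.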

unless the number of pairwise comparisons is small, this test is not recommended.

29%

If the main objective is to check for any possible pairwise difference in the mean values, and there are several factor levels, the Tukey HSD (honest significant difference) test, also called the Tukey-Kramer HSD test, is the most desired test.

29%

To identify mean differences, examine the Connecting Letters Report. Groups that do not share the same letter are significantly different from one another.

29%

The Hsu’s MCB (multiple comparison with best) is used to determine whether each factor level mean can be rejected as the “best” of all the other means, where “best” means either a maximum or minimum value.

29%

The p-value report and the maximum and minimum LSD (Least Significant Difference) matrices can be used to identify significant differences. The p-value report identifies whether a factor level mean is significantly different from the maximum and from the minimum of all the other means.

30%

Differences with the Hsu’s MCB test are less conservative than those found with the Tukey-Kramer test.

30%

Hsu’s MCB test should be used if there is a need to make specific inferences about the maximum or minimum values.

30%

the With Control, Dunnett’s, is applicable when you do not wish to make all pairwise comparisons, but rather only to compare one of the levels (the “control”) with each other level.

30%

Two-way ANOVA is an extension of the one-way ANOVA in which there is one continuous dependent variable, but now you have two categorical independent variables.

30%

There are three basic two-way ANOVA designs: without replication, with equal replication, and with unequal replication.

30%

Only the two-way ANOVA with equal replication is discussed here.

30%

If there were no significant interaction, the lines in the LSMeans Plot would not cross and would be mostly parallel.

31%
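
The "parallel lines" idea can be made concrete with a small sketch: for a table of cell means, the interaction term for each cell is the cell mean minus its row mean minus its column mean plus the grand mean. If every term is zero, the profile lines are exactly parallel. This is descriptive only (significance still comes from the F test on the interaction), it assumes equal replication so that unweighted means are appropriate, and the function name and numbers are made up.

```python
def interaction_terms(cell_means):
    """Interaction term per cell: cell - row mean - column mean + grand mean.

    cell_means: list of rows (levels of factor A), each a list over levels of factor B.
    """
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [sum(row[j] for row in cell_means) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [
        [cell_means[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n_cols)]
        for i in range(n_rows)
    ]


parallel = [[10, 20], [15, 25]]   # same slope in both rows: no interaction
crossing = [[10, 20], [25, 15]]   # lines cross: interaction present
print(interaction_terms(parallel))
print(interaction_terms(crossing))
```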

linear regression cannot be used for a binary dependent variable.

31%

Consequently, statisticians have developed a specialized form of regression called logistic regressio...

This highlight has been truncated due to consecutive passage length restrictions.

31%
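
One common way to see the problem with a binary dependent variable is that a straight line can produce predictions below 0 or above 1, while the logistic function always returns a value strictly between 0 and 1 that can be read as a probability. The sketch below illustrates this; the intercept and slope are made-up numbers, not estimates from any data set in the book.

```python
import math


def linear_score(x, intercept=0.1, slope=0.3):
    """A straight-line prediction; nothing keeps it inside [0, 1]."""
    return intercept + slope * x


def logistic(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for x in (-5, 0, 5):
    score = linear_score(x)
    print(x, round(score, 2), round(logistic(score), 3))
```

At x = 5 the straight line predicts 1.6, which is impossible as a probability, while the logistic transform of the same score stays below 1.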

Logistic regression,