Fundamentals of Predictive Analytics with JMP
Read between January 2 - March 20, 2018
8%
We believe there are six fundamental concepts:
●   FC1: Always take a random and representative sample.
●   FC2: Statistics is not an exact science.
●   FC3: Understand a z-score.
●   FC4: Understand the central limit theorem (not every distribution has to be bell-shaped).
●   FC5: Understand one-sample hypothesis testing and p-values.
●   FC6: Few approaches are correct and many wrong.
8%
What is a random and representative sample (called a 2R sample)?
8%
representative means representative of the population of interest.
8%
the population of interest is those individuals who are registered to vote and plan to vote.
8%
Random means that each individual has an equal chance of being selected.
8%
First, if the sample is a 2R sample, then the sample distribution of observations will follow a pattern resembling that of the population.
8%
The population parameters (such as the population mean, µ, the population variance, σ², or the population standard deviation, σ) are the true values of the population. These are the values that you are interested in knowing.
8%
Because the sample is a 2R sample, the sample distribution of observations is very similar to the population distribution of observations. Therefore, the sample statistics, calculated from the sample, are good estimates of their corresponding population parameters.
8%
The sample statistics (such as the sample mean, sample variance, and sample standard deviation) are estimates of their corresponding population parameters. It is highly unlikely that they will equal their corresponding population parameter.
8%
By using statistical techniques, you can test the likelihood of the population parameter being greater than 50%. (You can construct a confidence interval, and if the lower confidence limit is greater than 50%, you can be highly confident that the true population proportion is greater than 50%. Or you can conduct a hypothesis test to measure the likelihood that the proportion is greater than 50%.)
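The confidence-interval check described here can be sketched in a few lines of Python. This is only an illustration: the poll counts (560 of 1,000) are hypothetical, and the helper name `proportion_ci` and the choice of the normal-approximation interval are assumptions, not something the book specifies.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n                           # sample proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)         # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical poll: 560 of 1,000 sampled voters favor the candidate.
lower, upper = proportion_ci(560, 1000)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("Lower limit above 50%:", lower > 0.5)
```

If the lower limit exceeds 0.5, the interval supports the claim that the true proportion is above 50%.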
8%
you must realize that these sample statistics are estimates, in that, if other 2R samples are taken, they will produce different estimates.
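A small simulation makes this point concrete. It is a sketch under stated assumptions: the population (mean 500, standard deviation 100), the sample size of 50, and the seed are all hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population of 100,000 values.
population = [random.gauss(500, 100) for _ in range(100_000)]

# Five separate random samples, each of size 50, and their sample means.
estimates = [statistics.mean(random.sample(population, 50)) for _ in range(5)]

for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i}: mean = {estimate:.1f}")
```

Each sample gives a different estimate of the same population mean, which is why a single sample statistic should be read as an estimate rather than the true value.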
9%
The z-score (and the t-score) is not just a number. The z-score is how many standard deviations a value, like the 570, is from the mean of 500. The z-score can provide you some guidance, regardless of the shape of the distribution. A value whose z-score is greater than 3 in absolute value is considered an outlier and highly unlikely.
9%
It depends on the spread of the data, which is measured by the standard deviation.
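To see how the spread changes the picture, here is a minimal sketch that computes the z-score of 570 against a mean of 500 for several standard deviations. The three standard deviation values are hypothetical; the highlights do not give one.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from mean."""
    return (value - mean) / std_dev

# Same value and mean, three hypothetical standard deviations.
for std_dev in (20, 50, 100):
    print(f"std dev {std_dev:>3}: z = {z_score(570, 500, std_dev):.2f}")
```

With a small spread, 570 is far from the mean in standard deviation units; with a large spread, the same 570 is unremarkable.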
9%
In general, the z-score is like a traffic light. If its absolute value is greater than 3, the light is red; this is an extreme value. If its absolute value is between 1.65 and 3, the light is yellow; this value is borderline. If its absolute value is less than 1.65, the light is green, and the value
9%
is just considered random variation. (The cutpoints of 3 and 1.65 might vary slightly de...
This highlight has been truncated due to consecutive passage length restrictions.
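The traffic-light rule can be written as a small function. This is a sketch, not the book's code: the function name is made up, and the treatment of values exactly at 1.65 or 3 (both counted as yellow here) is an assumption, since the passage does not say which side the boundaries fall on.

```python
def traffic_light(z, yellow_cut=1.65, red_cut=3.0):
    """Classify a z-score by its absolute value using the cutpoints above."""
    size = abs(z)
    if size > red_cut:
        return "red"     # extreme value
    if size >= yellow_cut:
        return "yellow"  # borderline (boundary handling is an assumption)
    return "green"       # consistent with random variation

for z in (0.7, -1.4, 2.2, -3.5):
    print(f"z = {z:+.1f}: {traffic_light(z)}")
```

The cutpoints are parameters because the passage notes they might vary slightly.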
9%
in the real world, you only take one 2R sample.
9%
central limit theorem (CLT)
9%
The CLT will hold regardless of the shape of the population distribution of observations—whether it is normal, bimodal (like the sumo wrestlers and jockeys), or whatever shape, as long as a 2R sample is taken and the sample size is greater than 30.
9%
Then, the sampling distribution of sample means will be approximately normal, with a mean of x̄ and a standard deviation of (s / √n) ...
This highlight has been truncated due to consecutive passage length restrictions.
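The CLT claim can be checked with a quick simulation on a deliberately bimodal population in the spirit of the sumo wrestlers and jockeys example. Everything numeric here is an assumption made for illustration: the weights, the 50/50 mix, the sample size of 40, and the 5,000 repetitions.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

def draw_weight():
    """One draw from a hypothetical bimodal population of weights (lb)."""
    if random.random() < 0.5:
        return random.gauss(115, 8)   # jockeys
    return random.gauss(330, 30)      # sumo wrestlers

n = 40  # sample size greater than 30
sample_means = [
    statistics.mean([draw_weight() for _ in range(n)])
    for _ in range(5000)
]

center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
within_one = sum(abs(m - center) <= spread for m in sample_means) / len(sample_means)

print(f"mean of sample means: {center:.1f}")
print(f"std dev of sample means: {spread:.1f}")
print(f"share within one std dev: {within_one:.2f}")
```

Although almost no individual weighs anything near the overall mean, the sample means cluster around it, and roughly two thirds of them fall within one standard deviation, as a normal distribution would suggest. In practice you take only one sample; the repetition here exists only to show the shape of the sampling distribution.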
9%
You need to take only one 2R sample with a sample size greater than 30.
9%
If you have a 2R sample of size greater than 30, you can approximate the sampling distribution of sample means by using the sample’s x̄ and standard error, s / √n. If you collect a 2R sample of size greater than 30, the CLT holds. As a result, you can use inferential statistics. That is, you can construct confidence intervals and perform hypothesis tests.
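As a sketch of what this enables, the following builds a confidence interval for the mean from a single sample, using the sample mean and the standard error s / √n. The helper name `mean_ci`, the use of a z critical value, and the sample data are assumptions for illustration only.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(sample, confidence=0.95):
    """Confidence interval for the mean from one sample, via the CLT."""
    n = len(sample)
    x_bar = statistics.mean(sample)               # sample mean
    std_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return x_bar - z * std_error, x_bar + z * std_error

# Hypothetical sample of 36 observations (the values 1 through 36).
sample = list(range(1, 37))
low, high = mean_ci(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```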
9%
the CLT is known as the “cornerstone of statistics.”
10%
You have now generated a random sample of 30. If you press F9, the random sample will change.
Matt Mitchell
For Mac, press FN+F9 to do this.
10%
Again, as you press the F9 key, the random sample and corresponding frequency distribution change. (Hence, it is called a dynamic frequency distribution.)
10%
One of the inferential statistical techniques that you can apply, thanks to the CLT, is one-sample hypothesis testing of the mean.
10%
hypothesis testing consists of two hypotheses, the null hypothesis, called H0, and the opposite to H0—the alternative hypothesis, called H1 or Ha. The null hypothesis for one-sample hypothesis testing of the mean tests whether the population mean is equal to, less than or equal to, or greater than or equal to a particular constant, µ = k, µ ≤ k, or µ ≥ k.
10%
Once the hypotheses are identified, the test statistic is calculated.
10%
calculated test statistic is called Zcalc. This Zcalc is compared to what here will be called the critical z, Zcritical. The Zcritical value is based on what is called a level of significance, called α, which is usually equal to 0.10, 0.05, or 0.01.
10%
The level of significance can be viewed as the probability of making an error (or mistake), namely rejecting H0, given that H0 is correct.
10%
you want to keep the level of significance rather small.
10%
you want to keep the likelihood of making an error relatively small.
10%
If |Zcalc| > |Zcritical|, you reject H0. When you reject H0, there is enough statistical evidence to support H1.
10%
On the other hand, you fail to reject H0 when |Zcalc| ≤ |Zcritical|, and you conclude that there is not enough statistical evidence to support H1.
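The decision rule can be sketched as follows. The formula used for Zcalc, (x̄ − k) / (s / √n), is the standard one-sample form and is assumed here rather than quoted from the highlights; the two-tailed critical value, the function name, and the sample figures are also assumptions.

```python
import math
from statistics import NormalDist

def one_sample_z_test(x_bar, k, s, n, alpha=0.05):
    """Two-tailed one-sample z test of H0: mu = k against H1: mu != k."""
    z_calc = (x_bar - k) / (s / math.sqrt(n))       # distance in standard errors
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    reject_h0 = abs(z_calc) > abs(z_critical)
    return z_calc, z_critical, reject_h0

# Hypothetical sample: n = 36, standard deviation 60, hypothesized mean k = 500.
for x_bar in (520, 510):
    z_calc, z_critical, reject_h0 = one_sample_z_test(x_bar, 500, 60, 36)
    decision = "reject H0" if reject_h0 else "fail to reject H0"
    print(f"x_bar = {x_bar}: Zcalc = {z_calc:.2f}, Zcritical = {z_critical:.2f}, {decision}")
```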
10%
As discussed under FC3, “Understand a Z-Score,” the |Zcalc| is not simply a number. It represents the number of standard deviations away from the mean that a value
10%
is.
10%
you reject H0 when the value is a relatively large number of standard deviations away from the hypothesized value.
10%
when you have a relatively small |Zcalc| (that is, |Zcalc| ≤ |Zcritical|), you fail to reject H0. That is, the |Zcalc| value is relatively near the hypothesized value and could be simply due to random variation.
10%
The p-value is the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one calculated. Thus, in terms of the one-sample hypothesis test using the Z, the p-value is the probability that is associated with Zcalc.
10%
The p-values and |Zcalc| have an inverse relationship: Relatively large |Zcalc| values are associated with relatively small p-values, and, vice versa, relatively small |Zcalc| values have relatively large p-values.
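The inverse relationship is easy to tabulate. This sketch assumes a two-tailed test and the standard normal distribution; the function name and the list of Zcalc values are chosen only for illustration.

```python
from statistics import NormalDist

def two_tailed_p_value(z_calc):
    """Two-tailed p-value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z_calc)))

z_values = (0.5, 1.0, 1.65, 1.96, 3.0)
p_values = [two_tailed_p_value(z) for z in z_values]

for z, p in zip(z_values, p_values):
    print(f"|Zcalc| = {z:.2f}  ->  p-value = {p:.4f}")
```

As |Zcalc| grows, the p-value shrinks.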
11%
General interpretation of a p-value is as follows:
●   Less than 1%: There is overwhelming evidence that supports the alternative hypothesis.
●   Between 1% and 5%: There is strong evidence that supports the alternative hypothesis.
●   Between 5% and 10%: There is weak evidence that supports the alternative hypothesis.
●   Greater than 10%: There is little to no evidence that supports the alternative hypothesis.
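These bands translate directly into a lookup function. The sketch below is illustrative: the function name is made up, and assigning a p-value that falls exactly on 1%, 5%, or 10% to the weaker band is an assumption, since the list does not say.

```python
def interpret_p_value(p):
    """Map a p-value (as a proportion) to the evidence bands listed above."""
    if p < 0.01:
        return "overwhelming evidence for the alternative hypothesis"
    if p < 0.05:
        return "strong evidence for the alternative hypothesis"
    if p < 0.10:
        return "weak evidence for the alternative hypothesis"
    return "little to no evidence for the alternative hypothesis"

for p in (0.003, 0.03, 0.07, 0.25):
    print(f"p = {p}: {interpret_p_value(p)}")
```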
11%
Two major questions should be asked when considering the use of a statistical approach or technique: ●   Is it statistically appropriate?
11%
What will it possibly tell you?
11%
with categorical data, you cannot measure distance.
11%
Simply in terms of graphing, you would use bar and pie charts for categorical data but not for continuous data. On the other hand, graphing a continuous variable requires a histogram or box plot.
11%
When summarizing data, descriptive statistics are insightful for continuous variables. A frequency distribution is much...
This highlight has been truncated due to consecutive passage length restrictions.
11%
Countif.xls in worksheet rawdata.
11%
Major and gender (and correspondingly gender code) are examples of nominal data. The Likert scale of usefulness is an example of ordinal data. Salary, GPA, and years are examples of continuous data.
11%
descriptive statistics are valuable in understanding the continuous data. For example, because the average is somewhat less than the median, the salary data could be considered slightly left-skewed, with a minimum of $31,235 and a maximum of $65,437.
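The mean-versus-median check can be sketched on a toy data set. The salaries below are hypothetical and are not the values from the book's Countif.xls file; the comparison is a rough rule of thumb, not a formal skewness test.

```python
import statistics

# Hypothetical salaries with one low value pulling the mean down.
salaries = [31000, 48000, 52000, 55000, 57000, 58000, 60000, 62000, 65000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"mean = {mean_salary:,.0f}, median = {median_salary:,.0f}")
print(f"min = {min(salaries):,}, max = {max(salaries):,}")
if mean_salary < median_salary:
    print("Mean below median: suggests a left-skewed distribution.")
```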
11%
All the categorical variables (Major, Gender, Usefulness, and Gender code), whether they are nominal or ordinal, have frequency numbers and a histogram, and no descriptive statistics. But the continuous variables have descriptive statistics and a histogram.
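The same split between summaries can be sketched outside JMP: frequency counts for a categorical variable and descriptive statistics for a continuous one. The majors and GPAs below are hypothetical stand-ins, not data from the book.

```python
import statistics
from collections import Counter

# Hypothetical records: a nominal variable and a continuous variable.
majors = ["Finance", "Marketing", "Finance", "Accounting", "Finance", "Marketing"]
gpas = [3.2, 3.8, 2.9, 3.5, 3.1, 3.6]

# Categorical data: a frequency distribution is the natural summary.
major_counts = Counter(majors)
for major, count in major_counts.most_common():
    print(f"{major}: {count}")

# Continuous data: descriptive statistics are meaningful.
gpa_mean = statistics.mean(gpas)
gpa_stdev = statistics.stdev(gpas)
print(f"GPA mean = {gpa_mean:.2f}, std dev = {gpa_stdev:.2f}")
```

Averaging the majors would be meaningless, which is the practical consequence of not being able to measure distance with categorical data.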
12%
Most of the time in JMP, if you are looking for more information to display or for statistical options, they can usually be found by clicking the red triangle.