Kindle Notes & Highlights
by Ron Klimberg
Read between January 2 - March 20, 2018
introduce interaction terms, that is, the product of two or more variables. Best practice would be to consult with subject-matter experts and seek their advice. Some thought is necessary to determine meaningful interactions,
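To make "the product of two or more variables" concrete, here is a minimal sketch of adding one interaction column to a small table. The column names (`day_minutes`, `service_calls`) and the values are hypothetical, chosen only for illustration; they are not taken from the book's data set.

```python
def add_interaction(rows, a, b):
    """Return copies of the rows with an extra column holding a * b."""
    name = f"{a}_x_{b}"
    return [{**row, name: row[a] * row[b]} for row in rows]


# Hypothetical records; both column names are assumptions for this sketch.
rows = [
    {"day_minutes": 120.0, "service_calls": 1},
    {"day_minutes": 300.0, "service_calls": 4},
]

augmented = add_interaction(rows, "day_minutes", "service_calls")
for row in augmented:
    print(row)
```

Which pairs are worth multiplying is a subject-matter question, as the highlight says; the code only shows the mechanics.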
Observe that almost all customers make international calls, but most of them are not on the international plan (which gives cheaper rates for international calls).
The parameter estimate for Intl_Plan[no] is positive and significant. This means that when a customer does not have an international plan, the probability of churn increases.
Principal component analysis (PCA) is an exploratory multivariate technique with two overall objectives. One objective is “dimension reduction”: to turn a collection of, for example, 100 variables into a collection of 10 variables that retain almost all the information that was contained in the original 100 variables. The other objective is to discover the structure in the relationships between the variables.
PCA assesses the structure of the interrelationships (correlations) among the variables by defining a set of common underlying dimensions called components or factors.
The first eigenvalue, 1.5752, is much larger than the second eigenvalue, 0.4248. This suggests that the first principal component, Prin1, is much more important (in terms of explaining the variation in the pair of variables) than the second principal component, Prin2.
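As a quick check on the two eigenvalues quoted above, the sketch below computes the share of variation each component carries. It assumes the PCA was run on the correlation matrix of the two variables, which the eigenvalues summing to 2 suggests. Under that assumption the eigenvalues equal 1 + |r| and 1 - |r|, so the size of the correlation between the two variables can be recovered as well.

```python
# Eigenvalues quoted in the highlight above.
eigenvalues = [1.5752, 0.4248]

total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

# Assumption: correlation-matrix PCA on two variables, so the eigenvalues
# are 1 + |r| and 1 - |r|, where r is the correlation between the variables.
implied_abs_r = (eigenvalues[0] - eigenvalues[1]) / 2

for name, share in zip(["Prin1", "Prin2"], proportions):
    print(f"{name}: {share:.2%} of the variation")
print(f"implied |r| = {implied_abs_r:.4f}")
```

Prin1 carries roughly 79% of the variation and Prin2 roughly 21%, which is the sense in which the first component is "much more important".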
The second graph is a scatter plot of the two principal components; this is called a score plot.
A correlation between the principal components would be indicated by a scatter with a positive or negative slope.
The third graph is called a loadings plot, and it shows the contribution made to each principal component (the principal components are the axes of the graph) by the original variables (which are the directed points on the graph).
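A minimal sketch of where the points on a loadings plot can come from, assuming the common convention that a loading is the eigenvector weight scaled by the square root of its eigenvalue (which makes it the correlation between a variable and a component). It reuses the two eigenvalues quoted earlier in these notes and assumes two standardized, positively correlated variables, so the unit eigenvectors are (1, 1)/√2 and (1, -1)/√2. The variable names `X1` and `X2` are placeholders, and the sign of each eigenvector is arbitrary.

```python
import math

# Eigenvalues quoted earlier in these notes (two-variable example).
eigenvalues = [1.5752, 0.4248]

# Assumption: two standardized, positively correlated variables, so the unit
# eigenvectors of the 2x2 correlation matrix are (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
inv_sqrt2 = 1 / math.sqrt(2)
eigenvectors = [(inv_sqrt2, inv_sqrt2), (inv_sqrt2, -inv_sqrt2)]

# loadings[i][j] is the loading of variable j on component i.
loadings = [
    tuple(math.sqrt(ev) * weight for weight in vec)
    for ev, vec in zip(eigenvalues, eigenvectors)
]

# Each variable becomes one point: (loading on Prin1, loading on Prin2).
points = {
    name: (loadings[0][j], loadings[1][j])
    for j, name in enumerate(["X1", "X2"])  # placeholder variable names
}
for name, (on_prin1, on_prin2) in points.items():
    print(f"{name}: Prin1 = {on_prin1:.4f}, Prin2 = {on_prin2:.4f}")
```

When all components are kept, the squared loadings of each variable sum to 1, so every point lies on the unit circle of the plot.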
When there are many variables, it is possible to see which variables group together in the principal component space.
The method of principal components, or PCA, works by transforming a set of k correlated variables into a set of k uncorrelated variables that are called, not coincidentally, principal components or simply components.
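The transformation described above can be sketched for the simplest case, k = 2, using only the standard library. The data values are made up for illustration. The sketch relies on a known property of two standardized variables: the components are (z1 + z2)/√2 and (z1 - z2)/√2, with variances 1 + r and 1 - r. For more variables a numerical eigen-decomposition would be needed, which is outside this sketch.

```python
import math
import statistics


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Sample Pearson correlation."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Made-up, positively correlated data (an assumption for this sketch).
x = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 8.0]

zx, zy = standardize(x), standardize(y)
r = correlation(x, y)

# Principal component scores for two standardized variables.
prin1 = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]
prin2 = [(a - b) / math.sqrt(2) for a, b in zip(zx, zy)]

print(f"correlation of originals:  {r:.4f}")
print(f"correlation of components: {correlation(prin1, prin2):.4f}")
print(f"variance of Prin1: {statistics.variance(prin1):.4f} (1 + r = {1 + r:.4f})")
print(f"variance of Prin2: {statistics.variance(prin2):.4f} (1 - r = {1 - r:.4f})")
```

The two original variables are correlated, while the two components are not (up to rounding), and the component variances match the eigenvalues 1 + r and 1 - r.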
(The Bartlett Test assesses whether the variances of the eigenvalues are equal or not. If you fail to reject the hypothesis that they are equal, PCA is inappropriate.)
One of the main objectives of PCA is to reduce the information contained in all the original variables into a smaller set of components with a minimum loss of information.
Account for a specified proportion of the variation. This can be used in two ways. First, the researcher can desire to account for at least, for example, 70% or 80% of the variation, and retain enough principal components to achieve this goal.
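The retention rule above amounts to a cumulative sum over the eigenvalues. A minimal sketch follows; the function name `components_needed` and the eigenvalues are hypothetical, chosen only to illustrate the 70% and 80% targets mentioned in the highlight.

```python
def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative share
    of the total variation reaches the target (a fraction, e.g. 0.80)."""
    total = sum(eigenvalues)
    running = 0.0
    for count, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= target:
            return count
    return len(eigenvalues)  # fallback if rounding keeps the share just under target


# Hypothetical eigenvalues for a five-variable PCA (not from the book).
eigenvalues = [2.5, 1.2, 0.7, 0.4, 0.2]

print(components_needed(eigenvalues, 0.70))  # cumulative shares: 0.50, 0.74, ...
print(components_needed(eigenvalues, 0.80))  # ... 0.88 at the third component
```

With these made-up eigenvalues, two components pass the 70% target and three are needed for 80%.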
PCA can also be used to gain insight into the structure of the data set in two ways. First, the factor loadings can be used to plot the variables in the principal components space (this is the “Loading Plot”). And it is sometimes possible to see which variables are “close” to each other in the principal components space. Second, the principal component scores can be plotted for each observation (this is the “Score Plot”), and aberrant observations or small, unusual clusters might be noted.
When there are only two or three important principal components, analyzing the loading plots involves looking at only one plot (for two components) or perhaps three plots (for three components, one plot per pair).
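The plot counts follow from needing one two-dimensional plot for every pair of retained components. A small sketch makes this explicit; the helper name `loading_plot_pairs` is hypothetical and the `Prin` labels simply follow the naming used earlier in these notes.

```python
from itertools import combinations


def loading_plot_pairs(n_components):
    """List every pair of components that would get its own 2-D plot."""
    labels = [f"Prin{i}" for i in range(1, n_components + 1)]
    return list(combinations(labels, 2))


for n in (2, 3, 4):
    pairs = loading_plot_pairs(n)
    print(n, "components:", len(pairs), "plot(s)", pairs)
```

The count grows as n(n - 1)/2, which is why the approach is practical mainly when few components matter.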
It is important to keep in mind that the primary use of PCA in multivariate analysis and data mining is data reduction, that is, reducing several variables to a few.