Numsense! Data Science for the Layman: No Math Added
17%
One way to keep a model’s overall complexity in check is to introduce a penalty parameter, in a step known as regularization. This new parameter penalizes any increase in a model’s complexity by artificially inflating prediction error, thus enabling the algorithm to account for both complexity and accuracy in optimizing its original parameters. By keeping a model simple, we help to maintain its generalizability.
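A minimal sketch of the idea in this highlight, assuming a linear model and a ridge-style penalty (the sum of squared weights as the measure of complexity). The book names no specific formula, so the function names, the `penalty` parameter, and the choice of penalty are illustrative assumptions.

```python
def penalized_error(weights, rows, targets, penalty):
    """Squared prediction error inflated by a complexity penalty.

    Complexity is measured here as the sum of squared weights, which is an
    assumption of this sketch (a ridge-style penalty).
    """
    squared_error = 0.0
    for row, target in zip(rows, targets):
        prediction = sum(w * x for w, x in zip(weights, row))
        squared_error += (target - prediction) ** 2
    complexity = sum(w * w for w in weights)
    return squared_error + penalty * complexity


def ridge_slope(xs, ys, penalty):
    """Slope w that minimizes sum((y - w*x)^2) + penalty * w^2 (no intercept)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + penalty)
```

With `penalty` set to zero the fitted slope is the ordinary least-squares one; raising it pulls the slope toward zero, trading a little accuracy on the training data for a simpler model.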
17%
Percentage of Correct Predictions. The most simplistic definition of prediction accuracy is the proportion of predictions that proved to be correct.
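The definition above can be sketched in a few lines. The function name `accuracy` and the list-based inputs are assumptions made for illustration.

```python
def accuracy(predicted, actual):
    """Proportion of predictions that match the actual outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, three correct predictions out of four give an accuracy of 0.75.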
20%
There are four key steps in a data science study:
1. Prepare data
2. Select algorithms to model the data
3. Tune algorithms to optimize the models
4. Evaluate models based on their accuracy
21%
Personality traits are also a good way to group customers—as done in the following survey of Facebook users.
22%
The number of clusters should be large enough to enable us to extract meaningful patterns that can inform business decisions, but also small enough to ensure that clusters remain clearly distinct.
23%
A scree plot shows how within-cluster scatter decreases as the number of clusters increases.
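One common way to measure within-cluster scatter is the sum of squared distances from each point to its cluster's centroid. The sketch below assumes that measure and assumes the cluster labels are already known; `within_cluster_scatter` is a hypothetical helper, not something named in the book.

```python
def within_cluster_scatter(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # Centroid: the per-dimension mean of the cluster's members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for p in members:
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dims))
    return total
```

Computing this value for 1, 2, 3, ... clusters and plotting the results against the number of clusters gives the scree plot described above. Splitting two well-separated groups into their own clusters makes the scatter drop sharply.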
26%
Principal Component Analysis (PCA) is a technique that finds the underlying variables (known as principal components) that best differentiate your data points.
28%
Therefore, instead of analyzing the four nutrition variables separately, we can combine highly-correlated variables, hence leaving just two dimensions to consider. This is why PCA is called a dimension reduction technique.
30%
A rule of thumb is to use the number of principal components corresponding to the location of a kink, which is a sharp bend in the scree plot.
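The kink is usually judged by eye. As a rough illustration only, one possible heuristic is to pick the point where the curve bends most sharply, measured by the largest second difference. Both this heuristic and the name `kink_position` are assumptions of the sketch, not the book's method.

```python
def kink_position(scree_values):
    """Return the 1-based position of the sharpest bend in a scree curve.

    scree_values[0] belongs to 1 component, scree_values[1] to 2, and so on.
    The bend at position i is the second difference v[i-1] - 2*v[i] + v[i+1].
    """
    if len(scree_values) < 3:
        raise ValueError("need at least three values to locate a bend")
    best_index = 1
    best_bend = float("-inf")
    for i in range(1, len(scree_values) - 1):
        bend = scree_values[i - 1] - 2 * scree_values[i] + scree_values[i + 1]
        if bend > best_bend:
            best_index, best_bend = i, bend
    return best_index + 1  # convert list index to a component count
```

On a curve that falls steeply and then flattens, this returns the position where the flattening starts. Real scree plots are often noisier, so a visual check is still advisable.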
33%
A support threshold could be chosen to identify frequent itemsets, such that itemsets with support values above this threshold would be deemed as frequent.
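A minimal sketch of the thresholding step, assuming support is defined as the fraction of transactions that contain every item in the itemset. The function names and the idea of passing in a list of candidate itemsets are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def frequent_itemsets(candidates, transactions, threshold):
    """Keep the candidates whose support is strictly above the threshold."""
    return [
        frozenset(c)
        for c in candidates
        if support(c, transactions) > threshold
    ]
```

With four transactions, an item that appears in three of them has support 0.75 and would pass a threshold of 0.6, while a pair that appears together in only two would not.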
38%
To examine how these people relate to each other, such as identifying prominent individuals and how they drive group dynamics, we could use a technique called Social Network Analysis (SNA). SNA has potential applications in viral marketing, epidemic modeling, and even team game strategies. However, it is most known for its use in analyzing relationships in social networks, which gave it its name.
60%
SVM first projects the data onto a higher dimension for which data points can be separated with a straight line (see Figure 4). These straight lines are simpler to compute, and are also easily translated into curved lines when projected back down onto a lower dimension.
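The projection idea can be illustrated with a toy example. This is not an SVM training routine; it only shows how data that no single cut-off can separate in one dimension becomes separable by a straight line after adding a squared feature. The mapping x -> (x, x^2), the threshold of 4.0, and the function names are all assumptions chosen for the illustration.

```python
def project(x):
    """Lift a 1-D point onto a 2-D plane by adding its square as a second axis."""
    return (x, x * x)


def classify(x, threshold=4.0):
    """Separate points with the straight line 'second coordinate = threshold'."""
    _, squared = project(x)
    return "outer" if squared > threshold else "inner"
```

Here the points -1, 0, and 1 sit between the points -3 and 3, so no single cut-off on the number line splits the two groups. After projection, the horizontal line at 4.0 does. Projected back down to one dimension, that straight line becomes the pair of boundary points x = -2 and x = 2, which is the low-dimensional analogue of the curved boundary the highlight describes.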