Yerzhan’s Kindle Notes & Highlights for Numsense! Data Science for the Layman: No Math Added

Numsense! Data Science for the Layman: No Math Added

by Annalyn Ng

17%

One way to keep a model’s overall complexity in check is to introduce a penalty parameter, in a step known as regularization. This new parameter penalizes any increase in a model’s complexity by artificially inflating prediction error, thus enabling the algorithm to account for both complexity and accuracy in optimizing its original parameters. By keeping a model simple, we help to maintain its generalizability.
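
A minimal sketch of the penalty idea, for illustration only. The highlight does not name a specific penalty, so the squared-slope penalty (ridge-style) and the names `fit_slope`, `penalized_error`, and `penalty` are assumptions, and the data is made up.

```python
# Assumption: a one-variable linear model y ~ w * x with a squared-slope penalty.
# The penalty term inflates the error as the slope grows, so the fit trades
# accuracy against complexity.

def penalized_error(xs, ys, w, penalty):
    """Squared prediction error plus a penalty on the size of the slope."""
    squared_error = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return squared_error + penalty * w ** 2

def fit_slope(xs, ys, penalty):
    """Slope that minimizes penalized_error (closed form for this simple model)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A larger penalty pulls the fitted slope toward zero (a simpler model).
slopes = {p: fit_slope(xs, ys, p) for p in (0.0, 1.0, 10.0)}
for p, w in slopes.items():
    print(f"penalty={p:5.1f}  slope={w:.3f}")
```

With a penalty of zero this reduces to an ordinary least-squares slope; raising the penalty shrinks the slope.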

17%

Percentage of Correct Predictions. The most simplistic definition of prediction accuracy is the proportion of predictions that proved to be correct.
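
As a small illustration of this definition, the proportion of correct predictions can be computed as below. The function name `proportion_correct` and the labels are made up for the example.

```python
def proportion_correct(predicted, actual):
    """Share of predictions that match the actual outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one prediction")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)

# Made-up labels: three of the four predictions match.
predicted = ["spam", "ham", "spam", "ham"]
actual = ["spam", "ham", "ham", "ham"]
score = proportion_correct(predicted, actual)
print(score)
```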

20%

There are four key steps in a data science study:
1. Prepare data
2. Select algorithms to model the data
3. Tune algorithms to optimize the models
4. Evaluate models based on their accuracy
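
The four steps can be sketched end to end on a toy problem. Everything here is a stand-in chosen for brevity: the data is made up, and the "algorithm" is a hypothetical one-feature threshold rule rather than anything the book prescribes.

```python
# 1. Prepare data: made-up (feature, label) pairs, split into train and test.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
        (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1)]
train, test = data[::2], data[1::2]

# 2. Select an algorithm: predict 1 when the feature exceeds a threshold.
def predict(threshold, x):
    return 1 if x > threshold else 0

def share_correct(threshold, rows):
    """Proportion of rows the threshold rule labels correctly."""
    correct = sum(1 for x, y in rows if predict(threshold, x) == y)
    return correct / len(rows)

# 3. Tune: pick the candidate threshold that does best on the training rows.
candidates = [1.0, 2.0, 3.0, 4.0]
best_threshold = max(candidates, key=lambda t: share_correct(t, train))

# 4. Evaluate: measure accuracy on rows the tuning step never saw.
test_accuracy = share_correct(best_threshold, test)
print(best_threshold, test_accuracy)
```

Keeping the test rows out of the tuning step is what makes the final accuracy a fair estimate of how the rule would do on new data.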

21%

Personality traits are also a good way to group customers—as done in the following survey of Facebook users.

22%

The number of clusters should be large enough to enable us to extract meaningful patterns that can inform business decisions, but also small enough to ensure that clusters remain clearly distinct.

23%

A scree plot shows how within-cluster scatter decreases as the number of clusters increases.
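
The numbers behind such a scree plot can be sketched as follows. This is a simplified, assumption-laden version: the data is made-up and one-dimensional, the clustering is a plain k-means loop with a deterministic starting point (an illustrative choice), and "scatter" is taken to mean the sum of squared distances to each cluster's centre.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain k-means on 1-D data with a deterministic start (needs k <= len(values))."""
    ordered = sorted(values)
    n = len(ordered)
    # Start each centre at the middle of one of k equal-sized chunks.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Move each centre to its cluster mean; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def within_cluster_scatter(centers, clusters):
    """Sum of squared distances from each point to its cluster centre."""
    return sum((v - center) ** 2
               for center, members in zip(centers, clusters)
               for v in members)

# Made-up data with three visible groups.
values = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.9, 9.0, 9.1]

scree = {}
for k in range(1, 5):
    centers, clusters = kmeans_1d(values, k)
    scree[k] = within_cluster_scatter(centers, clusters)
    print(k, round(scree[k], 2))
```

On this data the scatter falls steeply up to three clusters and barely moves afterwards, which is the flattening a scree plot is meant to reveal.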

26%

Principal Component Analysis (PCA) is a technique that finds the underlying variables (known as principal components) that best differentiate your data points.

28%

Therefore, instead of analyzing the four nutrition variables separately, we can combine highly-correlated variables, hence leaving just two dimensions to consider. This is why PCA is called a dimension reduction technique.
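
To make the "combine highly-correlated variables" step concrete, here is a minimal two-variable sketch. The values are made up (not the book's nutrition data), the variable names `fat` and `protein` are placeholders, and power iteration is just one simple way to find the first principal component.

```python
import math

def first_principal_component(xs, ys, steps=100):
    """Direction of greatest variance for two variables, plus its variance share.

    Assumes the variables actually vary and that the starting direction (1, 1)
    is not exactly perpendicular to the answer.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Entries of the 2x2 covariance matrix.
    cxx = sum(a * a for a in dx) / (n - 1)
    cyy = sum(b * b for b in dy) / (n - 1)
    cxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Power iteration: repeatedly apply the matrix and renormalize.
    vx, vy = 1.0, 1.0
    for _ in range(steps):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    variance_along = vx * (cxx * vx + cxy * vy) + vy * (cxy * vx + cyy * vy)
    return (vx, vy), variance_along / (cxx + cyy)

# Made-up, strongly correlated measurements.
fat = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [2.1, 3.9, 6.0, 8.1, 9.9]

component, share = first_principal_component(fat, protein)
print(component, share)
```

When one component carries nearly all of the variance, as here, the two original variables can be replaced by that single combined dimension with little loss.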

30%

A rule of thumb is to use the number of principal components corresponding to the location of a kink, which is a sharp bend in the scree plot.
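
One hedged way to locate a kink numerically is to look for the largest second difference in the scree values, that is, the point where the curve changes slope most sharply. This heuristic and the function name `kink_position` are assumptions, not the book's method, and the variance figures are made up.

```python
def kink_position(scree_values):
    """1-based position of the sharpest bend, using the largest second difference."""
    if len(scree_values) < 3:
        raise ValueError("need at least three values to detect a bend")
    bends = {
        i + 1: scree_values[i - 1] - 2 * scree_values[i] + scree_values[i + 1]
        for i in range(1, len(scree_values) - 1)
    }
    return max(bends, key=bends.get)

# Made-up proportion of variance explained by components 1..5.
variance_explained = [0.55, 0.30, 0.06, 0.05, 0.04]
kink = kink_position(variance_explained)
print(kink)
```

Whether to keep the components up to the bend or only those before it is a judgment call that the highlight does not settle; plotting the values remains the more reliable check.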

33%

A support threshold could be chosen to identify frequent itemsets, such that itemsets with support values above this threshold would be deemed as frequent.
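
A small sketch of applying a support threshold, assuming support means the fraction of transactions that contain the itemset. The baskets are made up, and the brute-force enumeration is for illustration only; it skips the pruning a real frequent-itemset algorithm would use.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def frequent_itemsets(transactions, threshold, max_size=2):
    """Itemsets (up to max_size items) whose support is above the threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s > threshold:
                result[combo] = s
    return result

# Made-up shopping baskets.
transactions = [
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "milk"},
    {"beer", "milk"},
]

frequent = frequent_itemsets(transactions, threshold=0.4)
for itemset, s in frequent.items():
    print(itemset, s)
```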

38%

To examine how these people relate to each other, such as identifying prominent individuals and how they drive group dynamics, we could use a technique called Social Network Analysis (SNA). SNA has potential applications in viral marketing, epidemic modeling, and even team game strategies. However, it is most known for its use in analyzing relationships in social networks, which gave it its name.
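
As one simple illustration of "identifying prominent individuals", the sketch below counts each person's connections (degree centrality). This measure is an assumption chosen for brevity, not necessarily the one the book uses, and the names and friendships are made up.

```python
from collections import Counter

def degree_centrality(edges):
    """Share of the other people in the network each person is directly linked to."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    others = len(degree) - 1
    return {person: count / others for person, count in degree.items()}

# Made-up friendships; each pair is listed once.
friendships = [
    ("Ana", "Ben"),
    ("Ana", "Cy"),
    ("Ana", "Dee"),
    ("Ben", "Cy"),
    ("Dee", "Eli"),
]

centrality = degree_centrality(friendships)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

Degree only counts direct ties; measures that also weigh who those ties are connected to can rank people differently.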

60%

SVM first projects the data onto a higher dimension for which data points can be separated with a straight line (see Figure 4). These straight lines are simpler to compute, and are also easily translated into curved lines when projected back down onto a lower dimension.
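
The projection idea can be illustrated without a full SVM. In this sketch the data is made up and one-dimensional, and the lift x -> (x, x^2) is a hand-picked assumption; a real SVM does this implicitly through a kernel and also finds the widest-margin boundary, which is not attempted here.

```python
import math

# Made-up points on a line: class 0 sits in the middle, class 1 on both sides,
# so no single cut point on the line separates the classes.
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, 0, 0, 0, 1, 1]

def lift(x):
    """Project a 1-D point into 2-D by adding its square as a second coordinate."""
    return (x, x * x)

def separable_by_cut(values, labels):
    """True if one threshold puts all of one class below it and the other above."""
    zeros = [v for v, l in zip(values, labels) if l == 0]
    ones = [v for v, l in zip(values, labels) if l == 1]
    return max(zeros) < min(ones) or max(ones) < min(zeros)

original_ok = separable_by_cut(points, labels)

# In the lifted space, a horizontal straight line (second coordinate = cut)
# separates the classes.
heights = [lift(x)[1] for x in points]
lifted_ok = separable_by_cut(heights, labels)

# Place the line midway between the classes, then map it back down:
# x^2 = cut becomes two boundary points on the original line.
cut = (max(h for h, l in zip(heights, labels) if l == 0)
       + min(h for h, l in zip(heights, labels) if l == 1)) / 2
boundaries = (-math.sqrt(cut), math.sqrt(cut))
print(original_ok, lifted_ok, boundaries)
```

The single straight line in the lifted space turns into a pair of boundary points in the original space, which is the one-dimensional counterpart of a curved boundary.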
