Kindle Notes & Highlights
Started reading
July 6, 2017
Standardization preserves useful information about outliers and makes the algorithm less sensitive to them, in contrast to min-max scaling, which compresses the data into a limited range of values.
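A minimal sketch of this contrast using scikit-learn's scalers; the toy dataset and the outlier value 100 are illustrative choices, not from the text:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy 1-D data with one outlier (100.0).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance
X_mm = MinMaxScaler().fit_transform(X)     # squeezed into [0, 1]

# Min-max collapses the four inliers into a narrow band near 0,
# while standardization keeps more of their relative spread.
mm_spread = X_mm[:4].max() - X_mm[:4].min()
std_spread = X_std[:4].max() - X_std[:4].min()
print(mm_spread, std_spread)
```

Running this shows the inliers spanning only about 0.03 of the min-max range, versus a noticeably larger spread after standardization.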
In contrast to L2 regularization, L1 regularization yields sparse feature vectors; most feature weights will be zero.
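A small sketch of the sparsity effect using scikit-learn's logistic regression; the synthetic dataset and the regularization strength `C=0.1` are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features, only 3 of which are informative.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=3, random_state=0)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)

# L1 drives many coefficients exactly to zero; L2 only shrinks them.
n_zero_l1 = int(np.sum(l1.coef_ == 0))
n_zero_l2 = int(np.sum(l2.coef_ == 0))
print(n_zero_l1, n_zero_l2)
```

The L1-penalized model ends up with many exactly-zero weights, which is why it doubles as an implicit feature selector.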
Using a random forest, we can measure feature importance as the averaged impurity decrease computed over all decision trees in the forest, without making any assumption about whether our data is linearly separable.
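A sketch of this with scikit-learn's `feature_importances_` attribute, which is exactly the mean impurity decrease across the trees; the synthetic dataset (informative features placed first via `shuffle=False`) is an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 10 features; with shuffle=False the 2 informative ones are columns 0 and 1.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=2, n_redundant=0,
                           shuffle=False, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Mean decrease in impurity per feature; values sum to 1.
importances = forest.feature_importances_
print(np.argsort(importances)[::-1][:2])
```

No linearity assumption enters anywhere: the trees split on thresholds, and the importances simply aggregate how much each feature's splits reduce impurity.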
Rather, the choice of an appropriate distance metric, together with domain knowledge to help guide the experimental setup, can be even more important.
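One place where the metric is an explicit, swappable parameter is k-nearest neighbors; a minimal sketch comparing a few metrics on the Iris data (the choice of `n_neighbors=5` and these three metrics is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Same model family, different distance metrics.
scores = {}
for metric in ("euclidean", "manhattan", "chebyshev"):
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    scores[metric] = cross_val_score(knn, X, y, cv=5).mean()
    print(metric, round(scores[metric], 3))
```

On a harder dataset the gap between metrics can dominate any tuning of k, which is the point of the highlight: the metric, not the hyperparameter sweep, often deserves the attention.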

