ElvinOuyang’s Kindle Notes & Highlights

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, by Foster Provost

The most commonly used penalty is the sum of the squares of the weights, sometimes called the “L2-norm” of w. The reason is technical, but basically functions can fit data better if they are allowed to have very large positive and negative weights. The sum of the squares of the weights gives a large penalty when weights have large absolute values.