Doug’s Kindle Notes & Highlights

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications, by Chip Huyen

The samples selected by nonprobability criteria are not representative of the real-world data and therefore are riddled with selection biases. Because of these biases, you might think that it’s a bad idea to select data to train ML models using this family of sampling methods. You’re right. Unfortunately, in many cases, the selection of data for ML models is still driven by convenience.