Dima Timofeev’s Kindle Notes & Highlights for Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

by Chip Huyen

Read between January 5 - January 20, 2025

11%

Progress in the last decade shows that the success of an ML system depends largely on the data it was trained on. Instead of focusing on improving ML algorithms, most companies focus on managing and improving their data.

Garbage in, garbage out. If an ML engineer is not obsessed with data quality (e.g., feature/label correctness, dataset distributional properties, etc.), it's a clear indication that they don't understand how ML actually works.

18%

A repository for storing structured data is called a data warehouse. A repository for storing unstructured data is called a data lake.

23%

Samples with higher weights affect the loss function more. Changing sample weights can change your model’s decision boundaries significantly,
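To make the highlight above concrete, here is a minimal sketch of a weighted binary cross-entropy. The function name `weighted_log_loss` and the toy labels, probabilities, and weights are assumptions for illustration only; they are not taken from the book.

```python
import math

def weighted_log_loss(labels, probs, weights):
    """Weighted mean of per-sample binary cross-entropy (illustrative helper)."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        # Standard per-sample log loss for a binary label y and predicted probability p
        per_sample = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        # A larger weight makes this sample count for more in the total
        total += w * per_sample
    return total / sum(weights)

# Toy data (made up): the third sample is the one the model predicts poorly
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.3]

uniform = weighted_log_loss(labels, probs, [1.0, 1.0, 1.0])
upweighted = weighted_log_loss(labels, probs, [1.0, 1.0, 5.0])
print(uniform, upweighted)
```

With the poorly predicted sample upweighted, the loss is dominated by that sample, so an optimizer minimizing it would be pushed harder to fit that sample, which is one way the decision boundary can move.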

29%

According to Krizhevsky et al. in their legendary AlexNet paper, “The transformed images are generated in Python code on the CPU while the GPU is training on the previous batch of images. So these data augmentation schemes are, in effect, computationally free.”
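The overlap described in the quote can be sketched with a background thread that prepares the next batch while the consumer works on the current one. This is only an illustration under stated assumptions: `augment`, `producer`, and `train_with_prefetch` are hypothetical names, the "augmentation" is a simple row flip, and the training step is a placeholder. It is not the AlexNet code.

```python
import queue
import threading

def augment(batch):
    # Hypothetical CPU-side transform: flip each row left to right
    return [row[::-1] for row in batch]

def producer(batches, out_q):
    # Runs in a background thread: augment batches and hand them over
    for batch in batches:
        out_q.put(augment(batch))  # blocks when the queue is full
    out_q.put(None)  # sentinel: no more batches

def train_with_prefetch(batches, train_step, prefetch=2):
    q = queue.Queue(maxsize=prefetch)
    worker = threading.Thread(target=producer, args=(batches, q), daemon=True)
    worker.start()
    results = []
    while True:
        batch = q.get()  # the next augmented batch, prepared in the background
        if batch is None:
            break
        results.append(train_step(batch))  # placeholder for the GPU training step
    worker.join()
    return results

batches = [[[1, 2, 3]], [[4, 5, 6]]]
seen = train_with_prefetch(batches, train_step=lambda b: b)
print(seen)
```

In CPython, threads only overlap usefully when the training step releases the interpreter lock (for example, while waiting on a GPU), so real pipelines often rely on worker processes or a framework's data loader instead.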

84%

Frederick P. Brooks, “What one programmer can do in one month, two programmers can do in two months.”
