Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications
11%
Progress in the last decade shows that the success of an ML system depends largely on the data it was trained on. Instead of focusing on improving ML algorithms, most companies focus on managing and improving their data.
Dima Timofeev
Garbage in, garbage out. If an ML engineer is not obsessed with data quality (e.g., feature/label correctness, dataset distributional properties, etc.), it's a clear indication that they don't understand how ML actually works.
18%
A repository for storing structured data is called a data warehouse. A repository for storing unstructured data is called a data lake.
23%
Samples with higher weights affect the loss function more. Changing sample weights can change your model’s decision boundaries significantly,
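To make the effect of sample weights concrete, here is a minimal sketch of a weighted binary cross-entropy. The function name `weighted_log_loss` and the toy labels, probabilities, and weights are assumptions chosen for illustration; they do not come from the book.

```python
import math

def weighted_log_loss(y_true, y_prob, weights):
    """Weighted mean binary cross-entropy (illustrative helper, not from the book)."""
    total = 0.0
    for y, p, w in zip(y_true, y_prob, weights):
        # Each sample's loss term is scaled by its weight.
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)

y_true = [1, 0, 1]
y_prob = [0.9, 0.2, 0.3]  # the third sample is predicted poorly

uniform = weighted_log_loss(y_true, y_prob, [1.0, 1.0, 1.0])
upweighted = weighted_log_loss(y_true, y_prob, [1.0, 1.0, 5.0])

# The poorly predicted sample dominates the loss once it is upweighted.
print(uniform, upweighted)
```

Because an optimizer minimizes this weighted sum, raising the weight of a sample pushes the model harder toward getting that sample right, which is how reweighting can shift a decision boundary.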
29%
According to Krizhevsky et al. in their legendary AlexNet paper, “The transformed images are generated in Python code on the CPU while the GPU is training on the previous batch of images. So these data augmentation schemes are, in effect, computationally free.”
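The overlap described in the quote can be sketched as a producer-consumer pipeline: a background thread augments the next batch while the main loop consumes the previous one. This is only an illustration under stated assumptions, not the AlexNet implementation. The names `augment`, `train_step`, and `run` are hypothetical, the "augmentation" is a simple horizontal flip, and `train_step` is a placeholder for real accelerator work.

```python
import queue
import threading

def augment(batch):
    # Stand-in for a real transform: horizontally flip each row.
    return [row[::-1] for row in batch]

def producer(batches, out_q):
    # Runs on a background thread: augment batches ahead of the consumer.
    for batch in batches:
        out_q.put(augment(batch))
    out_q.put(None)  # sentinel: no more batches

def train_step(batch):
    # Placeholder for a GPU training step; here it just sums the values.
    return sum(sum(row) for row in batch)

def run(batches, prefetch=2):
    # A bounded queue keeps the producer at most `prefetch` batches ahead.
    q = queue.Queue(maxsize=prefetch)
    worker = threading.Thread(target=producer, args=(batches, q), daemon=True)
    worker.start()
    results = []
    while True:
        batch = q.get()
        if batch is None:
            break
        results.append(train_step(batch))
    worker.join()
    return results

batches = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
losses = run(batches)
print(losses)
```

In a real setup the benefit comes from the training step running on the accelerator while the CPU prepares the next batch. With pure-Python CPU-bound work on both sides, CPython threads would not give a true speedup, so treat this as a picture of the structure only.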
84%
Frederick P. Brooks, “What one programmer can do in one month, two programmers can do in two months.”