Companies are spending billions on machine learning projects, but it's money wasted if the models can't be deployed effectively. In this practical guide, Hannes Hapke and Catherine Nelson walk you through the steps of automating a machine learning pipeline using the TensorFlow ecosystem. You'll learn the techniques and tools that will cut deployment time from days to minutes, so that you can focus on developing new models rather than maintaining legacy systems.
Data scientists, machine learning engineers, and DevOps engineers will discover how to go beyond model development to successfully productize their data science projects, while managers will better understand the role they play in helping to accelerate these projects.
Pretty disappointing, not much value for me personally. Most of the features in such a pipeline we already automate on our own, and our use cases are even more specific and complex. What's worse, TFX has changed so much since the book's publication. Furthermore, the documentation is lacking and the code examples don't work. The only things I took away were some inspiration on data shift and data validation, as well as the versioning concept.
Some concepts I learnt from this book:

DATA VERSIONING: DVC and Pachyderm, tools that version datasets the way Git versions code.
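To illustrate (not from the book): DVC's Python API can read a dataset pinned to a Git revision. The repo URL, file path, and tag below are placeholder assumptions:

    import dvc.api

    # Read a specific, versioned revision of a dataset tracked by DVC.
    # Repo URL, file path, and tag are hypothetical placeholders.
    with dvc.api.open(
        "data/train.csv",
        repo="https://github.com/example/ml-project",
        rev="v1.0",  # Git tag or commit pinning the dataset version
    ) as f:
        print(f.readline())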
DATA PREPROCESSING: save the graph of preprocessing steps for future serving (this avoids maintaining two separate implementations that must be kept in sync with each other).
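This is what TensorFlow Transform (used in the book) does: the preprocessing_fn is compiled into a TF graph that ships with the exported model, so serving reuses exactly the same transformations. A minimal sketch, with made-up feature names:

    import tensorflow_transform as tft

    def preprocessing_fn(inputs):
        """Preprocessing steps that tf.Transform exports as part of the model graph."""
        return {
            # Statistics (mean/stddev, vocabulary) are computed over the full
            # dataset and baked into the serving graph, so training-time and
            # serving-time preprocessing cannot drift apart.
            "price_scaled": tft.scale_to_z_score(inputs["price"]),
            "category_index": tft.compute_and_apply_vocabulary(inputs["category"]),
        }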
DISTRIBUTED TRAINING: there are two modes, synchronous and asynchronous training. Typically, synchronous strategies are coordinated via all-reduce operations, and asynchronous strategies through a parameter server architecture.
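For the synchronous, all-reduce flavor, tf.distribute makes this a small change to ordinary Keras code; a rough sketch with a dummy model:

    import tensorflow as tf

    # MirroredStrategy: synchronous training; gradients are combined
    # with all-reduce across the local GPUs each step.
    strategy = tf.distribute.MirroredStrategy()

    with strategy.scope():
        # Variables created in this scope are mirrored on every replica.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # model.fit(...) then runs one synchronized step per batch across replicas;
    # asynchronous training would use a ParameterServerStrategy instead.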
MODEL SERVING: a traditional API server is a poor fit for serving models (see the sketch below):
- Lack of code separation: the API code and the machine learning model should be kept separate
- Lack of model version control
- Inefficient model inference: batching requests gives much better performance on a GPU
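The book's answer is a dedicated model server such as TensorFlow Serving, which versions models and can batch requests. A hedged client-side sketch; the host, port, model name, and feature shape are my assumptions:

    import json
    import requests

    # TensorFlow Serving REST endpoint; host, port, and model name are assumed.
    url = "http://localhost:8501/v1/models/my_model:predict"

    # Sending several instances in one request lets the server run a single
    # batched inference pass, which is far more efficient on a GPU.
    payload = {"instances": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]}

    response = requests.post(url, data=json.dumps(payload))
    print(response.json()["predictions"])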
MODEL OPTIMIZATION: Quantization, Pruning, and Model Distillation.
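As one concrete instance, post-training quantization with the TFLite converter; a minimal sketch, with the SavedModel path assumed:

    import tensorflow as tf

    # Post-training quantization: weights are stored in lower precision,
    # shrinking the model and often speeding up inference.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")  # placeholder path
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)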
I think it's easy to get the overall concepts from this book. To learn the details you need to get hands-on. There are also some tips for using TFX that aren't covered in its official documentation.