Data science today is a lot like the Wild West: there’s endless opportunity and excitement, but also a lot of chaos and confusion. If you’re new to data science and applied machine learning, evaluating a machine-learning model can seem pretty overwhelming. Now you have help. With this O’Reilly report, machine-learning expert Alice Zheng takes you through the model evaluation basics.
In this overview, Zheng first introduces the machine-learning workflow, and then dives into evaluation metrics and model selection. The latter half of the report focuses on hyperparameter tuning and A/B testing, which may benefit more seasoned machine-learning practitioners.
With this report, you will:
- Learn the stages involved when developing a machine-learning model for use in a software application
- Understand the metrics used for supervised learning models, including classification, regression, and ranking
- Walk through evaluation mechanisms, such as hold-out validation, cross-validation, and bootstrapping (a quick sketch follows below)
- Explore hyperparameter tuning in detail, and discover why it’s so difficult
- Learn the pitfalls of A/B testing, and examine a promising alternative: multi-armed bandits
- Get suggestions for further reading, as well as useful software packages
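For readers who want a concrete feel for those evaluation mechanisms before opening the report, here is a minimal sketch of hold-out validation versus cross-validation. It uses scikit-learn and a toy dataset purely as an illustration, not as code from the report itself.

# A minimal sketch (not from the report) of two evaluation mechanisms it covers:
# hold-out validation and k-fold cross-validation, using scikit-learn as an example library.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Hold-out validation: train on one split, score once on the held-out portion.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_acc = model.fit(X_train, y_train).score(X_test, y_test)

# Cross-validation: rotate the held-out fold and average the scores.
cv_scores = cross_val_score(model, X, y, cv=5)

print("hold-out accuracy:", round(holdout_acc, 3))
print("5-fold CV accuracy:", round(cv_scores.mean(), 3))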
Alice is a technical leader in the field of machine learning. Her experience spans algorithm and platform development as well as applications. Currently, she is a Senior Manager on Amazon's Ad Platform. Previous roles include Director of Data Science at GraphLab/Dato/Turi, machine learning researcher at Microsoft Research, Redmond, and postdoctoral fellow at Carnegie Mellon University. She received a Ph.D. in Electrical Engineering and Computer Science, and B.A. degrees in Computer Science and Mathematics, all from U.C. Berkeley.
To be fair, for such a thin volume on a subject that can demand much more detailed study, the author has managed to squeeze the high-level framework into a book. She also adds reading materials for those who are keen to learn certain topics in more depth.
She starts with evaluation metrics and how to choose them, which depends on the type of problem currently being worked on. For example, classification is better evaluated against the accuracy that the model churns out, while precision/recall is more suitable for ranking.
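As a rough illustration of that distinction (my own sketch with scikit-learn, not an example taken from the report), accuracy scores hard labels against true labels, while precision/recall judges a top-k cut of a ranked list:

# Sketch: accuracy for a classifier's hard labels vs. precision/recall for a ranked list.
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Classification: compare predicted labels to true labels.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("accuracy:", accuracy_score(y_true, y_pred))

# Ranking: score items, keep the top-k, and ask how precise/complete that cut-off is.
relevance = [1, 0, 1, 1, 0, 0]              # ground-truth relevance of six items
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]     # model's ranking scores for the same items
k = 3
threshold = sorted(scores, reverse=True)[k - 1]
top_k = [1 if s >= threshold else 0 for s in scores]
print("precision@3:", precision_score(relevance, top_k))
print("recall@3:", recall_score(relevance, top_k))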
The only part where I got a bit lost is when the writer says we should not confuse model validation, cross-validation, and hyperparameter tuning. As the concept of hyperparameter tuning was novel to me, I was expecting the next sentence to give me a definition of the term so that I could make a clear distinction among them. Alas, the discussion of hyperparameters only starts in the next chapter, so the reader will probably need to keep going back and forth between the chapters to properly understand those three concepts.
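For other newcomers who hit the same wall, my own shorthand (not the book's wording) is: validation estimates how well a trained model generalizes, cross-validation is one mechanism for producing that estimate, and hyperparameter tuning uses such estimates to pick the model's knobs. A small scikit-learn sketch of how they nest:

# Sketch of how the three ideas relate (my shorthand, not the book's):
# - validation: estimate generalization on data not used for training
# - cross-validation: repeat that estimate over rotating folds
# - hyperparameter tuning: use the estimates to choose settings such as C below
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validation as a model-validation mechanism for one fixed setting.
print(cross_val_score(SVC(C=1.0), X_train, y_train, cv=5).mean())

# Hyperparameter tuning: cross-validate each candidate C and keep the best one.
search = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))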
One more note from a machine-learning newbie like me concerns the specific statement on page 25: "Every new model needs to be evaluated on a separate dataset". The reasoning for this can be found on page 47: "If the tests are not independent (i.e., maybe your 32 models all came from the same training dataset?), ...", which sums up the importance of test independence.
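To see why that independence matters, here is a toy simulation (mine, not from the book): when 32 models that literally guess at random are all scored on the same test set and only the best score is reported, the winner looks better than chance purely through selection.

# Toy simulation (not from the report): evaluating 32 random "models" on one
# shared test set and keeping only the best score overstates how good the winner is.
import numpy as np

rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=100)       # balanced binary labels

accs = []
for _ in range(32):                         # 32 models that guess at random
    y_pred = rng.integers(0, 2, size=100)
    accs.append((y_pred == y_test).mean())

print("mean accuracy of a random guesser:", np.mean(accs))   # close to 0.50
print("best of 32 on the same test set:", np.max(accs))      # noticeably above 0.50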
This is excellent reference material for evaluating any machine learning model. Alice does a great job breaking the evaluation process down into its pros and cons. My only gripe is that the book does not stay consistent in teaching in a simple manner: it starts off simple but gets more complex as it goes on.