Discover key information buried in the noise of data by learning a variety of anomaly detection techniques and using the Python programming language to build a robust service for anomaly detection against a variety of data types. The book starts with an overview of what anomalies and outliers are and uses the Gestalt school of psychology to explain why humans are naturally great at detecting anomalies. From there, you will move into technical definitions of anomalies, moving beyond "I know it when I see it" to defining things in a way that computers can understand.
The core of the book involves building a robust, deployable anomaly detection service in Python. You will start with a simple anomaly detection service, which will expand over the course of the book to include a variety of valuable anomaly detection techniques, covering descriptive statistics, clustering, and time series scenarios. Finally, you will compare your anomaly detection service head-to-head with a publicly available cloud offering and see how they perform.
The anomaly detection techniques and examples in this book combine psychology, statistics, mathematics, and Python programming in a way that is easily accessible to software developers. They give you an understanding of what anomalies are and why you are naturally a gifted anomaly detector. Then, they help you translate your human techniques into algorithms that can be used to program computers to automate the process. You’ll develop your own anomaly detection service, extend it using techniques such as clustering for multivariate analysis and time series analysis for observing data over time, and compare your service head-to-head against a commercial service.
Who This Book Is For
This book is for software developers who have at least some familiarity with the Python programming language and who would like to understand the science and some of the statistics behind anomaly detection techniques. Readers are not required to have any formal knowledge of statistics, as the book introduces relevant concepts along the way.
This book is well written; I appreciate both the author's clarity and his sense of humor. Unfortunately, this book is not really about anomaly detection. Rather, it is a walkthrough for data scientists on how to write their first-ever Python service, and it happens to use anomaly detection as the demonstration case.
Only about 1/3 of the book actually explains anomaly detection concepts and techniques. As an example of what I mean, consider Chapter 7. This chapter covers Grubbs' test (2 pages), the Generalized ESD Test (1 page), and Dixon's Q Test (2 paragraphs). It then spends 8 pages showing and discussing in detail the Python code that invokes these tests, along with updates to the unit and integration tests. This chapter is not an, ahem, anomaly. In Chapter 14 there are 4 pages of explanation of changepoint detection, followed by 11 pages of Python implementation, test suite updates, and web server updates.
I do believe there is a need for a good practitioner's book on anomaly detection. For now, I would continue to recommend Outlier Analysis instead until this need is filled.