Get to grips with building robust XGBoost models using Python and scikit-learn for deployment
Key Features
- Get up and running with machine learning and understand how to boost models with XGBoost in no time
- Build real-world machine learning pipelines and fine-tune hyperparameters to achieve optimal results
- Discover tips and tricks and gain innovative insights from XGBoost Kaggle winners

Book Description
XGBoost is an industry-proven, open-source software library that provides a gradient boosting framework for scaling billions of data points quickly and efficiently.
The book introduces machine learning and XGBoost in scikit-learn before building up to the theory behind gradient boosting. You’ll cover decision trees and analyze bagging in the machine learning context, learning hyperparameters that extend to XGBoost along the way. You’ll build gradient boosting models from scratch and extend gradient boosting to big data while recognizing speed limitations using timers. Details in XGBoost are explored with a focus on speed enhancements and deriving parameters mathematically. With the help of detailed case studies, you’ll practice building and fine-tuning XGBoost classifiers and regressors using scikit-learn and the original Python API. You'll leverage XGBoost hyperparameters to improve scores, correct missing values, scale imbalanced datasets, and fine-tune alternative base learners. Finally, you'll apply advanced XGBoost techniques like building non-correlated ensembles, stacking models, and preparing models for industry deployment using sparse matrices, customized transformers, and pipelines.
By the end of the book, you’ll be able to build high-performing machine learning models using XGBoost with minimal errors and maximum speed.
What you will learn
- Build gradient boosting models from scratch
- Develop XGBoost regressors and classifiers with accuracy and speed
- Analyze variance and bias in terms of fine-tuning XGBoost hyperparameters
- Automatically correct missing values and scale imbalanced data
- Apply alternative base learners like dart, linear models, and XGBoost random forests
- Customize transformers and pipelines to deploy XGBoost models
- Build non-correlated ensembles and stack XGBoost models to increase accuracy

Who this book is for
This book is for data science professionals and enthusiasts, data analysts, and developers who want to build fast and accurate machine learning models that scale with big data. Proficiency in Python, along with a basic understanding of linear algebra, will help you to get the most out of this book.
Table of Contents
1. Machine Learning Landscape
2. Decision Trees in Depth
3. Bagging with Random Forests
4. From Gradient Boosting to XGBoost
5. XGBoost Unveiled
6. XGBoost Hyperparameters
7. Discovering Exoplanets with XGBoost
8. XGBoost Alternative Base Learners
9. XGBoost Kaggle Masters
10. XGBoost Model Deployment
The book delivers exactly what it promises: a hands-on approach to XGBoost. I really enjoy the fact that the book focuses on hyperparameter tuning. For me it is a 4.5; I deduct 0.5 because I was hoping that the difference between XGBoost and gradient boosting would be more clearly explained. I had to use some external resources to understand it better. Nonetheless, as a hands-on book, it delivers what it is supposed to.
Excellent overview of XGBoost and tree/ensemble methods in general. Pedagogically, the author does a great job motivating the material and peeling layers to expose the advantages and shortcomings of the various algorithms.
I appreciate that all hyperparameters are clearly explained, and he even goes to great lengths to show best practices for tuning them in a systematic way. The 'Kaggle Masters' section is a nice bonus, and of great use to more advanced users.
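For readers curious what such systematic tuning looks like in practice, here is a minimal sketch (not taken from the book) using scikit-learn's GridSearchCV with XGBoost's scikit-learn wrapper; the grid values are illustrative assumptions, not the book's specific recipe:

```python
# Minimal sketch of systematic hyperparameter tuning for XGBoost via
# scikit-learn's GridSearchCV. The grid values below are illustrative
# assumptions, not the book's specific recipe.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "max_depth": [3, 4, 6],             # tree depth: controls complexity
    "learning_rate": [0.05, 0.1, 0.3],  # shrinkage applied to each tree
    "n_estimators": [100, 200],         # number of boosting rounds
}

search = GridSearchCV(XGBClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```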
I subtract one star because even though the author culminates the book by showing a 'full ML pipeline' with preprocessing, models, etc., he fails to show how to preprocess prediction files when the transformers need to save state, as is the case with scaling, mean-encoding, binning, and so on. In these cases, the information gleaned from the train set needs to be saved/pickled in order to accurately encode the prediction records.
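For readers who hit the same gap, one common way to handle this (a minimal sketch, not from the book) is to fit the stateful transformers inside a scikit-learn Pipeline and persist the fitted object with joblib, so the train-set statistics travel with the model:

```python
# Minimal sketch (not from the book): persist fitted preprocessing state
# together with the model by fitting both inside a single Pipeline.
import joblib
from sklearn.datasets import load_diabetes
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor

X, y = load_diabetes(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),  # learns train-set mean/std when fitted
    ("model", XGBRegressor()),
])
pipe.fit(X, y)

# Saving the fitted pipeline also saves the scaler's learned statistics.
joblib.dump(pipe, "xgb_pipeline.joblib")

# At prediction time, load the pipeline; new records are encoded with the
# saved train-set statistics before reaching the model.
loaded = joblib.load("xgb_pipeline.joblib")
print(loaded.predict(X[:5]))
```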
Absolute beginners, intermediate users looking for more advanced techniques, and those like me re-learning material are going to get the most out of this one.
Pretty good book for an overview and usage patterns of XGBoost, as well as the other tree and ensemble algorithms in scikit-learn. One of the parts I enjoyed is the summary of how the various ensemble algorithms differ, as well as the explanation of the concept of gradient boosting. The book contains a number of typos, though, e.g. in the formula for deriving the cost function for XGBoost; I had to cross-check the same equations on the XGBoost website as a result.
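For readers who hit the same snag, the regularized objective and the derived leaf weights, as stated in the XGBoost documentation and the original paper, are:

$$
\mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2
$$

Writing $g_i$ and $h_i$ for the first and second derivatives of the loss with respect to the previous round's prediction, and $G_j = \sum_{i \in I_j} g_i$, $H_j = \sum_{i \in I_j} h_i$ over the instances in leaf $j$, the optimal leaf weight and the resulting objective are

$$
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\tilde{\mathcal{L}}^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T.
$$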
A very accessible and well thought-out overview of XGBoost modelling in Python.
The flow of the book is excellent, first showing you how to set up your workspace and create some basic linear models, and then gradually building up to XGBoost through separate chapters on decision trees, random forests, and gradient boosting. Once you reach XGBoost models, the book stays there and deep-dives into all the hyperparameters included in the Python libraries. This approach worked exceptionally well for me, and I'd say this book is the best resource I've found in terms of helping me understand what is going on under the hood of XGBoost.
The one big thing I'd like to have seen more of a focus on is feature engineering. There is half a chapter at the end of the book dedicated to this, and the author acknowledges in that chapter how important it is for achieving accurate models, but there's very little mention of it when you're building models throughout the rest of the book, so it almost feels like an afterthought and leaves everything feeling a bit incomplete.
There are also a few typos throughout the book, something that seems to be quite common from Packt Publishing. It's a real shame, as I've now read several of their books across a few different topics, and in my experience they have been consistently great in terms of the actual content and the authors selected - they just need to start proofreading things before they publish.
Overall, I'd definitely recommend this book to somebody who is picking up XGBoost modelling for the first time, although you'd probably need some other materials dedicated to feature engineering to accompany it before really getting stuck into building models yourself.
This is my second pass through the book, so take this as a part 2 mini-review. I went a bit further than in my first attempt. Wade discusses four tree-based models: decision trees, random forests, gradient boosting, and extreme gradient boosting (XGBoost), with the most attention paid to XGBoost. Read this book if you want to get a quick, solid understanding of the hyperparameters for these tree-based models. Plus, it shows you neat ways of tuning the hyperparameters. Finally, there are so many nuggets to pick up along the way, data-wrangling-wise.
The best simple-explanation book, with practical examples. Recommended for everyone who wants to become familiar with XGBoost. It does not go deep into the mathematical equations, but by the end you can handle your own problems with it.
Builds up carefully from decision tree (DT) learning, to bagging DTs into random forests, to gradient boosting, and then to XGBoost. Also a very useful text for hyperparameter tuning.