Do you want to understand your models and mitigate risks associated with poor predictions using machine learning (ML) interpretation? Interpretable Machine Learning with Python can help you work effectively with ML models.
The first section of the book is a beginner's guide to interpretability, covering its relevance in business and exploring its key aspects and challenges. You'll focus on how white-box models work, compare them to black-box and glass-box models, and examine their trade-offs. The second section will get you up to speed with a vast array of interpretation methods, also known as Explainable AI (XAI) methods, and how to apply them to different use cases, be it for classification or regression, with tabular, time-series, image, or text data. In addition to step-by-step code, the book also helps the reader interpret model outcomes using worked examples. In the third section, you'll get hands-on with tuning models and training data for interpretability by reducing complexity, mitigating bias, placing guardrails, and enhancing reliability. The methods you'll explore here range from state-of-the-art feature selection and dataset debiasing methods to monotonic constraints and adversarial retraining.
By the end of this book, you'll be able to understand ML models better and enhance them through interpretability tuning.
Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he's a Climate and Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup that combined the power of cloud computing and machine learning with principles of decision-making science to efficiently expose users to new places and events. Whether it pertains to leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making — and machine learning interpretation helps bridge this gap more robustly. His book Interpretable Machine Learning with Python was published by UK-based publisher Packt in April 2021.
A wonderful cutting-edge book that explains the latest technologies and algorithms. Even though the neural network section is very advanced, it is well written and accessible.
Fairness: are predictions made without bias? Accountability: can we reliably trace predictions back to something or someone? Transparency: can we explain how and why predictions are made?

Interpretability is the extent to which humans, including non-subject-matter experts, can understand the cause and effect, and the inputs and outputs, of a machine learning model. To say your model has a high level of interpretability means you can describe its inference in a human-interpretable way. Explainability encompasses everything interpretability is. It goes deeper on the transparency requirement because it demands human-friendly explanations for the model's inner workings and the model training process, not just model inference:
- Model transparency
- Design transparency
- Algorithmic transparency

Models are opaque due to:
- Not being statistically grounded
- Uncertainty and non-reproducibility
- Overfitting and the curse of dimensionality
White-Box Algorithms

Linear Regression
Normality is the property that each feature is normally distributed. Non-normality can be corrected with a non-linear transformation; if a feature isn't normally distributed, its coefficient's confidence intervals will be invalid. Independence: observations are independent of each other, like different and unrelated events. Lack of multicollinearity is desirable; otherwise you'd have inaccurate coefficients. Homoscedasticity: when the residuals are more or less equal across the regression line. If you're going to use linear regression heavily, you need to test these assumptions before fitting the data. The intercept is not a feature; its meaning is: if all features were at 0, what would the prediction be? In practice this doesn't happen unless your features all have a plausible reason to be 0.
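A minimal sketch of these ideas with statsmodels, using a made-up housing dataset (the `sqft`, `age`, and `price` columns are hypothetical): the summary shows coefficients with their confidence intervals, and a VIF check flags multicollinearity.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic data purely for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({"sqft": rng.normal(1500, 300, 200),
                   "age": rng.integers(1, 50, 200)})
df["price"] = 50_000 + 120 * df["sqft"] - 800 * df["age"] + rng.normal(0, 10_000, 200)

X = sm.add_constant(df[["sqft", "age"]])  # the "const" column is the intercept
model = sm.OLS(df["price"], X).fit()

# Coefficients plus 95% confidence intervals (invalid if assumptions fail)
print(model.summary())

# Multicollinearity check: a VIF above ~10 is a common warning threshold
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, variance_inflation_factor(X.values, i))
```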
Ridge Regression
It is part of the penalized or regularized regression family. It is called a sparse linear model because, thanks to the regularization, it cuts out some of the noise by making irrelevant features less relevant.
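A minimal sketch contrasting ordinary least squares with ridge's coefficient shrinkage on synthetic data (only 3 of 10 features are informative here by construction):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10.0, random_state=42)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls regularization strength

# Irrelevant features' coefficients shrink toward zero under the penalty
for i, (b_ols, b_ridge) in enumerate(zip(ols.coef_, ridge.coef_)):
    print(f"feature {i}: OLS={b_ols:+.2f}  ridge={b_ridge:+.2f}")
```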
Polynomial Regression
It is a special case of linear or logistic regression in which every feature is expanded to have higher-degree terms and interactions between all the features. It is still a linear regression in every way, except that it has extra features: the higher-degree terms and the interactions.
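A minimal sketch of that expansion with scikit-learn (tiny made-up data): the printed feature names show the higher-degree terms and interactions, and the model remains linear in the expanded space.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0]])
y = np.array([3.0, 9.0, 22.0, 20.0])

poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit(X).get_feature_names_out(["x1", "x2"]))
# -> ['x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']

# Still plain linear regression, just on the expanded features
model = make_pipeline(poly, LinearRegression()).fit(X, y)
print(model.predict([[5.0, 6.0]]))
```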
Logistic Regression
It is expressed by a logistic function that involves an exponential of the linear combination of the coefficients and the features. The presence of the exponential explains why the coefficients extracted from the model are log-odds: to isolate the coefficients, you must apply a logarithm to both sides of the equation. Interpreting each coefficient is the same as with linear regression, except that for each unit increase in a feature, you increase the odds of getting the positive case by a factor expressed by the exponential of the coefficient, all things being equal, a.k.a. ceteris paribus. There is no consensus on how to get feature importance yet.
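A minimal sketch of the log-odds interpretation on scikit-learn's breast cancer dataset: exponentiating a coefficient gives the odds ratio. Note that, because features are standardized here, each ratio applies per one-standard-deviation increase rather than per raw unit.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

coefs = clf[-1].coef_[0]
# exp(coef): the factor by which the odds of the positive class change
# for a one-standard-deviation increase in that feature, ceteris paribus
for name, b in sorted(zip(X.columns, coefs), key=lambda t: abs(t[1]), reverse=True)[:5]:
    print(f"{name}: log-odds={b:+.2f}  odds ratio={np.exp(b):.2f}")
```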
Decision Trees
They have been used for the longest time, even before they were turned into algorithms.
[Neural networks do not have a built-in way to capture feature importance]
XGBoost
It implements gradient-boosted decision trees, an ensemble method.
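A minimal sketch of fitting XGBoost and reading its built-in feature importances (assumes a reasonably recent `xgboost` package; parameter placement can vary across versions):

```python
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# "gain" importance: the average loss reduction from splits on each feature
booster = model.get_booster()
gains = booster.get_score(importance_type="gain")
for feat, gain in sorted(gains.items(), key=lambda t: t[1], reverse=True)[:5]:
    print(feat, round(gain, 2))
```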
SHapley Additive exPlanations (SHAP)
It is a collection of methods, or explainers, that approximate Shapley values; a feature's value is the average of its contributions over many simulations. You have a full coalition with all your features, and you have all the possible subsets of the features minus the feature you are evaluating. The contribution of a feature, a.k.a. its pay-off, is a reduction in predictive error for regression or an increase in probability for classification. The computation time grows exponentially as features increase, which is why we should sample some of the possible subsets of features using Monte Carlo sampling, which randomly samples from a probability distribution.
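A minimal sketch using the `shap` library's TreeExplainer on an XGBoost model (assumes `shap` and `xgboost` are installed): each row gets one additive contribution per feature, viewable globally or for a single prediction.

```python
import shap
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)

# TreeExplainer approximates Shapley values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer(X)  # one additive contribution per feature per row

# Global view: mean |SHAP| per feature; local view: a single prediction
shap.plots.bar(shap_values)
shap.plots.waterfall(shap_values[0])
```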
Support Vector Classifiers (SVC)
SVM is a family of model classes that operate in high-dimensional space to find an optimal hyperplane that separates the classes with a maximum margin between them. Support vectors are the points closest to the decision boundary that would change it if they were removed. They tend to work effectively and efficiently when there are many features compared to the observations, but SVM is not as scalable to larger datasets, and its hyperparameters are hard to tune.
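A minimal sketch exposing the support vectors that define an SVC's decision boundary, on synthetic two-class data:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=7)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The points closest to the boundary; removing them would change it
print("support vectors per class:", clf.n_support_)
print(clf.support_vectors_[:3])
```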
Global Surrogates
A white-box model that you train on the black-box model's predictions.
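A minimal sketch: a shallow decision tree (white-box) is fit to a random forest's (black-box) predictions rather than the true labels, and its fidelity measures how faithfully it mimics the black box.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Train the surrogate on the black box's predictions, not on y
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"fidelity: {fidelity:.2%}")
print(export_text(surrogate, feature_names=list(X.columns)))
```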
Permutation Feature Importance
It is a model-agnostic method that can be used with unseen data. It tells you what the model thinks is important according to what was learned from the training data, but it cannot tell you what is most important once you introduce unseen data. Its main disadvantage is that it won't pick up on the impact of features that are correlated with each other; that is, multicollinearity will trump feature importance.
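A minimal sketch with scikit-learn's model-agnostic implementation: each feature is shuffled on held-out data and the resulting drop in score is its importance.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature n_repeats times and measure the score drop
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.4f} "
          f"+/- {result.importances_std[i]:.4f}")
```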
Partial Dependence Plot (PDP)
It conveys the marginal effect of a feature on the prediction across all possible values of that feature. It's a global model interpretation method that can visually demonstrate the impact of a feature and the nature of its relationship with the target. Its main disadvantages are that it can only display up to two features at a time, and it assumes independence of features when they might be correlated with each other.
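A minimal sketch with scikit-learn on the diabetes dataset, showing one single-feature plot and one two-feature interaction plot (the two-feature limit noted above):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# "bmi" alone, then the ("bmi", "bp") interaction as a 2D plot
PartialDependenceDisplay.from_estimator(model, X, ["bmi", ("bmi", "bp")])
plt.show()
```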
Feature Selection
The advantages of selecting a smaller subset of features: easier-to-understand, simpler models; shorter training times; and improved generalization through reduced overfitting, because variables with little predictive value are often just noise, and an ML model that learns from this noise will overfit.
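A minimal sketch of one possible approach, shrinking the feature set with an L1-penalized (Lasso-style) selector; other selection methods would work equally well here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
selector = make_pipeline(
    StandardScaler(),
    # L1 penalty drives irrelevant features' coefficients to exactly zero
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
)
X_reduced = selector.fit_transform(X, y)
kept = X.columns[selector[-1].get_support()]
print(f"kept {len(kept)} of {X.shape[1]} features:", list(kept))
```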
Bias Mitigation
Feature engineering; balancing or resampling; re-labelling or massaging; reweighing; disparate impact remover; prejudice remover regularizer; exponentiated gradient reduction.
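A minimal sketch of one of these, reweighing, computed by hand: each (group, label) combination gets a weight so the protected attribute and the label look statistically independent during training. The `group` and `label` columns and the tiny dataset are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "label": [1, 1, 0, 1, 0, 0, 0, 0],
})

n = len(df)
p_group = df["group"].value_counts(normalize=True)
p_label = df["label"].value_counts(normalize=True)
p_joint = df.groupby(["group", "label"]).size() / n

# weight(g, y) = P(g) * P(y) / P(g, y); pass as sample_weight when fitting
df["weight"] = [
    p_group[g] * p_label[y] / p_joint[(g, y)]
    for g, y in zip(df["group"], df["label"])
]
print(df)
```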
Tuning Hyperparameters
- Regularization
- Iterations
- Learning rate
- Early stopping
- Class imbalance
- Sample weight

A minimal sketch mapping these knobs onto XGBoost hyperparameters follows below.
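This sketch assumes a recent `xgboost` release; exact parameter placement (for example, where `early_stopping_rounds` goes) varies across versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=500,          # iterations
    learning_rate=0.05,        # learning rate
    reg_lambda=1.0,            # L2 regularization
    scale_pos_weight=1.0,      # class imbalance
    early_stopping_rounds=20,  # early stopping on the validation set
    eval_metric="logloss",
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)])  # sample_weight= also accepted
print("best iteration:", model.best_iteration)
```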
A very comprehensive book on the current state-of-the-art methods and open source libraries for interpretable ML. The book should equip users with introductory knowledge and the confidence to explore interpretable ML, which is a pretty frustrating part of real-world ML applications.
This book helps me take my data analysis to the next level by accounting for the intricacies of interpretable AI. It's going to be really helpful for the projects I am working on: explaining AI output to non-technical people so that it can be implemented for broader audiences in an understandable fashion.