This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of colour graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry.

745 pages, Kindle Edition

First published January 1, 2001

Displaying 1 - 30 of 58 reviews

Download PDF at http://www-stat.stanford.edu/~tibs/El...

February 16, 2013

Excellent book. Has repaid multiple rereadings and is a wonderful springboard for developing your own ideas in the area. Currently I'm going through Additive Models again which I breezed by the first few times. The short section on the interplay between Bias, Variance and Model Complexity is one of the best explanations I've seen.
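For readers who want the interplay mentioned above in symbols, here is a sketch of the standard decomposition of squared-error prediction risk. It assumes a model $Y = f(X) + \varepsilon$ with $\mathrm{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$; the symbols $\hat f$, $x_0$ and $x_{(\ell)}$ are notation chosen for this note, and the k-nearest-neighbor line additionally assumes the training inputs are held fixed.

```latex
\begin{align*}
\mathrm{Err}(x_0)
  &= \mathrm{E}\!\left[(Y - \hat f(x_0))^2 \mid X = x_0\right] \\
  &= \underbrace{\sigma^2}_{\text{irreducible error}}
   + \underbrace{\left[\mathrm{E}\,\hat f(x_0) - f(x_0)\right]^2}_{\text{bias}^2}
   + \underbrace{\mathrm{E}\!\left[\hat f(x_0) - \mathrm{E}\,\hat f(x_0)\right]^2}_{\text{variance}} \\[4pt]
\text{$k$-NN:}\quad
\mathrm{Err}(x_0)
  &= \sigma^2
   + \left[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f(x_{(\ell)})\right]^2
   + \frac{\sigma^2}{k}
\end{align*}
```

In the k-nearest-neighbor case, a smaller $k$ means a more complex model: the variance term $\sigma^2/k$ grows, while the bias term typically shrinks because the neighbors $x_{(\ell)}$ lie closer to $x_0$.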

After retiring, I developed a method of learning a variation of regression trees that use a linear separation at the decision points and a linear model at the leaf nodes (and subsequently used them to forecast the behavior of hurricanes). In them I used a heuristic measure for growing and shrinking the trees, but thanks to this book I can see there is a theoretically sound basis for the measure. Which is nice.

If you have the mathematical background (standard calculus, linear algebra and some familiarity with statistical notation) this is a wonderful introduction to Machine Learning and covers most, but not all, of the major models in use today. The Second Edition does not cover recent topics such as many of the deep learning schemes, but you wouldn't expect it to. More generally, some of the exposition of ideas is very compact. I would have really welcomed more explanation on the relationship between Projection Pursuit Regression and neural networks.

I could have also wished for a bit more linkage between some topics. For example, k-nearest neighbor and standard tree-based classifiers and regression schemes both depend on the strategy of dividing the universe into smaller areas that are presumed to be sufficiently homogeneous that the simplest possible model for that area is completely sufficient. As such, many of the same techniques for improving one can be used for improving the other. Random Forests for example would have as an analogy Random Compact Neighborhoods or Random Prototype Collections with some of the same advantages. As another example, the coverage of PRIM, the bump hunting algorithm, is excellent and the only real coverage of the topic I've seen, but then gives the rather vague step of using cross-validation to select a particular box from the sequence and never once mentions the obvious relationships to either dimensional considerations or the measures considered in the chapter on unsupervised Learning, and this last oversight verges on criminal.

Global models such as linear regression and additive models (which also inherently assume independence between parameters) and local models such as regression trees and k-nearest neighbors rest on radically different assumptions, with PRIM solidly in the middle and excellent at picking up local parameter interactions. I'm therefore thinking that my next set of experiments will take a multipronged approach: first do the best job one can with a global model, follow that with a thorough job of bump hunting (with both high and low boxes, unlike PRIM) to pick up the local parameter interactions, and then see if there are any pieces left for nearest-neighbor or regression/classification-tree methods to pick up. Following those experiments, I'm thinking of a generalization of an artificial neuron which, instead of being a hard-limiting non-linearity applied to a linear model, would be a hard-limiting (or soft-limiting, as the case may be) non-linearity applied to an additive model. In all of these investigations I expect Elements of Statistical Learning to be a constant companion.

This book usually seems to be relegated second in the Machine Learning area after Bishop's Pattern Recognition and Machine Learning, but I would put it first.

(edited for spelling)

June 15, 2016

Well, it was one of the most challenging books I've read in my career. It is a rigorous and mathematically dense book on machine learning techniques.

Be sure to refine your understanding of linear algebra and convex optimization before reading this book. Nonetheless, the investment will totally be worth it.

February 23, 2008

This book surveys many modern machine learning tools ranging from generalized linear models to SVM, boosting, different types of trees, etc.

The presentation is more or less mathematical, but the book does not provide a deep analysis of why a specific method works. Instead, it gives you some intuition about what a method is trying to do. And this is the reason I like this book so much. Without going into mathematical details, it summarizes all necessary (and really important) things you need to know. Sometimes you understand this after doing a lot of research in that subject and coming back to the book. Nevertheless, the authors are great statisticians and know what they are talking about!

A word of caution: I am not sure if this is a good book for self-study if you don't have any background in machine learning or statistics.

April 24, 2018

I read this book for work, during work, but I'm falling behind my yearly goal so I'm including it on goodreads :P

This book has a lot in it, and is incredibly dense. However, it's well worth it. It contains not quite everything about statistics and machine learning that someone needs to know to do data science, but it comes close.

The drawback is that this book is hard to understand. You need to know a lot, or be willing to learn a lot from other resources, to actually get a lot from this book. Even as someone with a good stats and ML background, there were some parts where I had to find online sources to get explanations of even how to start thinking about what's in the book.

Now that I've gone through it once, I know I'll be going back to this time and time again since it is such a good resource. I also plan on going back and re-reading at least some of the chapters as necessary.

June 13, 2017

For the mathematician - this book is too terse and hard to learn from to the point of pretentiousness.

For the software engineer - the algorithms presentation in this book is poor. A bunch of phrases with no clear state change, step computations, etc.

In general - a lot of pompous presentation and hand-waving material.

Something positive: the paper is top quality.

October 5, 2013

A classic text on machine learning from a statistical perspective. Whether you're a novice machine learning practitioner, an undergrad, or a hardcore PhD, you can't miss out on this one. Overall, a good, nontrivial, broad intro to machine learning without loss of technical depth.

December 10, 2011

An extremely well-written introduction to machine learning. I now understand why this is the universal textbook for machine learning classes.

The math is described at a reasonably high level, but the authors do a fantastic job emphasizing the conceptual differences between different learning algorithms. A major focus of this text is on conditions which favor some algorithms over others in minimizing variability for different learning exercises. While this book is not a very pragmatic text (does not hold your hand through implementation), it does a fantastic job laying conceptual foundations. I highly recommend this to any student serious in statistical thinking.

December 13, 2020

This book has been the reference authority for current users of supervised and unsupervised ML. Since I already had an econometrics and probability background, I found this book quite accessible and enjoyable to read. I appreciate the methodical and careful style, though at times it feels terse. I guess the reason is that the book is already quite long and is not meant to be a deep dive into methodology or theory. That said, the book is very good as an introduction and a reference to ML methods. I think a semester course using this book should be part of the standard graduate curriculum in economics.

June 28, 2020

A more detailed companion piece to the introductory ISLR, this is an excellent introduction. The only critique would be that it is too even-handed to influence the mindset of the reader much.

July 19, 2015

It's a classic, but it's not my favorite text at this level for either teaching or self-study. Coverage of core methods is relatively good, but the content sometimes veers between highly mathematical and formulaic, missing important conceptual areas. I wouldn't consider a statistics/ML/bioinformatics/... library complete without ESL, but I think Pattern Recognition and Machine Learning is a better overall resource and aid to teaching this content.

November 13, 2020

Although it covers a wide range of topics, the book, especially towards the end, reads as a thick overview article rather than a textbook. Yes, there are many problems to work on at the end of each chapter, but most concepts, ideas and algorithms presented would require the reader to refer to the "original papers" if he attempts to implement them in computer code. So, while theoretically informative, the book is seriously lacking on a practical level. More of a review than a reference.

February 15, 2021

Note that somehow the Kindle Edition is not associated with all the other editions of this book in the GoodReads database. See the rest of them at Elements of Statistical Learning

May 28, 2017

Nice as a reference or an overview, but not necessarily as a source for learning. So many approaches and techniques are described in this book, that out of necessity, their description is very general, very condensed and very mathematical.

February 22, 2019

I think this is the first time Goodreads has helped me with my reading.

It's a book I started in 2016. I had it printed out, all 600 pages of photocopies, in an IKEA fabric storage box alongside my basketball shoes, with a bleak future but an uneventful life. Every time I logged in here I was reminded of it, and I finished it this week.

It's very well written; it resembles the notes of María Merlán from the Teleco class of 2005. It is concise and explains everything briefly, with no new mathematics, just good basic statistics that gets you to all the concepts. It never goes overboard at any point; it's heaven for people who work slowly and systematically. (For example, when it explains AdaBoost, it uses three lines of text, and that is enough to get you into the illustration of the algorithm's error formula. SUPER practical.)

For me, it served to make me understand that I don't want to work in Machine Learning. It doesn't hold my attention at all; it's statistics with pretensions, and I think it has biases that will make this technique run aground on complex problems such as the validation of autonomous driving algorithms. I think it's impossible to discern the influence of the data within the training process of a detector. A pedestrian detector will always be highly dependent on the pedestrians in the videos; it's pure vector theory. And it doesn't interest me; I missed a lot of information by reading it without interest. What a pity. Oh well, thanks Goodreads. Is being persistent in life really that useful? xD

November 21, 2022

I liked the book when I read the first few chapters in 2017, so I decided to reread and finish it this year. But the second time is not the charm.

First, the book is too theoretical for non-academic readers. Sometimes, the theories can be useful. For example, I learned about the relationship between different models: LASSO can be thought of as Bayesian regression with a Laplacian prior, and trees can be thought of as special cases of regressions with basis functions, etc. But most of the time, I can't figure out how the theories would help me build a better machine learning model.
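The LASSO connection mentioned above can be spelled out in a few lines. This is a sketch under stated assumptions, not a quotation from the book: Gaussian noise with known variance $\sigma^2$, independent Laplace priors with scale $\tau$ on the coefficients, and notation ($x_i$, $y_i$, $\beta$, $\lambda$) chosen for this note.

```latex
\begin{align*}
y_i \mid \beta &\sim \mathcal{N}(x_i^\top \beta,\ \sigma^2),
\qquad
p(\beta_j) = \frac{1}{2\tau}\exp\!\left(-\frac{|\beta_j|}{\tau}\right) \\[4pt]
-\log p(\beta \mid y)
  &= \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - x_i^\top\beta\right)^2
   + \frac{1}{\tau}\sum_{j=1}^{p}|\beta_j| + \text{const} \\[4pt]
\hat\beta_{\text{MAP}}
  &= \arg\min_{\beta}\ \sum_{i=1}^{N}\left(y_i - x_i^\top\beta\right)^2
   + \lambda\sum_{j=1}^{p}|\beta_j|,
\qquad \lambda = \frac{2\sigma^2}{\tau}
\end{align*}
```

Under these assumptions the lasso estimate is the posterior mode, not the posterior mean, and a tighter prior (smaller $\tau$) corresponds to a larger penalty $\lambda$.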

The book occasionally provides practical advice about model training (e.g., the best strategy for training GBM is to set the learning rate small and use early stopping to select the number of trees). But I doubt whether reading this book is the most efficient way to learn about such suggestions.
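To make that advice concrete, here is a minimal sketch of gradient boosting for squared-error loss with a small learning rate and early stopping on a validation set. It is an illustration, not the book's algorithm listing: it uses one-split stumps on a single input, synthetic data, and hypothetical names (`fit_stump`, `boost`, `rate`, `patience`) invented for this example.

```python
import math
import random


def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n, total = len(xs), sum(residuals)
    best, left_sum = None, 0.0
    for k in range(1, n):
        prev, cur = order[k - 1], order[k]
        left_sum += residuals[prev]
        if xs[prev] == xs[cur]:
            continue  # cannot split between identical inputs
        left_mean = left_sum / k
        right_mean = (total - left_sum) / (n - k)
        # Maximizing this score is equivalent to minimizing squared error.
        score = k * left_mean ** 2 + (n - k) * right_mean ** 2
        if best is None or score > best[0]:
            best = (score, (xs[prev] + xs[cur]) / 2, left_mean, right_mean)
    return best[1:]  # (threshold, left value, right value)


def stump_value(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def boost(train_x, train_y, valid_x, valid_y,
          rate=0.05, max_trees=500, patience=20):
    """Boost stumps; keep the number of trees with the lowest validation error."""
    base = sum(train_y) / len(train_y)  # start from the mean response
    train_pred = [base] * len(train_x)
    valid_pred = [base] * len(valid_x)
    stumps = []
    best_err, best_n = mean_squared_error(valid_y, valid_pred), 0
    for _ in range(max_trees):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(train_y, train_pred)]
        stump = fit_stump(train_x, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        train_pred = [p + rate * stump_value(stump, x)
                      for p, x in zip(train_pred, train_x)]
        valid_pred = [p + rate * stump_value(stump, x)
                      for p, x in zip(valid_pred, valid_x)]
        err = mean_squared_error(valid_y, valid_pred)
        if err < best_err:
            best_err, best_n = err, len(stumps)
        elif len(stumps) - best_n >= patience:
            break  # no improvement for `patience` trees: stop early
    return base, stumps[:best_n], best_err


def sample(n):
    """Synthetic data (an assumption of this sketch): noisy sine curve."""
    xs = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0.0, 0.2) for x in xs]
    return xs, ys


random.seed(0)
train_x, train_y = sample(200)
valid_x, valid_y = sample(100)
base, stumps, best_err = boost(train_x, train_y, valid_x, valid_y)
baseline_err = mean_squared_error(valid_y, [base] * len(valid_y))
print(len(stumps), round(best_err, 4), round(baseline_err, 4))
```

The printed values are the number of trees kept, the validation error at that point, and the error of the constant baseline. The small `rate` makes each tree a cautious step, so the number of trees becomes the tuning knob, and the validation set chooses it.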

I also find the book poorly organized. I cannot understand the logic behind the ordering of chapters.

My recommendation for ML-application-focused readers is: read chapter 2 (which provides a great explanation of bias-variance tradeoff) only, and treat the rest of the book as reference.

June 1, 2021

I loved this book. Its presentation is very nice, and the topics are very well reviewed, with beautiful examples and figures, always trying to unify the view towards the same basic linear models. It is a little biased towards Lasso because of the authors, but that is actually a good thing, as they present the intuition of many methods. It also makes a clear distinction when hard mathematical details are to be presented, and it has good exercises for understanding these details better. The only thing that could be improved is the context of some specific topics: it is not always clear why a topic sits in one place in the book and not another, because many topics are related and it is difficult to make the connections. I would really recommend this book as an introduction to machine learning, especially because of the intuitive explanations and the examples.

November 21, 2018

This is an excellent second or third book on statistical modeling, after you have read something with code examples and done a few real projects. It is mathematically deeper and more comprehensive than An Introduction to Statistical Learning: With Applications in R and does more to tie together how and why algorithms work. It provides no code examples, and it is also correspondingly more demanding in the mathematical background of the reader. Even if you never read all of it, it's worthwhile owning as a reference, and a PDF is even available for free from the author: https://web.stanford.edu/~hastie/Elem...

December 31, 2018

it's the classic for good reason, well written and well organized, but this field is not as magical as people believe. And decorating machine-learning books with informative, colorful, frequent pictures is absolutely what mathematical educators everywhere should be doing, but unfortunately it's only the intellectually vacuous computer fields that ever seem to stick enough pretty pictures in their books.

I would like to say machine learning won't make you the money you think it will, but sadly it does make people money---just for the wrong reasons.

August 31, 2019

Plenty of pictures. But the field is bullshit. Picture-heavy books like this are wonderful _except_ that then hundreds of pages are spent making it look like a thing which shouldn’t actually be considered a thing, is actually a thing.

It’s far better laid out than stuffy academic journal articles, yet as irrelevant as a stuffy academic journal.

Buy and read this if you’re a math student and want some pictures and examples of "how polynomials might apply to the real world". Buy and read it if you’re a programmer who wants to learn some statistics.

Just don’t take it too seriously.

November 4, 2019

A clear introduction to Data Science and Statistical Learning that is not too heavy on the math.

I did not finish the book in its entirety since I was already versed in some of the topics. Notwithstanding, even in such situations, a quick glance gave me more intuition and nuance regarding what I already knew.

I also learned a lot of new concepts. Every Data Scientist should read this book.

November 20, 2020

I love this book. It’s been my constant fallback for the last couple of years. Whenever a question sprang up in my head about the fundamentals of an algorithm, ESL was there with just the precise, succinct information I needed. I normally don’t write reviews for textbooks, but this one had to be done. I owe one to ESL.

January 13, 2022

This remains a great reference book for machine learning, although the chapter on neural networks has become very dated. I wouldn't recommend it as a textbook or for self-study though. A lot of things are in the wrong order, and you'll frequently find yourself having to refer to later chapters in the first half of the book.

September 14, 2022

Concise but not obscure. Comfortable but not facile. Ideal for quick-witted generalists. Authors are a little sloppy with notation, but that seems to be de rigueur with Stanford statistical learning professors.

December 22, 2022

What CS229 should be. Important contents: the curse of dimensionality, linear regression vs. k-NN, and frameworks for bias vs. variance. The chapters on cross-validation and methods for estimating true error are the backbone for backtesting.
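As a rough illustration of how cross-validation estimates true error and picks model complexity, here is a small sketch that scores k-nearest-neighbor regression for several neighborhood sizes using 5-fold cross-validation. Everything in it is an assumption made for the example: the synthetic quadratic data, the interleaved fold assignment, and the hypothetical helper names `knn_predict` and `kfold_mse`.

```python
import random


def knn_predict(train, x, k):
    """Average the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def kfold_mse(data, k_neighbors, folds=5):
    """Estimate prediction error by holding out each fold in turn."""
    errors = []
    for f in range(folds):
        held_out = data[f::folds]  # every folds-th point, starting at f
        train = [p for i, p in enumerate(data) if i % folds != f]
        errors.extend((knn_predict(train, x, k_neighbors) - y) ** 2
                      for x, y in held_out)
    return sum(errors) / len(errors)


# Synthetic data (an assumption of this sketch): noisy quadratic.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100)]
data = [(x, x * x + random.gauss(0.0, 0.1)) for x in xs]

scores = {k: kfold_mse(data, k) for k in (1, 5, 20, 80)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

With 100 points and 5 folds, each training set has 80 points, so `k = 80` predicts the training mean everywhere (high bias), while `k = 1` chases the noise (high variance). The cross-validated error is what arbitrates between the two.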

June 15, 2017

Everyone in machine learning area should read it.

December 17, 2019

Amazing read for anyone who is interested in Data Science. The chapters are all very well written.

April 17, 2020

A rigorous and mathematically dense book on machine learning. One of the most challenging books I’ve ever read.

May 5, 2020

Best book on data science ever.

May 15, 2020

Good reference text. Selective reading.
