The versatile capabilities and large set of add-on packages make R an excellent alternative to many existing and often expensive data mining tools. Exploring this area from the perspective of a practitioner, Data Mining with R: Learning with Case Studies uses practical examples to illustrate the power of R and data mining. Assuming no prior knowledge of R or data mining/statistical techniques, the book covers a diverse set of problems that pose different challenges in terms of size, type of data, goals of analysis, and analytical tools. To present the main data mining processes and techniques, the author takes a hands-on approach that utilizes a series of detailed, real-world case studies. With these case studies, the author supplies all necessary steps, code, and data. Web Resource: A supporting website mirrors the do-it-yourself approach of the text. It offers a collection of freely available R source files that encompass all the code used in the case studies. The site also provides the data sets from the case studies as well as an R package of several functions.
Don't know what I was expecting, but I think this book was just not for me. Tons of examples of how to solve specific problems in R, but quite a few of the solutions require packages by the author of the book, which turns me off, because that's giving me a fish, rather than teaching me to fish.
This book has a number of very illustrative examples of analysis that one can do in R, and serves as a pointer to many additional resources. Unfortunately it is not complete in and of itself. It makes no attempt to explain the theory of the techniques being used, so I find myself having to go back to my old machine learning textbook and/or read papers to understand the background. Then I come back to this book after I understand the background, and it doesn't explain the specifics of how R implements the algorithm, either. So I then have to go read the R package documentation as well. Much of the book is occupied with examples of R code and output - there is some value in this in making sure one gets the expected results, but this is valuable only after one understands the principles.
This book has been a great way to practice data science in R with hands-on examples! This text is ideal for people who have a good grasp of R already, understand data manipulation, and have an understanding of different analytical approaches (classification, clustering, regression). For a beginner that does not know the premise or basic theory behind these techniques, I don't think this would be the best starting out book. For intermediate learners, expect to learn different ways to perform feature selection, apply/sapply/tapply, topic context of different fields that use statistics, and advanced user functions.
Data Mining With R (DMwR) promotes itself as a book that introduces readers to R as a tool for data mining. It teaches this through a set of five case studies, where each starts with data munging/manipulation, then introduces several data mining methods to apply to the problem, and ends with a section on model evaluation and selection. It fills a place in the literature since it devotes a lot of space to data manipulation before applying the various methods and to model evaluation afterwards. But it is hard for people learning data mining since it spreads the types of model throughout the book.
I used this as one of two texts to teach data science to people whose programming and data analysis skills were generally at a very low level. The big advantage of using a programming environment such as R for data mining is the fact that you can do data manipulation in the language, then apply the methods. Many of my students have taken machine learning elsewhere, but they always used prepared data sets, so this emphasis on data manipulation with several very disparate data sets is a unique feature.
The second big advantage of this book is the focus on model selection. For each chapter, the book goes through the exercise of determining which model should be used, and how to diagnose the model to determine which one is appropriate and best for the problem. I especially appreciate the fact that in some cases, the conclusion of the book after model evaluation is that the method did not work for the problem and question at hand. Because most textbooks focus on demonstrating that you did find something, in some cases my students get confused when in real problems they did not find an effect.
Where the book is lacking is the fact that the methods are scattered across the case studies with minimal organization. While this is a result of the realities of the cases, the book would have benefited from a roadmap chapter or introduction that gave methodological context (i.e. what methodologies are used in the book and where they are). This lack made it very difficult to use as a textbook, and by the time I was done using it I was essentially building the roadmap to use the book. This makes it not useful as a standalone textbook for such a course, but very good if there is another text that gives the overview of the methodologies.
An unsatisfying sort of textbook. It showed promise because it uses real data from relevant contexts to run through its examples. But it runs through the data mining examples without pausing to explain in more than the barest superficiality how the modelling and analysis methods demonstrated work, the logic behind them, or the assumptions they make about the data. I wouldn't have followed the random forest application if I hadn't already learned about random forests from a much better book, Introduction to Statistical Learning in R (available as a free PDF from the authors' website), and I learned nothing about artificial neural networks from the neural network example. To compound this, the authors make heavy use of their own functions (which you can download along with the book) instead of taking the time to show you how to achieve the same results using standard R tools and packages.
This book is great for practicing and learning R and getting a grasp of some data mining techniques. However, beware that this book does not cover the mathematical foundation behind the methods.
A good introduction to the subject, with many interesting examples to program yourself. It's been a while since I finished this, so I want to go revisit some of the chapters.