Apache Mahout Cookbook uses over 35 recipes packed with illustrations and real-world examples to help beginners as well as advanced programmers get acquainted with the features of Mahout. "Apache Mahout Cookbook" is great for developers who want a fresh and fast introduction to Mahout coding. No previous knowledge of Mahout is required, and even skilled developers or system administrators will benefit from the various recipes presented.
I just recently finished reading Apache Mahout Cookbook, so I can say a thing or two about it now.
The first thing to notice is that the introduction covers in detail how to install and run Mahout. It also gives a somewhat shallow overview of Hadoop, which, in my opinion, the reader should already know; without that background, some of the file manipulation steps might look like magic. Overall, it gives you enough to start off with your very first Mahout run.
Chapter 3 covers how to import data into Hadoop and Mahout from external sources. Again, nothing really special; that’s stuff you could google in a few minutes, but it is very convenient for a reader who might not know what to google for :).
Chapters 1 to 3 focus on setup and infrastructure in considerable detail, so most first-time Linux/Hadoop/Mahout users will probably find them VERY useful. More experienced Hadoop or Linux users might want to skip them right away. The rest of the book covers the actual machine learning in Mahout.
As for the machine learning algorithms presented in the book, I doubt there is anything special to say about them: they are standard approaches widely used in industry. What makes their presentation nice is that both the Java-based and the command-line way of running each one are always shown, which is very convenient. I also liked the references to papers for some of the approaches, which are very useful for more advanced users who want to read about a method in depth.
What I didn’t like in the book were overly detailed walkthroughs, such as the class-creation wizard. Even more frustrating, this was shown in screenshots multiple times, which I find a waste of space; it would have been better to describe it once in an appendix. I would expect even Mahout beginners to be able to create a class on their own without any instructions.
Some of the code is poorly laid out, which makes it hard and annoying to read.
To conclude, I found this book useful and helpful. It doesn’t go into the details behind the machine learning approaches it uses, but that isn’t really the goal of the book (to do that, you would need to buy a few 500-page books). I found the recipes themselves clear and useful, though more practical examples could sometimes have been given.
TL;DR: The book is good for an aspiring Mahout user. There is some irritating stuff along the way, but the explanations are good and the examples are ready to use (hence "cookbook"). Not a book that covers machine learning in depth.
It is definitely a good book with very interesting examples, and the Java code is easy to follow.
This is a book about practical examples with Mahout, and as the author notes, his intention was not to teach the theory of the machine learning algorithms.
To learn about the algorithms in more detail and understand more about machine learning, you will have to follow the external references and papers the author cites in each chapter.
The first chapter deals with the environment and setup.
The setup is very concise and clear. I downloaded more recent versions of the software listed in the book, and the same installation steps still worked. The versions I ended up using were:
NetBeans IDE 8.0, Java 1.7.0_51, Maven 3.0.5, Hadoop 0.23.5
The current version of Mahout is 0.9, and that version removed SlopeOneRecommender, so the first example doesn’t work if you check out trunk. To follow the examples and code from the book, it is better to check out the previous version, 0.8. It is really nice to finish the first chapter by running a real example.
In chapters 2 and 3, the author explores the storage options for loading and saving our datasets: sequence files, as we are used to in Hadoop, or Sqoop to import data from a database into the Hadoop file system.
From chapter 4 on, the fun starts, with an example of a machine learning algorithm in each chapter.
I like the approach of first running the algorithm from the command line with predefined parameters and then reproducing it with a more detailed implementation in Java.
One of the nice things about these examples is the datasets, which come from real-life data. In particular, I liked the stock market forecasting in chapter 5 and the spectral clustering of medical images in chapter 7; the latter shows the author’s experience in that field.
There are two things I didn’t like about the book. First, the code layout is not very good: wrapped lines make the code hard to read. Second, the typeface for the mathematical equations doesn’t look good.
In summary, it’s a very entertaining book with realistic and practical examples.
This book is good for beginners: each step is explained in detail, with screenshots.
The "How to do it" sections are really impressive.
It gives a detailed picture of Apache Mahout and Hadoop. The best thing is that it provides working knowledge of Hadoop and Mahout instead of the dull, boring documentation found elsewhere.
With the given instructions, I was able to set up my single-node Hadoop cluster with Mahout in action :)