Hadoop in Practice collects nearly 100 Hadoop examples and presents them in a problem/solution format. Each technique addresses a specific task you'll face, like querying big data using Pig or writing a log file loader. You'll explore each problem step by step, learning both how to build and deploy that specific solution and the thinking that went into its design. As you work through the tasks, you'll find yourself growing more comfortable with Hadoop and at home in the world of big data.
“Hadoop in Practice” covers recipes/techniques for working with Hadoop. The 85 techniques range from pure Hadoop to related technologies like Mahout and Pig. There is a good discussion of algorithms.
Java is definitely a prerequisite. The book says you should have some knowledge of HDFS and MapReduce. Yet chapter one starts with “what is hadoop.” It reads better as a review than an intro and doesn't fit with the rest of the book. It also assumes you haven't installed or started Hadoop. You really should read an intro book first and skim chapter one.
I particularly liked the chapters on MapReduce and performance. The overview of iostat and vmstat was clear and better than in many UNIX books. I also liked the AST explain plan. The techniques about when to use joins and sorts seemed like they would be in “Hadoop in Action” as well. Yet the comparison of the different types fit well.
Each chapter begins with a conceptual overview, which was very useful. The book also contains many diagrams to add clarity.
Disclosure: I received a copy of this book from the publisher in exchange for writing this review.
First thing: This is not about how to deal with Hadoop in a real environment; this is a cookbook of recipes for working with Hadoop, some of which you won't ever use.
Second: The book uses a structure of "Explanation/Problem/Solution/Discussion". While the formula usually works, here it simply adds more words, because "Problem" is tailored exclusively to pair with the "Explanation". Not only that, but "Solution" is basically a rehash of the "Explanation". Something like "Hadoop comes with its own class for dealing with X file format; Problem: You have files in the X format and want to process them in Hadoop; Solution: Use the classes in Hadoop". This basically throws the whole structure under the bus.
Third: There are plenty of code examples, and most are terrible. I don't mean "The code doesn't compile" or "It doesn't follow any good practices". I mean it uses some cutesy arrows to point to some pieces of code, which means it's an image instead of real code, which means you can't copy'n'paste if needed. Also, those arrows could easily be converted to comments, except most comments would fall into the "i = i + 1; // increments i" category: useless comments pointing to obvious things. If it told you why you're incrementing "i" instead of what it's doing, it would at least be interesting.
There may be something useful there if you have a specific problem with Hadoop. But if you have a single, specific problem, you'd Google it instead of buying a book with a bunch of other solutions that don't affect you.
I've read this book, but it's damn hard to rate it.
On one hand, it dives into some details I was totally not prepared to dive into, and that was really impressive. I was absolutely impressed with the accuracy and level of detail provided in the description of using Avro or Protobuf with Map/Reduce. It indeed seems very useful, and I see it as an enabler to try something like that on my own.
But this book seriously requires more than some practical experience from the reader. It's very hard to apply the knowledge gained if you've just done some basic work using Hadoop and Pig or Hive.
Why is that? Well, that's because of the biggest drawback of this book: it assumes you already have plenty of practical Map/Reduce scenarios and problems you've encountered in your professional career, so it doesn't bother much with real-life scenarios. You won't find many thorough real-life situations described in this book. The ones that are presented are really quite simple (ranking referrers, finding Friends of Friends in a social network, etc.). That's basically the main reason why I won't award this book 5 stars.
But to be very honest, I have a feeling that I was not the perfect audience for this book. As its title claims, it's more for real Hadoop practitioners, and my real-life Hadoop experience seems too limited to benefit fully from what I've just read. For instance, it wasn't really obvious how to adapt the Crunch chapter to a real-life scenario.