Murray Cumming's Reviews > Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale
Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale

This grabbed my interest at first but suddenly became a terribly dull description of the Java API and configuration file options. I made the mistake of deciding to plow stubbornly through it from cover to cover, but I eventually ended up skipping the most tedious parts in the middle sections.
I advise other people to just choose the most interesting parts from the overview:
https://www.safaribooksonline.com/lib...
Unfortunately, all the good stuff is heavily diluted with boring lists of API and configuration options that should really just be on an API reference page. Where there are (verbose Java) code examples, those examples are then tediously described in detail after the code, though brief comments in the code would be good enough.
The Related Projects chapter was particularly interesting, covering stuff like Spark, Hive, HBase, and ZooKeeper. It doesn't seem like people use MapReduce directly much these days. This suggests to me that I should have been reading some other book that didn't focus so much on the lower MapReduce layer.
I was also disappointed with the Case Studies section. Of the three, one was really just another related project (Cascading). I'd like a wider view of how this stuff is used in the real world.
In summary: If it were available online as open source documentation, I'd understand this as a set of good URLs to use when answering mailing list questions. But it's not very readable as a book. Parts of it could be, as a smaller book.