Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect--HiveQL--to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
Customize data formats and storage options, from files to external databases
Load and extract data from tables--and use queries, grouping, filtering, joining, and other conventional query methods
Gain best practices for creating user defined functions (UDFs)
Learn Hive patterns you should use and anti-patterns you should avoid
Integrate Hive with other data processing programs
Use storage handlers for NoSQL databases and other datastores
Learn the pros and cons of running Hive on Amazon's Elastic MapReduce
This could have been a much better book had it not been for the apparent haste with which O'Reilly rushed it out the door before (really) doing a final edit. The book is riddled with typographical errors, my favorite being the "dangling" second paragraph of Chapter 17, "Storage Handlers and NoSQL", which ends with: "For example, a Hive query could be run that selects a data table that is backed by sequence files, however it could output" (no kidding).
The overall content is worthwhile, but be forewarned: it's not as well edited as other books from O'Reilly. Three stars, solely for content.
Really good book to get into Hive and dive deeper. The installation is somewhat outdated, but mind you, this book is a few years old. And I'm on a Mac, which I think is still not officially supported. Trying to build something with Hive is filled with uncertainty, as I am never 100% sure if it fails because I'm not on Linux or because my queries are wrong. But still, great book to get into Hive. Can't wait for the second edition coming out early 2017.
Maybe 2.5 stars? Not as clear as other O'Reilly texts, and with a ton of mistakes, both in text and code snippets. Clearly a rush job. Still, it'll get you going in terms of being a *user* of Hive. If you want to be an administrator, I'd look to other sources - and make sure you have a solid Java background.