Learn how to write, tune, and port SQL queries and other statements for a Big Data environment using Impala, the massively parallel processing SQL query engine for Apache Hadoop. The best practices in this practical guide help you design database schemas that not only interoperate with other Hadoop components and are convenient for administrators to manage and monitor, but also accommodate future growth in data size and the evolution of software capabilities. Written by John Russell, documentation lead for the Cloudera Impala project, this book gets you working with the most recent Impala releases quickly. Ideal for database developers and business analysts, the latest revision covers analytic functions, complex types, incremental statistics, subqueries, and submission to the Apache Incubator. Getting Started with Impala includes advice from Cloudera’s development team, as well as insights from its consulting engagements with customers.
A small but fun look at Impala, though it's missing the admin bits…
Getting started with Impala is a small but interesting book.
In this case “small book” = 96 pages.
The first half introduces Impala (how to import data, query data, format data, etc.).
The second half provides a series of walk-through tutorials, putting Impala into practice.
You’ll do better if you have knowledge of SQL and Hadoop, but you can probably survive with just the basics.
The book is interesting; it does a nice job of pointing out best practices and possible traps. It’s an easy read, and at several points you’re struck with an “I should remember that gem of info” feeling.
However, I would have expected better from someone who is leading a team providing documentation for Impala. Don’t get me wrong, what is there is good, but I can’t help feeling I have been given half a book; whole chunks are missing. Coverage of admin aspects is totally absent, and the same is true for how Impala fits into the Hadoop ecosystem. An extra 30-50 pages could have made this a really super little book.
The Good: It will help you understand how to use Impala (tips, tricks, etc.) and how to structure and manipulate your data.
The Bad: There is basically nothing here for an admin to get their hands on.
Would I recommend it: I’m a bit mixed here. If you consider the pure development angle then it’s got a lot going for it. However, if you are more DevOps and have some operations considerations then you’re going to feel like a whole chunk of book is missing.
So yes if you are a user, probably not if you are an admin (read the Cloudera docs online).
I’m going to give the book 3 stars, not because of a lack of quality in what is there but because of the parts that are simply missing.
I would say this is a decent but mediocre book for managers, to give them an overview of the product. It's very basic, and most of it is about running various simple queries and looking at the results. There is a chapter dedicated to performance tuning, but don't think the book is worth buying because of it (it's not cheap and has only 105 pages). If you're a data engineer, you should probably google the "Impala: A Modern, Open-Source SQL Engine for Hadoop" white paper from the CIDR conference site.
Not comprehensive enough, but it serves as a good starting point. The author could have included code examples for, say, user-defined functions, and more examples with technologies such as Pig, HBase, and Hive would be very helpful.