Querying Hadoop data with standard Map/Reduce programming techniques is notoriously difficult. Pig and its Pig Latin scripting language provide a SQL-like platform that simplifies building queries against data sets in Hadoop, lowers the Map/Reduce barrier, and opens large-scale data processing and experimentation to casual users. It also stands up well under stress: Yahoo uses Pig for over half the queries it runs on the world's largest Hadoop cluster.
Pig in Action introduces Pig and the Pig Latin language while teaching the fundamentals of big data processing. Readers will explore the intersection of business and data science as they work through practical tasks such as executing standard queries, establishing automated data management processes and policies, and developing useful reports. Most importantly, they'll learn techniques for extracting valuable insights from data while mastering the features of Pig.
M. Tim Jones is a product architect and engineering author specializing in virtualization, Linux, Linux internals, programming, network protocols, embedded development, and artificial intelligence. Over the past decade, he has written five books, two of which have been released in second editions. His publications focus on networking protocols, Sockets programming, artificial intelligence, and Linux user-space programming. Beyond his books, his work includes book descriptions, errata, and various online publications covering the same technical areas.