The discipline of Big Data Analytics (BDA) is fast gaining market and mind share as the technologies, techniques, and tools enabling it stabilize and mature, with overwhelming support from stakeholders such as worldwide product and platform vendors, analytics researchers, open-source community members, IT service organizations, and cloud service providers (CSPs). HBase is one of the open-source NoSQL database technologies that simplify and streamline what was originally a complicated undertaking. In this book, the authors present a number of pragmatic design patterns and best practices for leveraging HBase to implement enterprise-scale, modular, and scalable big data applications. The beauty is that the design patterns associated with HBase can easily be carried over to other NoSQL databases.
The initial chapters cover what HBase is and how to install it on a single machine or a cluster, followed by a straightforward Java example for reading and writing data in HBase. The book starts with the simplest HBase tables, those that model single entities such as a table of users; the design patterns here emphasize scalability, performance, and planning for special cases such as restoring forgotten passwords. It then covers how to store large files in HBase, the alternative ways of storing them, and best practices drawn from large-scale deployments such as those at Facebook, Amazon, and Twitter. The book illustrates how stock market, human health monitoring, and system monitoring data are all time series data; the design patterns for these organize time-based measurements into groups, resulting in balanced, high-performing HBase tables. A chapter is devoted to one of the most common NoSQL design patterns, de-normalization, in which data is duplicated across more than one table in exchange for substantial performance gains, and it shows how to implement a many-to-many relationship in HBase for transactional data using compound keys. The final chapter covers bulk loading for the initial data load into HBase, profiling HBase applications, benchmarking, and load testing.
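To give a flavor of the kind of Java client code the early chapters describe, here is a minimal sketch of writing and reading a single user row with the standard HBase client API; the table name "users", the column family "info", and the row-key format are illustrative assumptions, not the book's own listing.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UserTableSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table users = connection.getTable(TableName.valueOf("users"))) {

                // Write one user row; the row key is the user id (assumed layout).
                Put put = new Put(Bytes.toBytes("user-0001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                              Bytes.toBytes("alice@example.com"));
                users.put(put);

                // Read the same row back by key and extract the column value.
                Result result = users.get(new Get(Bytes.toBytes("user-0001")));
                String email = Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email")));
                System.out.println("email = " + email);
            }
        }
    }

The same Table handle also supports range reads via Scan, which is where the row-key design patterns discussed in the later chapters matter most.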
This book is a must for Hadoop application developers. The authors, drawing on their extensive experience, have clearly articulated the principal patterns so as to lessen the workload on software developers. The key differentiator is that the book is packed with examples and useful tips that enable learners to quickly and thoroughly grasp the nitty-gritty of the design patterns needed to build and sustain next-generation HBase applications.
Very shallow, with many repetitions and unnecessary information. There is not much on real data modeling, and the use of Phoenix is not always beneficial to understanding how the data is actually organized inside Hadoop. (Although it is not directly related to HBase, Cassandra Data Modeling and Analysis explains this much better: it shows not only CQL but also how the data is actually stored in Cassandra.)
Too much reliance on the Phoenix project, and the book doesn't discuss the tradeoffs of using UUIDs as 'primary keys' (row keys). As a result, the design patterns are lopsided toward write operations rather than scan operations, but no mention is made of that, either.
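For context on the tradeoff being criticized, here is a rough sketch (my own illustration, not from the book) of why a random UUID row key favors writes over scans, contrasted with a composite key; the helper names and key layout are hypothetical.

    import java.util.UUID;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowKeyTradeoffSketch {

        // Random UUID row key: writes spread evenly across regions, but rows that
        // belong together (e.g. one user's events) end up scattered, so there is
        // no meaningful start/stop range to scan.
        static byte[] uuidKey() {
            return Bytes.toBytes(UUID.randomUUID().toString());
        }

        // Composite key (userId + reversed timestamp): related rows sort next to
        // each other, newest first, enabling cheap prefix scans at the cost of
        // potential write hot-spotting on very active users.
        static byte[] compositeKey(String userId, long timestampMillis) {
            return Bytes.add(Bytes.toBytes(userId),
                             Bytes.toBytes(Long.MAX_VALUE - timestampMillis));
        }

        // Prefix scan over one user's rows; only practical with the composite key,
        // since UUID keys share no common per-user prefix.
        static Scan scanUserRows(String userId) {
            Scan scan = new Scan();
            scan.setRowPrefixFilter(Bytes.toBytes(userId));
            return scan;
        }
    }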