Get up to speed on Apache Accumulo, the flexible, high-performance key/value store created by the National Security Agency (NSA) and based on Google’s BigTable data storage system. Written by former NSA team members, this comprehensive tutorial and reference covers Accumulo architecture, application development, table design, and cell-level security.
With clear information on system administration, performance tuning, and best practices, this book is ideal for administrators charged with installing and maintaining Accumulo, programmers seeking to write Accumulo applications, and other professionals interested in what Accumulo has to offer. You will find everything you need to use this system fully.
- Get a high-level introduction on what Accumulo has to offer
- Take a rapid tour through single- and multiple-node installations, data ingest, and query
- Learn how to write Accumulo applications for several use cases, based on examples
- Dive into Accumulo internals, including information not available in the documentation
- Get detailed information for installing, administering, tuning, and measuring performance
- Learn best practices based on successful implementations in the field
- Find answers to common questions that every new Accumulo user asks
This book has undergone a careful vetting process with the U.S. government to ensure that classified or proprietary information has not been revealed.
Aaron Cordova worked as a computer systems researcher at the US National Security Agency, where he started and led the Apache Accumulo project through its first release.
He has built large-scale data processing and analysis systems for intelligence, defense, academic research, and web companies.
It seems that most of the reviews (what few there are) are for the early release version. I only briefly looked over that version, but it seems to have come a long way since then.
This book is far from perfect, but as someone relatively new to Accumulo, I'm happy to have it. It covers a wide range of topics from installation/administration to table design.
The free documentation that comes with Accumulo is pretty good, and always getting better, but it's not really sufficient for someone who is just learning. In the cloud world it seems like everything is slated for HBase. It's hard to say if Accumulo will ever see the same kind of traction as HBase, but this book makes a good case as to why it might.
The authors are people who know what they are talking about. Two of them (Mr. Cordova and Ms. Rinaldi) have had a large hand in Accumulo's development. Meanwhile, Mr. Wall brings perspective as a developer of applications using Accumulo.
This is probably the first textbook I've ever read cover to cover. It's pretty well organized to be done that way, though I suspect most people will simply look up topics of interest to them. I learned a lot from reading this book and will continue to refer back to it from time to time while using Accumulo at work.
This book offers a lot of breadth, at the cost of some depth. The further I get into Accumulo development, the more I'd like to learn about table and iterator design. Accumulo iterators seem complicated enough that they could warrant a book by themselves. The authors cover some of the more popular table designs and built-in iterators, but don't talk too much about custom design.
That will get you pretty far, but it doesn't really help when trying to customize applications to meet your customers' needs. Things like query optimizations and how to tune your design for performance are only lightly touched on.
Overall, I think this is a very good book on Accumulo that is a great complement to the documentation for both new and experienced Accumulo developers. If you're working with Accumulo, this book is a must-own.
It is rare to find a computer book that contains just the right level of information to provide a good technical introduction to a complex system in one volume, but this one does it for Accumulo. And it is well written, too. It describes the system's architecture, interfaces, and programming model in enough detail that you can understand what it does, how it does it, and why you'd use it, but avoids huge dumps of ephemeral information, like massive amounts of detail on how to install the system. The code samples are large enough to be clear, but small enough to be useful and not boring.