Designing Data-Intensive Applications

4.71  ·  775 Ratings  ·  108 Reviews
ebook, Early Release - Raw & Unedited, 562 pages
Published by O'Reilly (first published April 25th 2015)

Community Reviews

Emre Sevinç
Nov 10, 2016 rated it it was amazing
I consider this book a mini-encyclopedia of modern data engineering. Like a specialized encyclopedia, it covers a broad field in considerable detail. But it is not a practice book or a cookbook for a particular Big Data, NoSQL or NewSQL product. What the author does is lay down the principles of current distributed big data systems, and he does a very fine job of it.

If you are after the obscure details of a particular product, or some tutorials and "how-to"s, go elsewhere. But if you want to unde
Yevgeniy Brikman
Jul 22, 2017 rated it it was amazing
A must-read for every programmer. This is the best overview of data storage and distributed systems—two key concepts for building almost any piece of software today—that I've seen anywhere. Martin does a wonderful job of taking a massive body of research and distilling complicated concepts and difficult trade-offs down to a level where anyone can understand it.

I learned a lot about replication, partitioning, linearizability, locking, write skew, phantoms, transactions, event logs, and more. I'm
(5.0) excellent summary/foundation/recommendations for distributed systems development, covers a lot of the use cases for data-intensive (vs compute-intensive) apps/services. I recommend to anyone doing service development.

Recommendations are well-reasoned, citations are helpful and are leading me to do a lot more reading.

Thank you for finding and sharing this one, @Chet. I think this will be a book we assign as a primer for working at Goodreads going forward. At least some of the (later) chapte
David Bjelland
Nov 22, 2017 rated it it was amazing  ·  review of another edition
Shelves: cs-software
Like you'd expect of a technical book with such a broad scope, there are sections that most readers in the target audience will probably find either too foundational or too esoteric to justify writing about at this kind of length, but still - at its best, I shudder to think of the time wasted groping in the dark for an ad hoc understanding of concepts it explains holistically in just a few unfussy, lucid pages and a diagram or two.

Definitely a book I see myself reaching for as a reference or me
Nov 09, 2017 rated it it was ok  ·  review of another edition
Some quite valuable content diluted with less useful content. I think I’d much prefer to read this author’s focused articles or blogs than recommend that someone slog through this.

I’m still not quite sure who the intended audience of this book is, but it’s definitely not me. The intro chapter discusses the example of Twitter’s fan-out writes and how they balanced typical users with celebrities who have millions of followers. Because of that intro, I expected a series of architecture patterns and
Emanuele Blanco
Jan 10, 2018 rated it it was amazing
A clear and detailed overview of the challenges modern applications have to face while dealing with data and the current state-of-the-art. From SSTables to event sourcing, Martin Kleppmann gives great insights on what every engineer/architect should know when designing systems that deal with any kind of data. Highly recommended.
Andrzej Hołowko
Sep 25, 2017 rated it it was amazing
Shelves: owned, favorites, big-data
Great book. Every software developer should definitely read it. It covers many topics, and it is hard to remember everything, but it gives you a notion of the systems/databases/tools/techniques used nowadays. You should be aware of the trade-offs in every solution before you use it, and this book is a good starting point.
Bodo Tasche
May 03, 2018 rated it it was amazing
Shelves: technology
This book is an amazing must have for every backend developer. Highly recommended.
Tuấn Anh Nguyễn
This book covers a lot of interesting topics in distributed data systems with great clarity. It also promotes a streaming/event sourcing/change capture approach to building systems, which I think is the right direction. It then includes a good chapter about the potential perils of big data (biased ML, surveillance). Furthermore, it contains a lot of references for further research. A 5/5 would be underrating it.
Ahmad hosseini
Jun 29, 2017 rated it it was amazing
This book changed my view of designing applications!
What is the meaning of Data-Intensive?
We call an application data-intensive if data is its primary challenge: the quantity of data, the complexity of data, or the speed at which it is changing.

Who should read this book?
I think that all developers must read this book. If you develop applications that have some kind of server/backend for storing or processing data, and your applications use the internet, then this book is for you.

Why should you, as
Daniel Aguilar
Pretty good review of a few different database engines, languages, infrastructure configurations and operations involved in data-intensive projects. Gets quite a lot into details about the algorithms and structures behind popular database systems and how/why they are suited for different purposes.

The book is an early release and still lacks many chapters, but it provides some good info anyway. I will probably get back to it when the final release is published.
Artem Sotenko
Oct 16, 2016 rated it it was amazing
6 out of 5, even in its unfinished version
Sameer Rahmani
It's a really great book. The author is well known in the field and is the author of Apache Samza. In this book he explains even the smallest challenges in creating a distributed data-intensive system.
David Castillo
Jan 11, 2018 rated it it was amazing
Shelves: tech
An essential book for modern software engineers. It takes you from the very basic (and often multiple) definitions of words like availability and consistency to in-depth analyses of different approaches to modern systems design.
It's a very dense book, and I mean that as a good thing. It means it doesn't waste space. Every paragraph is filled with applicable knowledge and insight, and the references are plenty if you want to dive into a certain topic.
The author's tone throughout the book is very
Denis Romanovsky
This book is the best one on databases I have ever read. It moves you from the basic principles to complex distributed systems and then generalizes common approaches with strong reasoning. By the end of the book you feel like you have a broad and deep picture of the whole data technology landscape and know how to act in any particular situation. I just loved this book!
May 01, 2018 rated it liked it
Shelves: backburner
Cursory book on data systems; I wish it focused more on the principles of designing data systems.
Robert Sanek
Jan 21, 2018 rated it it was amazing
Shelves: favorites
Quite possibly the best technical book I've ever read. If you are a data infrastructure engineer, you've got to read this thing.
Waldemar Neto
Mar 29, 2017 rated it it was amazing
This book doesn't talk only about design; it goes deep into each component of real highly available systems, like databases, queues, and data serialization. Another very good point is that it has a lot of references for each chapter.
Chase DuBois
Nov 10, 2017 rated it really liked it  ·  review of another edition
An excellent high-level overview of the foundational concepts of modern, complex data systems. I say "high-level", but this actually goes into some detail about how databases and replication algorithms are implemented, often employing straightforward flow diagrams to illustrate race conditions and other concepts. Kleppmann's language is usually plain and accessible and his examples are mostly at the right level of abstraction.

I found this most helpful for understanding the differences and simila
Jan 01, 2018 rated it it was amazing
Very good introduction to databases and distributed systems, and a good list of advanced references for deeper learning. My opinion is that everybody who works in the area of distributed systems must understand the concepts listed in this book.
Nov 22, 2017 rated it it was amazing  ·  review of another edition
This is an eye-opener.
Sergey Shishkin
Feb 04, 2017 rated it it was amazing
Comprehensive overview of modern data systems like data storage, caches, search indices, and messaging systems. Martin does a great job of maintaining a neutral point of view throughout the book, providing historical context and showing where every piece of the puzzle fits into the big picture of application architecture. As a bonus, each chapter comes with ~100 references for deeper study of the topic.

Key takeaways for me were:
- Demystification of database storage and index data structures on dis
Oct 30, 2017 rated it it was amazing
This book is incredibly useful if you spend time engineering software systems that process data (AKA all systems). In my career at Shopify I've spent time as a Data Engineer working on Hadoop/Spark batch processing systems, on the 'Merchant Analytics' team working on a stream processing and real-time analytics database, and as an application developer working on standard Rails apps. Having this book available to me would've been incredibly useful, and it contains many many lessons I've learned o
Ronald Rajagukguk
Jul 13, 2017 rated it it was amazing
Think of this book as a gateway to how the world of modern databases works. It briefly covers most of the technologies used in big websites and also contains lots of links in case the reader wants to go deeper on a specific subject.
Will Fleming
Apr 29, 2016 rated it really liked it
Excellent so far. It's been in early release for quite a while and remains unfinished, unfortunately.
Sebastian Velez
Sep 06, 2017 rated it it was amazing
One of the best books I've read about software development. It is not only about data; it is about distributed systems and understanding the complexities of software architecture on many levels.
May 06, 2015 rated it really liked it
Can't wait to read missing chapters ;)
Aug 09, 2015 rated it it was amazing
Shelves: leutrónicos
A perfect read before 'Seven databases in seven weeks'.
Aug 08, 2017 rated it really liked it
In brief, a must read for anyone building a system touching data in one way or the other.

That holds true despite a few misgivings. The book has already started to date; indeed, I recall scanning most of the early access editions since early 2015. New papers are constantly being written, and I wouldn’t expect the author to track these. However, a few technologies are gaining wider adoption, e.g. distributed caches are transforming into in-memory data fabrics such as Apache Ignite and Hazelcast. ETL got r
Julio Biason
Jan 19, 2018 rated it liked it  ·  review of another edition
Shelves: it, kindle
First off, right off the bat: If you want to design Data Intensive Applications, this is not the book you're looking for. This book goes to great lengths to explain how already existing Data Intensive Applications work -- say, how Zookeeper works when syncing data, how Cassandra works without a leader, how PostgreSQL does transactions and so on.

While informative, the biggest problem is that most of the text is very loaded: there are layers and layers on each paragraph and you'll take a long time
  • Seven Concurrency Models in Seven Weeks: When Threads Unravel
  • Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine
  • Building Microservices: Designing Fine-Grained Systems
  • REST in Practice: Hypermedia and Systems Architecture
  • MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems
  • Site Reliability Engineering: How Google Runs Production Systems
  • Implementing Domain-Driven Design
  • Functional Programming Patterns in Scala and Clojure: Write Lean Programs for the JVM
  • Release It!: Design and Deploy Production-Ready Software (Pragmatic Programmers)
  • JavaScript Allongé: A strong cup of functions, objects, combinators, and decorators
  • Functional JavaScript: Introducing Functional Programming with Underscore.js
  • The Art of Multiprocessor Programming
  • Functional Programming in Scala
  • Spring in Action
  • Programming in Scala
  • Big Data: Principles and best practices of scalable realtime data systems
  • Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
  • Good Math: A Geek's Guide to the Beauty of Numbers, Logic, and Computation

“In distributed systems, suspicion, pessimism, and paranoia pay off.”
“data outlives code.”