Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems


4.72  ·  4,500 ratings  ·  472 reviews
Want to know how the best software engineers and architects structure their applications to make them scalable, reliable, and maintainable in the long term? This book examines the key principles, algorithms, and trade-offs of data systems, using the internals of various popular software packages and frameworks as examples.

Tools at your disposal are evolving and demands on…
Paperback, 616 pages
Published April 2nd 2017 by O'Reilly Media (first published April 25th 2015)

Community Reviews

Emre Sevinç
Nov 10, 2016 rated it it was amazing  ·  review of another edition
I consider this book a mini-encyclopedia of modern data engineering. Like a specialized encyclopedia, it covers a broad field in considerable detail. But it is not a practical guide or a cookbook for a particular Big Data, NoSQL or NewSQL product. What the author does is lay down the principles of current distributed big data systems, and he does a very fine job of it.

If you are after the obscure details of a particular product, or some tutorials and "how-to"s, go elsewhere. But if you want to…
Yevgeniy Brikman
A must-read for every programmer. This is the best overview of data storage and distributed systems—two key concepts for building almost any piece of software today—that I've seen anywhere. Martin does a wonderful job of taking a massive body of research and distilling complicated concepts and difficult trade-offs down to a level where anyone can understand it.

I learned a lot about replication, partitioning, linearizability, locking, write skew, phantoms, transactions, event logs, and more. I'm…
Sebastian Gebski
Honestly, this one took me much more time than I expected.
Plus, it's definitely one of the best technical books I've read in years - but still, it doesn't mean you should run straight away to your bookshop - read up to the end of the review first.

I'll risk the statement that this book's content will not be 100% directly applicable to your work, BUT it will make you a better engineer in general. It's like reading books about Haskell - most likely you'll never use this language for any…
Nov 09, 2017 rated it it was ok  ·  review of another edition
Some quite valuable content diluted with less useful content. I think I’d much prefer to read this author’s focused articles or blogs than recommend that someone slog through this.

I’m still not quite sure who the intended audience of this book is, but it’s definitely not me. The intro chapter discusses the example of Twitter’s fan-out writes and how they balanced typical users with celebrities who have millions of followers. Because of that intro, I expected a series of architecture patterns and…
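The fan-out trade-off this reviewer mentions can be sketched in a few lines. The book contrasts building a home timeline at read time (query every followed user's tweets) with fanning each new tweet out on write into every follower's cached timeline, and describes a hybrid in which most tweets are fanned out on write while tweets from users with huge followings are fetched and merged at read time. Below is a minimal in-memory sketch of that hybrid, assuming hypothetical names (`Timelines`, `CELEBRITY_THRESHOLD`) and a toy follower cut-off; it is an illustration, not Twitter's implementation.

```python
from collections import defaultdict

# Hypothetical cut-off: authors with more followers than this are treated as
# "celebrities" whose tweets are merged at read time instead of fanned out.
CELEBRITY_THRESHOLD = 3


class Timelines:
    """Toy in-memory model of a hybrid fan-out home timeline (hypothetical)."""

    def __init__(self):
        self.followers = defaultdict(set)    # author -> users who follow them
        self.following = defaultdict(set)    # user -> authors they follow
        self.tweets = defaultdict(list)      # author -> [(ts, text), ...]
        self.home_cache = defaultdict(list)  # user -> [(ts, author, text), ...]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > CELEBRITY_THRESHOLD

    def post(self, author, ts, text):
        self.tweets[author].append((ts, text))
        if not self.is_celebrity(author):
            # Fan-out on write: one cache append per follower, so reads stay cheap.
            for user in self.followers[author]:
                self.home_cache[user].append((ts, author, text))

    def home_timeline(self, user):
        # Precomputed entries from ordinary authors...
        entries = set(self.home_cache[user])
        # ...plus celebrity tweets fetched at read time (fan-out on read).
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update((ts, author, text) for ts, text in self.tweets[author])
        # The set drops duplicates if an author crossed the threshold after
        # some of their tweets were already fanned out. Newest first.
        return sorted(entries, reverse=True)
```

Lowering `CELEBRITY_THRESHOLD` moves more authors onto the read path, making posts cheaper and home-timeline reads more expensive; raising it does the opposite.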
(5.0) Excellent summary/foundation/recommendations for distributed systems development; it covers a lot of the use cases for data-intensive (vs. compute-intensive) apps/services. I recommend it to anyone doing service development.

Recommendations are well-reasoned, citations are helpful and are leading me to do a lot more reading.

Thank you for finding and sharing this one, @Chet. I think this will be a book we assign as a primer for working at Goodreads going forward. At least some of the (later) chapters…
David Bjelland
Nov 22, 2017 rated it it was amazing  ·  review of another edition
Shelves: cs-software
Like you'd expect of a technical book with such a broad scope, there are sections that most readers in the target audience will probably find either too foundational or too esoteric to justify writing about at this kind of length, but still - at its best, I shudder to think of the time wasted groping in the dark for an ad hoc understanding of concepts it explains holistically in just a few unfussy, lucid pages and a diagram or two.

Definitely a book I see myself reaching for as a reference or…
Szymon Kulec
Dec 11, 2019 rated it really liked it  ·  review of another edition
How you perceive this book depends on how much you already know.

If you know a lot about serialization: JSON, Avro, Google Protocol Buffers, MessagePack, you name it; db data structures: WAL, B+Tree, LSM, you name it; distributed systems: consensus (Paxos, Raft), messaging (at-least-once, at-most-once, idempotence), partitioning, you won't gain a lot.

If you have read dozens of whitepapers and _internals_ books, you won't gain a lot.

If you run Jepsen tests on your own product, you won't gain a lot.

But if you…
Sep 01, 2018 rated it it was amazing  ·  review of another edition
I recently used Spark to count all the data stores mentioned throughout the book.

There's a total of 72 products, where Apache ZooKeeper, PostgreSQL and MySQL are the ones most mentioned, with 46, 44 and 42 citations.

The complete list is available at
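The reviewer ran this count with Spark; the core matching step can be sketched on a single machine in plain Python instead. This is a hedged sketch, not the reviewer's code: `count_mentions`, the sample text, and the three-product list are hypothetical, matching is case-sensitive, and the `\b` word boundaries will miss names that end in punctuation (such as "C++").

```python
import re
from collections import Counter


def count_mentions(text: str, products: list[str]) -> Counter:
    """Count case-sensitive, whole-word occurrences of each product name in text."""
    counts = Counter()
    for name in products:
        # \b on both sides so that "MySQL" is not counted inside "MySQLdb".
        pattern = r"\b" + re.escape(name) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[name] = hits
    return counts


# Hypothetical sample text and product list, for illustration only.
sample = (
    "PostgreSQL and MySQL are relational databases. "
    "ZooKeeper coordinates the cluster, and PostgreSQL stores the metadata. "
    "The MySQLdb driver is not a separate data store."
)
products = ["ZooKeeper", "PostgreSQL", "MySQL"]
print(count_mentions(sample, products).most_common())
```

Running the same per-chunk count over the book's text in parallel and summing the resulting `Counter`s would give a distributed version of the same idea.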
Mark Seemann
Feb 09, 2020 rated it liked it
Shelves: software
At the beginning of reading this, I vacillated between a three-star and a four-star rating. The book is organised into three parts. The first part is about data storage on a single machine. Whenever it would cover material I already knew, I'd be mildly bored. Whenever it covered material that was unfamiliar to me, I found the explanations lucid and fascinating. I venture that I would have been as pleased with the topics I already knew about, had I not already known about them.

In part II, the book…
Mohamed Elsherif
Fantastic book. It took me almost 9 months to finish, but I am glad that I did. I think this book is a very important read for anyone building any application/system that uses data in any way, shape or form.
Highly recommended.
Ye Lin Kyaw
Oct 09, 2018 rated it it was amazing  ·  review of another edition
There should be a 6-star rating for this book.
Ahmad hosseini
This book changed my view of designing applications!
What is the meaning of Data-Intensive?
We call an application data-intensive if data is its primary challenge: the quantity of data, the complexity of data, or the speed at which it is changing.

Who should read this book?
I think that all developers must read this book. If you develop applications that have some kind of server/backend for storing or processing data, and your application uses the internet, then this book is for you.

Why should you, as…
Sep 08, 2018 rated it it was amazing  ·  review of another edition
Shelves: favorites
My full notes:

IMHO this book is a modern classic, a must-read for every software engineer and developer. I’m certain that I will reread it from time to time.
Sameer Rahmani
Sep 22, 2017 rated it it was amazing
It's a really great book. The author is well known in the field and the author of Apache Samza. In this book he explains even the smallest challenges in creating a distributed, data-intensive system.
Apr 11, 2019 rated it liked it  ·  review of another edition
Just did a nuanced review on my tech blog, here:
Apr 14, 2020 rated it it was amazing  ·  review of another edition
Must read for anyone who wants to work in the distributed systems space.
Dec 05, 2020 rated it it was amazing  ·  review of another edition
You almost had me till the very end, Martin Kleppmann, but I will not let that ruin my experience in reading this little book of yours.

Going in, I thought I would be reading something like the classic System Design prep GitHub repos, with a lot of information told very quickly. You should know that this is purely about the data part: Kleppmann goes in depth on databases, message brokers, and batch processing from the perspective of how the pieces of data are affected. There is less on pure infrastructure…
Ieva Gr
Aug 02, 2019 rated it really liked it  ·  review of another edition
Shelves: technical
Was it easy to read: It may have been the hardest thing I’ve ever read. The writing style is actually nice and quite colloquial for a technical book. But there is so much information in it! It took me ages (half a year) to get through it by taking notes when reading.

What I liked about it: The amount of information and the vast context that it covers: important concepts like response time percentiles, linearisability, serializability, etc. explained; a deep dive into database theory, different…
Apr 02, 2020 rated it it was amazing  ·  review of another edition
Shelves: read-tech
This book is monumental. It explains many aspects of designing data applications in a very approachable way. It has everything; from high level differences between SQL and NoSQL to low level details of how databases work. The explanations are clear and accompanied by code samples, diagrams and examples of data engines that work that way.

Part I of the book covers the fundamentals (e.g. how to handle data on a single machine). Part II covers Distributed data: how to handle it and issues you'll face…
Piotr Kafel
Jan 27, 2021 rated it it was amazing  ·  review of another edition
A book every software engineer should read! I do not have a single complaint, so be prepared to read a review full of praise...

The book is split into 3 parts. Each one of them is incredibly packed with information.

1. Foundations of Data Systems
Here Mr Kleppmann describes the basics of how databases, indices and different encodings work. This part is essential to understand the rest of the book. Even though it might sound like an appetizer, you can already find plenty of meat here about schemas…
Edwin Dalorzo
In this category, this is perhaps one of the best books that exist on the subject; however, there's nothing in this book about how to specifically design my own data-intensive applications. It is more an overview of different distributed database design ideas and the challenges of designing proper distributed database systems and applications. As an overview of those topics, this book is awesome, but it failed to deliver what the title proposes. I really felt enriched by broadening my…
Naing Lin
Jun 14, 2019 rated it it was amazing  ·  review of another edition
Often, we learn our skills by acquaintance and fail to cultivate the underlying knowledge of a particular subject. We therefore compensate with another type of learning: experience. But experience is not always accurate and is mostly based on hindsight, which is why we need another layer of knowledge to justify ours. This foundational knowledge gives us a better view of the problem at hand, and so it can eventually help creators make better decisions (or trade-offs).

Jun 02, 2019 rated it it was amazing  ·  review of another edition
Probably the best-written technical book I have ever read. Martin Kleppmann is vastly knowledgeable about all types and classes of databases and principles of data processing, but also uncannily talented at teaching others with clarity and a pinch of subtle humour. He covers the entire map of the territory of data processing principles and systems in great detail (and delightfully toys with the map metaphor at the beginning of each new chapter), yet never gets bogged down. The book finishes…
Henrik Warne
Jul 06, 2019 rated it it was amazing  ·  review of another edition
A fantastic book that should be mandatory reading for all software developers. It covers databases and distributed systems in a detailed and accessible way. Very clear writing, good diagrams and illustrations, and no fluff.
I have written a summary of all the chapters on my blog:
Bodo Tasche
May 03, 2018 rated it it was amazing  ·  review of another edition
Shelves: technology
This book is an amazing must have for every backend developer. Highly recommended.
Dmitriy Rozhkov
That's a very good book every Senior Software Engineer should read.
Sometimes I think it's funny how hard it is for me to properly review a great book - even a technical one, where one would think reviewing comes easier. So, let's whip out the big guns right off the bat - this is probably the best technical book that I've ever read and it's very likely I'll be returning to it for a refresher from time to time.

But what makes it so great, an inquisitive mind might ask? There wasn't a lot of practical value in it, at least not of the kind that would be instantly…
Oleksii Zuiev
Nov 05, 2018 rated it it was amazing  ·  review of another edition
The book gives a comprehensive overview of design aspects for systems working with data. For each of them it goes deep enough to describe the needed concepts, principles and implementation options. And if you want to go deeper, each chapter ends with long lists of references to relevant research papers, specific implementations, etc. The book ends with a chapter where the author gives his subjective view on where the industry is moving. It is distinct from the rest of the book but is still…
Emanuele Blanco
Jan 10, 2018 rated it it was amazing  ·  review of another edition
A clear and detailed overview of the challenges modern applications have to face while dealing with data, and of the current state of the art. From SSTables to event sourcing, Martin Kleppmann gives great insights on what every engineer/architect should know when designing systems that deal with any kind of data. Highly recommended.
I wish I had had this book 5 years ago. A complete text on distributed systems that is extremely valuable for hands-on experience. You have to read this book multiple times to get a good grasp of the concepts of distributed computing. I do feel the title is a little misleading for such a solid text on distributed systems. Highly recommended.

Readers also enjoyed

  • Building Microservices: Designing Fine-Grained Systems
  • Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale
  • Fundamentals of Software Architecture: An Engineering Approach
  • Effective Java
  • Clean Code: A Handbook of Agile Software Craftsmanship
  • Clean Architecture
  • Java Concurrency in Practice
  • Release It!: Design and Deploy Production-Ready Software (Pragmatic Programmers)
  • System Design Interview – An Insider's Guide
  • Software Engineering at Google: Lessons Learned from Programming Over Time
  • Design Patterns: Elements of Reusable Object-Oriented Software
  • Site Reliability Engineering: How Google Runs Production Systems
  • Head First Design Patterns
  • Monolith to Microservices: Sustaining Productivity While Detangling the System
  • The Pragmatic Programmer: From Journeyman to Master
  • Refactoring: Improving the Design of Existing Code
  • Domain-Driven Design: Tackling Complexity in the Heart of Software
  • Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services
“In distributed systems, suspicion, pessimism, and paranoia pay off.”
“The moral of the story is that a NoSQL system may find itself accidentally reinventing SQL, albeit in disguise.”