Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

4.72 · 1,496 ratings · 193 reviews
Want to know how the best software engineers and architects structure their applications to make them scalable, reliable, and maintainable in the long term? This book examines the key principles, algorithms, and trade-offs of data systems, using the internals of various popular software packages and frameworks as examples.

Tools at your disposal are evolving and demands on
Paperback, 624 pages
Published April 2nd 2017 by O'Reilly Media (first published April 25th 2015)

Community Reviews

Emre Sevinç
Nov 10, 2016 rated it it was amazing  ·  review of another edition
I consider this book a mini-encyclopedia of modern data engineering. Like a specialized encyclopedia, it covers a broad field in considerable detail. But it is not a practical guide or a cookbook for a particular Big Data, NoSQL or NewSQL product. What the author does is lay down the principles of current distributed big data systems, and he does a very fine job of it.

If you are after the obscure details of a particular product, or some tutorials and "how-to"s, go elsewhere. But if you want to unde
Yevgeniy Brikman
A must-read for every programmer. This is the best overview of data storage and distributed systems—two key concepts for building almost any piece of software today—that I've seen anywhere. Martin does a wonderful job of taking a massive body of research and distilling complicated concepts and difficult trade-offs down to a level where anyone can understand it.

I learned a lot about replication, partitioning, linearizability, locking, write skew, phantoms, transactions, event logs, and more. I'm
(5.0) Excellent summary/foundation/recommendations for distributed systems development; covers a lot of the use cases for data-intensive (vs. compute-intensive) apps/services. I recommend it to anyone doing service development.

Recommendations are well-reasoned, citations are helpful and are leading me to do a lot more reading.

Thank you for finding and sharing this one, @Chet. I think this will be a book we assign as a primer for working at Goodreads going forward. At least some of the (later) chapte
David Bjelland
Nov 22, 2017 rated it it was amazing  ·  review of another edition
Shelves: cs-software
Like you'd expect of a technical book with such a broad scope, there are sections that most readers in the target audience will probably find either too foundational or too esoteric to justify writing about at this kind of length, but still - at its best, I shudder to think of the time wasted groping in the dark for an ad hoc understanding of concepts it explains holistically in just a few unfussy, lucid pages and a diagram or two.

Definitely a book I see myself reaching for as a reference or me
Sebastian Gebski
Honestly, this one took me much more time than I expected.
Plus, it's definitely one of the best technical books I've read in years - but still, it doesn't mean you should run straight away to your bookshop - read up to the end of the review first.

I'll risk the statement that this book's content will not be 100% directly applicable to your work, BUT it will make you a better engineer in general. It's like reading books about Haskell - most likely you'll never use this language for any pra
Nov 09, 2017 rated it it was ok  ·  review of another edition
Some quite valuable content diluted with less useful content. I think I’d much prefer to read this author’s focused articles or blogs than recommend that someone slog through this.

I’m still not quite sure who the intended audience of this book is, but it’s definitely not me. The intro chapter discusses the example of Twitter’s fan-out writes and how they balanced typical users with celebrities who have millions of followers. Because of that intro, I expected a series of architecture patterns and
Sep 01, 2018 rated it it was amazing  ·  review of another edition
I recently used Spark to count all the data stores mentioned throughout the book.

There's a total of 72 products, where Apache ZooKeeper, PostgreSQL and MySQL are the ones most mentioned, with 46, 44 and 42 citations.

The complete list is available at
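
As a rough illustration of that kind of count, here is a minimal sketch in plain Python rather than Spark. The product list and the sample text are made up for the example and do not come from the review; the reviewer's actual job and results are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical product list; the review's full list of 72 products is not shown.
PRODUCTS = ["ZooKeeper", "PostgreSQL", "MySQL"]


def count_mentions(text, products):
    """Count case-insensitive, whole-word mentions of each product name."""
    counts = Counter()
    for name in products:
        # Lookarounds keep "MySQL" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        counts[name] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Made-up sample text standing in for the book's contents.
sample = (
    "PostgreSQL and MySQL both support replication. "
    "ZooKeeper coordinates nodes; PostgreSQL also offers snapshot isolation."
)

mentions = count_mentions(sample, PRODUCTS)
for name, n in mentions.most_common():
    print(name, n)
```

For a single book a word count like this fits comfortably in memory; a distributed engine such as Spark only becomes necessary when the corpus is much larger.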
Ye Lin Kyaw
Oct 09, 2018 rated it it was amazing  ·  review of another edition
There should be a 6-star rating for this book.
Juan Ignacio
Sep 08, 2018 rated it it was amazing  ·  review of another edition
Shelves: favorites
My full notes:

IMHO this book is a modern classic, a must-read for every software engineer and developer. I’m certain that I will reread it from time to time.
Apr 11, 2019 rated it liked it  ·  review of another edition
Just did a nuanced review on my tech blog, here:
Ahmad hosseini
This book changed my view of designing applications!
What is the meaning of Data-Intensive?
We call an application data-intensive if data is its primary challenge: the quantity of data, the complexity of data, or the speed at which it is changing.

Who should read this book?
I think that all developers must read this book. If you develop applications that have some kind of server/backend for storing or processing data, and your application uses the internet, then this book is for you.

Why should you, as
Stepan Kuzmin
Nov 05, 2018 rated it really liked it  ·  review of another edition
A good, solid book, but the translation falls short in places (the "cartographers" and "reducers" in the MapReduce chapter were amusing). The concluding part on the future of information systems is interesting; I liked the concept of the "algorithmic prison." Overall, a useful read.
Bodo Tasche
May 03, 2018 rated it it was amazing  ·  review of another edition
Shelves: technology
This book is an amazing must have for every backend developer. Highly recommended.
Oleksii Zuiev
Nov 05, 2018 rated it it was amazing  ·  review of another edition
The book gives a comprehensive overview of design aspects for systems working with data. For each of them it goes deep enough to describe the needed concepts, principles, and implementation options. And if you want to go deeper, after each chapter there are big lists of references to relevant research papers, specific implementations, etc. The book ends with a chapter where the author gives his subjective view on where the industry is moving, which is distinct from the rest of the book but still is an ...
I wish I had this book 5 years ago. A complete text on distributed systems that is extremely valuable for hands-on experience. You have to read this book multiple times to get a good grasp of the concepts of distributed computing. I do feel the title is a little misleading for a solid text on distributed systems. Highly recommended.
Emanuele Blanco
Jan 10, 2018 rated it it was amazing  ·  review of another edition
A clear and detailed overview of the challenges modern applications have to face while dealing with data and the current state of the art. From SSTables to event sourcing, Martin Kleppmann gives great insights on what every engineer/architect should know when designing systems that deal with any kind of data. Highly recommended.
Andrzej Hołowko
Great book. Every software developer should definitely read it. It covers many topics, and it is hard to remember everything, but it gives you a notion of the systems/databases/tools/techniques used nowadays. You should be aware of the trade-offs in every solution before you use it, and this book is a good starting point.
Bhashit Parikh
Encyclopedic and fun. Not only is this book packed with info about modern database systems, and related systems, it's very engaging and provides enough references to keep you busy for a long time. My bookmarks list is surely going to take a long time to work through. The book is a geek treasure if you are not intimately familiar with the various topics covered.
Tuấn Anh Nguyễn
This book covers a lot of interesting topics in distributed data systems with great clarity. It also promotes a streaming/event sourcing/change capture approach to building systems, which I think is the right direction. It also has a good chapter about the potential perils of big data (biased ML, surveillance). Furthermore, it contains a lot of references for further research. A 5/5 would be underrating it.
Sergey Shishkin
Feb 04, 2017 rated it it was amazing  ·  review of another edition
Comprehensive overview of modern data systems like data storage, caches, search indices, messaging systems. Martin does a great job of maintaining a neutral point of view throughout the book, providing historical context and showing where every piece of the puzzle fits into the big picture of application architecture. As a bonus, each chapter comes with ~100 references for deeper study of the topic.

Key takeaways for me were:
- Demystification of database storage and index data structures on dis
Daniel Aguilar
Pretty good review of a few different database engines, languages, infrastructure configurations and operations involved in data-intensive projects. Gets quite a lot into details about the algorithms and structures behind popular database systems and how/why they are suited for different purposes.

The book is an early release and still lacks many chapters, but it provides some good info anyway. I will probably get back to it when the final release is published.
Waldemar Neto
Mar 29, 2017 rated it it was amazing  ·  review of another edition
This book doesn't only talk about design; it goes deep into each component of real highly available systems, such as databases, queues, and data serialization. Another very good point is that it has a lot of references for each chapter.
Artem Sotenko
Oct 16, 2016 rated it it was amazing  ·  review of another edition
6 out of 5, even in its unfinished version
Sameer Rahmani
Sep 22, 2017 rated it it was amazing
It's a really great book. The author is well known in the field and the author of Apache Samza. In this book he explains even the smallest challenges in creating a distributed data-intensive system.
Benji Visser
Dec 17, 2017 rated it it was amazing  ·  review of another edition
A very comprehensive and clear explanation of modern (and some legacy) data systems.

This should be the entry point for anyone looking to learn about data engineering.
Trinh Quoc Anh
Apr 12, 2019 rated it it was amazing  ·  review of another edition
A brilliant book. It's both deep and broad, but still easy to understand. Well, easy enough.

The best point of this book is that the author introduces and explains very clearly the concepts behind myriad Big Data technologies (around 50 are mentioned): their strengths, weaknesses, constraints, trade-offs, applications, and so on. However, we are not flooded with technical details; only the core idea, the abstraction, is mentioned. This understanding is crucial if we have to choose between several techno
David Castillo
Jan 11, 2018 rated it it was amazing  ·  review of another edition
Shelves: tech
An essential book for modern software engineers. It takes you from the very basic (and often multiple) definitions of words like availability and consistency to in-depth analyses of different approaches to modern systems design.
It's a very dense book, and I mean that as a good thing. It means it doesn't waste space. Every paragraph is filled with applicable knowledge and insight, and the references are plenty if you want to dive into a certain topic.
The author's tone throughout the book is very
An excellent guide and reference to all the main considerations and problems arising when creating distributed software systems.
The book takes a look at modern practices in the field, supported by references to all the major literature for when you want to investigate more on a specific subject.
Kleppmann explores the topics in a compelling style, without going excessively deep or specific on any single one, yet without shunning complete explanations when needed.
I also enjoyed the rich analysis o
Filippo Pacifici
A very extensive and systematic discussion of a wide and often misunderstood topic. I would suggest that any software architect or software engineer read it.
I think the greatest value provided by this book is to connect the theoretical topics of distributed systems with existing products. This is something many distributed systems books fail at. They do a great job in explaining the algorithms but they do not bridge the gap to the existing frameworks.
This book does this very well, thus making it
Lara Thompson
Jul 16, 2018 rated it it was amazing  ·  review of another edition
Shelves: technical
Absolutely incredible introduction to technology-agnostic data storage and transfer in the high paced global world. A little pedantic but entertaining, easy to read cover to cover and yet will serve as a future reference for a long time to come. Very nice discussion of the moral murkiness of data collection akin to the subject of Jaron Lanier's books in the final chapter. Highly recommend.
  • Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine
  • Distributed Systems For Fun and Profit
  • Big Data: Principles and best practices of scalable realtime data systems
  • Seven Concurrency Models in Seven Weeks: When Threads Unravel
  • Functional Programming in Scala
  • Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale
  • I Heart Logs: Event Data, Stream Processing, and Data Integration
  • A Philosophy of Software Design
  • REST in Practice: Hypermedia and Systems Architecture
  • Software Design X-Rays: Fix Technical Debt with Behavioral Code Analysis
  • Release It!: Design and Deploy Production-Ready Software (Pragmatic Programmers)
  • Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
  • Learn you some Erlang for great good!
  • Building Microservices: Designing Fine-Grained Systems
  • Concepts, Techniques, and Models of Computer Programming
  • High Performance Browser Networking
  • Functional and Reactive Domain Modeling
  • Building Evolutionary Architectures: Support Constant Change

“In distributed systems, suspicion, pessimism, and paranoia pay off.”
“data outlives code.”