Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Kindle Notes & Highlights
2%
configuration errors by operators were the leading cause of outages, whereas hardware faults (servers or network) played a role in only 10–25% of outages
2%
We therefore need to think of response time not as a single number, but as a distribution of values that you can measure.
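To make the "distribution, not a single number" point concrete, here is a small sketch that computes the mean and a few percentiles over made-up response times. It uses the nearest-rank percentile definition, which is only one of several in common use.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

# Hypothetical response times (ms) for ten requests, two of them slow outliers.
response_times_ms = [12, 15, 14, 13, 250, 16, 14, 13, 15, 1200]
mean_ms = sum(response_times_ms) / len(response_times_ms)

print(f"mean: {mean_ms:.1f} ms")
for p in (50, 90, 99):
    print(f"p{p}: {percentile(response_times_ms, p)} ms")
```

Here the mean (156.2 ms) describes none of the requests well: the median is 14 ms, and only the high percentiles reveal the slow outliers.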
2%
a 100 ms increase in response time reduces sales by 1%
2%
a 1-second slowdown reduces a customer satisfaction metric by 16%
2%
Reducing response times at very high percentiles is difficult because they are easily affected by random events outside of your control, and the benefits are diminishing.
3%
Queueing delays often account for a large part of the response time at high percentiles.
3%
Even if you make the calls in parallel, the end-user request still needs to wait for the slowest of the parallel calls to complete.
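A back-of-the-envelope sketch of why waiting for the slowest parallel call hurts: if each backend call is independently slow with some probability (1% here, an assumed figure), a user request that fans out to many calls is slow whenever any one of them is.

```python
def prob_user_request_slow(n_calls, p_slow=0.01):
    """Probability that a request fanning out to n_calls parallel backend calls
    is slow, assuming each call is independently slow with probability p_slow.
    The request waits for the slowest call, so one slow call is enough."""
    return 1 - (1 - p_slow) ** n_calls

for n in (1, 10, 100):
    print(f"{n:>3} parallel calls -> {prob_user_request_slow(n):.1%} of requests are slow")
```

Under these assumptions, a request that fans out to 100 calls hits at least one call slower than the backend's p99 roughly 63% of the time. Real latencies are rarely fully independent, so treat this as an illustration of the trend rather than a prediction.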
3%
An architecture that is appropriate for one level of load is unlikely to cope with 10 times that load.
3%
using several fairly powerful machines can still be simpler and cheaper than a large number of small virtual machines.
3%
there is no such thing as a generic, one-size-fits-all scalable architecture
3%
An architecture that scales well for a particular application is built around assumptions of which operations will be common and which will be rare—the load parameters. If those assumptions turn out to be wrong, the engineering effort for scaling is at best wasted, and at worst counterproductive.
3%
In an early-stage startup or an unproven product it’s usually more important to be able to iterate quickly on product features than it is to scale to some hypothetical future load.
3%
It is well known that the majority of the cost of software is not in its initial development, but in its ongoing maintenance—fixing
11%
This is an important trade-off in storage systems: well-chosen indexes speed up read queries, but every index slows down writes.
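A toy illustration of the read/write trade-off, using a hypothetical in-memory `Table` class rather than any real storage engine. Reads on an indexed field avoid a full scan, but every `put` has to do extra work to keep each index current, including removing stale entries when a row changes.

```python
from collections import defaultdict

class Table:
    """Toy in-memory table with optional secondary indexes (hypothetical design)."""

    def __init__(self, indexed_fields):
        self.rows = {}  # primary key -> row dict
        # indexed field -> field value -> set of primary keys
        self.indexes = {f: defaultdict(set) for f in indexed_fields}
        self.index_writes = 0  # extra work every write pays to keep indexes current

    def put(self, key, row):
        old = self.rows.get(key)
        for field, index in self.indexes.items():
            if old is not None:
                index[old[field]].discard(key)  # remove the stale entry
            index[row[field]].add(key)
            self.index_writes += 1
        self.rows[key] = row

    def find(self, field, value):
        if field in self.indexes:
            # Indexed read: jump straight to the matching keys.
            return [self.rows[k] for k in sorted(self.indexes[field].get(value, ()))]
        # Unindexed read: scan every row.
        return [row for row in self.rows.values() if row[field] == value]

users = Table(indexed_fields=["city"])
users.put(1, {"name": "Ada", "city": "London"})
users.put(2, {"name": "Grace", "city": "New York"})
users.put(1, {"name": "Ada", "city": "Paris"})  # update moves key 1 between index entries
print(users.find("city", "Paris"))   # served by the index
print(users.find("name", "Grace"))   # full scan
print(users.index_writes)
```

Indexing a second field would double `index_writes` for the same three writes, which is the cost side of the trade-off.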
11%
Concurrency and crash recovery are much simpler if segment files are append-only or immutable. For example, you don’t have to worry about the case where a crash happened while a value was being overwritten, leaving you with a file containing part of the old and part of the new value spliced together.
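A minimal sketch of why append-only segments simplify crash recovery, assuming a hash-indexed log in the style the book describes (the `AppendOnlyStore` class and its comma-separated record format are hypothetical). Because existing records are never modified, a crash can at worst leave one torn record at the end of the file, and recovery only has to discard that tail.

```python
import os
import tempfile
from typing import Optional

class AppendOnlyStore:
    """Log-structured key-value store sketch: records are only ever appended to
    one segment file, and an in-memory dict maps each key to the byte offset of
    its most recent record. Assumes keys contain no ',' and values no newline."""

    def __init__(self, path: str):
        self.path = path
        self.index: dict = {}
        open(path, "ab").close()  # create the segment file if it does not exist
        self._recover()

    def _recover(self) -> None:
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break  # torn record from a crash mid-append
                key = line.split(b",", 1)[0].decode()
                self.index[key] = offset  # later records for a key win
                offset += len(line)
        # Old records were never overwritten, so only the torn tail needs discarding.
        os.truncate(self.path, offset)

    def set(self, key: str, value: str) -> None:
        offset = os.path.getsize(self.path)  # new record starts at end of file
        with open(self.path, "ab") as f:
            f.write(f"{key},{value}\n".encode())
            f.flush()
            os.fsync(f.fileno())
        self.index[key] = offset

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            record = f.readline().rstrip(b"\n")
        return record.split(b",", 1)[1].decode()

demo_path = os.path.join(tempfile.mkdtemp(), "segment.log")
store = AppendOnlyStore(demo_path)
store.set("user:1", "Ada")
store.set("user:1", "Ada Lovelace")  # an "overwrite" is just a newer appended record
print(store.get("user:1"))
```

A real engine would also add per-record checksums and compact old segments; both are omitted here to keep the crash-recovery idea visible.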
13%
Counterintuitively, the performance advantage of in-memory databases is not due to the fact that they don’t need to read from disk. Even a disk-based storage engine may never need to read from disk if you have enough memory, because the operating system caches recently used disk blocks in memory anyway. Rather, they can be faster because they can avoid the overheads of encoding in-memory data structures in a form that can be written to disk
14%
Data warehouses now exist in almost all large enterprises, but in small companies they are almost unheard of.
17%
as long as people agree on what the format is, it often doesn’t matter how pretty or efficient the format is. The difficulty of getting different organizations to agree on anything outweighs most other concerns.
19%
A key design goal of a service-oriented/microservices architecture is to make the application easier to change and maintain by making services independently deployable and evolvable. For example, each service should be owned by one team, and that team should be able to release new versions of the service frequently, without having to coordinate with other teams.
19%
Although RPC seems convenient at first, the approach is fundamentally flawed
19%
Part of the appeal of REST is that it doesn’t try to hide the fact that it’s a network protocol (although this doesn’t seem to stop people from building RPC libraries on top of REST).
21%
For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.
Richard Feynman, Rogers Commission Report (1986)
21%
The problem with a shared-memory approach is that the cost grows faster than linearly: a machine with twice as many CPUs, twice as much RAM, and twice as much disk capacity as another typically costs significantly more than twice as much. And due to bottlenecks, a machine twice the size cannot necessarily handle twice the load.
21%
In some cases, a simple single-threaded program can perform significantly better than a cluster with over 100 CPU cores
21%
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.
Douglas Adams, Mostly Harmless (1992)
28%
Clearly, we must break away from the sequential and not limit the computers. We must state definitions and provide for priorities and descriptions of data. We must state relationships, not procedures.
Grace Murray Hopper, Management and the Computer of the Future (1962)
35%
A system designed for single-threaded execution can sometimes perform better than a system that supports concurrency, because it can avoid the coordination overhead of locking.
38%
Working with distributed systems is fundamentally different from writing software on a single computer—and the main difference is that there are lots of new and exciting ways for things to go wrong
38%
In the end, our task as engineers is to build systems that do their job (i.e., meet the guarantees that users are expecting), in spite of everything going wrong.
38%
a supercomputer is more like a single-node computer than a distributed system: it deals with partial failure by letting it escalate into total failure—if any part of the system fails, just let everything crash (like a kernel panic on a single machine).
39%
In distributed systems, suspicion, pessimism, and paranoia pay off.
39%
Shared-nothing is not the only way of building systems, but it has become the dominant approach for building internet services, for several reasons: it’s comparatively cheap because it requires no special hardware, it can make use of commoditized cloud computing services, and it can achieve high reliability through redundancy across multiple geographically distributed datacenters.
43%
If you can avoid opening Pandora’s box and simply keep things on a single machine, it is generally worth doing so.
47%
All in all, there is a lot of misunderstanding and confusion around CAP, and it does not help us understand systems better, so CAP is best avoided.
47%
although CAP has been historically influential, it has little practical value for designing systems
48%
In many cases, systems that appear to require linearizability in fact only really require causal consistency, which can be implemented more efficiently.
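Causal consistency only needs to know whether one write could have influenced another, not a single global order of all operations. One common way to track that relation is with version vectors; the sketch below assumes that representation (the highlight itself does not name a mechanism), and the replica names are hypothetical.

```python
def compare(a, b):
    """Causal relation between two version vectors (replica id -> counter).
    Missing entries count as 0."""
    replicas = a.keys() | b.keys()
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in replicas)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in replicas)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "happened-before"  # a is causally before b
    if b_le_a:
        return "happened-after"
    return "concurrent"  # neither saw the other, so any apply order is causally fine

def merge(a, b):
    """Vector that has seen everything a and b have seen (entry-wise max)."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

write_x = {"replica-A": 1}
reply_y = {"replica-A": 1, "replica-B": 1}  # written after reading x
other_z = {"replica-C": 1}                  # independent write
print(compare(write_x, reply_y))
print(compare(reply_y, other_z))
print(merge(reply_y, other_z))
```

Only the "happened-before" pairs constrain the order in which replicas apply writes; concurrent writes need no coordination, which is where the efficiency gain over linearizability comes from.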
51%
Distributed transactions thus have a tendency of amplifying failures, which runs counter to our goal of building fault-tolerant systems.
52%
frequent leader elections result in terrible performance because the system can end up spending more time choosing a leader than doing any useful work.
55%
In reality, integrating disparate systems is one of the most important things that needs to be done in a nontrivial application.
55%
A system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments.
Donald Knuth
56%
lack of integration leads to Balkanization of data.
59%
The MapReduce approach is more appropriate for larger jobs: jobs that process so much data and run for such a long time that they are likely to experience at least one task failure along the way.
59%
In an environment where tasks are not so often terminated, the design decisions of MapReduce make less sense.
62%
A complex system that works is invariably found to have evolved from a simple system that works. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work.
John Gall, Systemantics (1975)
68%
This principle is known as exactly-once semantics, although effectively-once would be a more descriptive term
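A sketch of why "effectively-once" is the more honest term: the message may be delivered more than once (at-least-once delivery), but an idempotent consumer makes the effect happen once. This assumes every message carries a unique ID (the `message_id` field is hypothetical).

```python
class Account:
    """Consumer state that tolerates redelivered messages by remembering which
    message IDs it has already applied (message IDs are a hypothetical field)."""

    def __init__(self):
        self.balance = 0
        self.processed_ids = set()

    def apply(self, message_id, amount):
        if message_id in self.processed_ids:
            return False  # duplicate, e.g. redelivered after a lost ack: skip it
        # In a real system this update and the ID bookkeeping must commit atomically.
        self.balance += amount
        self.processed_ids.add(message_id)
        return True

# At-least-once delivery: "m1" arrives twice because the first ack was lost.
deliveries = [("m1", 100), ("m2", -30), ("m1", 100)]
account = Account()
applied = [account.apply(mid, amt) for mid, amt in deliveries]
print(account.balance, applied)
```

The duplicate delivery is detected and ignored, so the balance ends at 70 rather than 170.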
70%
Surprisingly often I see software engineers make statements like, “In my experience, 99% of people only need X” or “…don’t need X” (for various values of X). I think that such statements say more about the experience of the speaker than about the actual usefulness of a technology.
76%
violations of timeliness are “eventual consistency,” whereas violations of integrity are “perpetual inconsistency.”
76%
in most applications, integrity is much more important than timeliness.
77%
If we cannot fully trust that every individual component of the system will be free from corruption—that every piece of hardware is fault-free and that every piece of software is bug-free—then we must at least periodically check the integrity of our data.
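One simple way to "periodically check the integrity of our data" is to record a checksum when data is written and re-verify it in a background audit. The sketch below assumes SHA-256 digests kept next to each blob in a hypothetical in-memory `CheckedStore`. A real audit would read the bytes back from disk or from replicas.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class CheckedStore:
    """Stores each blob with a SHA-256 digest computed at write time so that a
    periodic audit can detect silent corruption later."""

    def __init__(self):
        self.blobs = {}
        self.digests = {}

    def put(self, key, data: bytes):
        self.blobs[key] = data
        self.digests[key] = digest(data)

    def audit(self):
        """Return the keys whose current bytes no longer match their digest."""
        return sorted(k for k, data in self.blobs.items() if digest(data) != self.digests[k])

checked = CheckedStore()
checked.put("a", b"hello")
checked.put("b", b"world")
# Simulate a bit flip that went unnoticed at write and read time.
corrupted = bytearray(checked.blobs["b"])
corrupted[0] ^= 0x01
checked.blobs["b"] = bytes(corrupted)
print(checked.audit())
```

The audit flags `"b"` for repair (for example, by re-copying it from a replica), while the untouched `"a"` passes.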
77%
Predictive analytics systems merely extrapolate from the past; if the past is discriminatory, they codify that discrimination.