Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Kindle Notes & Highlights
22%
Being able to reboot individual nodes without downtime is a big advantage for operations and maintenance. Thus, our goal is to keep the system as a whole running despite individual node failures, and to keep the impact of a node outage as small as possible. How do you achieve high availability with leader-based replication?
22%
Handling a failure of the leader is trickier: one of the followers needs to be promoted to be the new leader, clients need to be reconfigured to send their writes to the new leader, and the other followers need to start consuming data changes from the new leader. This process is called failover.
22%
Getting all the nodes to agree on a new leader is a consensus problem, discussed in detail in Chapter 9.
22%
The most common solution is for the old leader’s unreplicated writes to simply be discarded, which may violate clients’ durability expectations.
22%
These primary keys were also used in a Redis store, so the reuse of primary keys resulted in inconsistency between MySQL and Redis, which caused some private data to be disclosed to the wrong users.
22%
A logical log format is also easier for external applications to parse. This aspect is useful if you want to send the contents of a database to an external system, such as a data warehouse for offline analysis, or for building custom indexes and caches [18]. This technique is called change data capture, and we will return to it in Chapter 11.
22%
A trigger lets you register custom application code that is automatically executed when a data change (write transaction) occurs in a database system. The trigger has the opportunity to log this change into a separate table, from which it can be read by an external process.
22%
This inconsistency is just a temporary state — if you stop writing to the database and wait a while, the followers will eventually catch up and become consistent with the leader. For that reason, this effect is known as eventual consistency.
23%
When reading something that the user may have modified, read it from the leader; otherwise, read it from a follower.
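A minimal sketch of this routing rule. The names here (`ReadRouter`, `freshness_window`) are illustrative assumptions, not any database's API: track the time of each user's last write, and send that user's reads to the leader for a fixed window afterwards.

```python
import time

class ReadRouter:
    """Sketch of read-your-writes routing (hypothetical helper, not a real
    client library). Reads of data the user may have recently modified go
    to the leader; all other reads may be served by a lagging follower."""

    def __init__(self, freshness_window=60.0):
        self.freshness_window = freshness_window  # seconds to keep reading from the leader
        self.last_write_at = {}                   # user_id -> timestamp of last write

    def record_write(self, user_id, now=None):
        self.last_write_at[user_id] = now if now is not None else time.time()

    def pick_replica(self, user_id, now=None):
        now = now if now is not None else time.time()
        last = self.last_write_at.get(user_id)
        if last is not None and now - last < self.freshness_window:
            return "leader"    # the user may be reading their own recent write
        return "follower"      # stale reads are acceptable here
```

The window is a crude stand-in for "long enough for followers to catch up"; a real system might instead track replication lag or the log position of the user's last write.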
23%
Our second example of an anomaly that can occur when reading from asynchronous followers is that it’s possible for a user to see things moving backward in time.
23%
If some partitions are replicated slower than others, an observer may see the answer before they see the question.
23%
It would be better if application developers didn’t have to worry about subtle replication issues and could just trust their databases to “do the right thing.” This is why transactions exist: they are a way for a database to provide stronger guarantees so that the application can be simpler.
23%
For example, autoincrementing keys, triggers, and integrity constraints can be problematic. For this reason, multi-leader replication is often considered dangerous territory that should be avoided if possible.
24%
However, unlike a leader database, that coordinator does not enforce a particular ordering of writes. As we shall see, this difference in design has profound consequences for the way the database is used.
25%
However, even with w + r > n, there are likely to be edge cases where stale values are returned. These depend on the implementation, but possible scenarios include:
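The quorum condition behind this highlight (w + r > n) can be checked by brute force. This is a self-contained illustration of why read and write sets must overlap, not how any database implements quorums:

```python
from itertools import combinations

def quorums_always_overlap(n, w, r):
    """Exhaustively verify that every w-node write set intersects every
    r-node read set among n replicas. This holds exactly when w + r > n:
    two disjoint sets of sizes w and r would need w + r <= n nodes."""
    nodes = range(n)
    return all(set(write_set) & set(read_set)
               for write_set in combinations(nodes, w)
               for read_set in combinations(nodes, r))
```

The highlight's point is that even this guaranteed overlap does not rule out stale reads in practice; the overlap only ensures the read *contacts* a node that has the write, which various edge cases can still undermine.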
26%
When the server receives a write with a particular version number, it can overwrite all values with that version number or below (since it knows that they have been merged into the new value), but it must keep all values with a higher version number (because those values are concurrent with the incoming write).
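A toy model of this rule, with invented names (`SiblingStore`, `based_on_version` are illustrative): the server bumps a version number on every write, discards siblings at or below the version the client said it had read, and keeps higher-versioned values as concurrent siblings.

```python
class SiblingStore:
    """Sketch of single-key versioning on one replica (not a real database
    API). A read returns the current version plus all siblings; a write
    carries the version it was based on, so the server can tell which
    values the client has already merged into its new value."""

    def __init__(self):
        self.version = 0
        self.siblings = []   # list of (version, value)

    def read(self):
        return self.version, [value for _, value in self.siblings]

    def write(self, based_on_version, value):
        self.version += 1
        # Values at or below based_on_version were merged by the client...
        kept = [(v, val) for v, val in self.siblings if v > based_on_version]
        # ...while higher-versioned values are concurrent with this write.
        self.siblings = kept + [(self.version, value)]
        return self.version
```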
26%
However, if you want to allow people to also remove things from their carts, and not just add things, then taking the union of siblings may not yield the right result: if you merge two sibling carts and an item has been removed in only one of them, then the removed item will reappear in the union of the siblings [37]. To prevent this problem, an item cannot simply be deleted from the database when it is removed; instead, the system must leave a marker with an appropriate version number to indicate that the item has been removed when merging siblings. Such a deletion marker is known as a tombstone.
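A sketch of sibling merging with deletion markers. The representation (each item mapped to a version and a removed flag) is an assumption made for illustration, not the actual on-disk format of any store:

```python
def merge_carts(a, b):
    """Merge two sibling carts where each cart maps item -> (version,
    removed). A deletion leaves a tombstone (removed=True) instead of
    erasing the key, so a removal in one sibling survives the merge
    rather than being resurrected by a plain union."""
    merged = {}
    for item in set(a) | set(b):
        entries = [cart[item] for cart in (a, b) if item in cart]
        # The entry with the highest version wins; a tombstone at the
        # highest version means the item stays deleted.
        merged[item] = max(entries)
    return merged

def visible_items(cart):
    """Items the user actually sees: everything without a tombstone."""
    return {item for item, (_, removed) in cart.items() if not removed}
```

With a plain union of item sets, the removed item would reappear; the tombstone carries enough version information for the merge to know the removal happened after the add.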
26%
Despite being a simple goal — keeping a copy of the same data on several machines — replication turns out to be a remarkably tricky problem.
28%
If the partitioning is unfair, so that some partitions have more data or queries than others, we call it skewed. The presence of skew makes partitioning much less effective.
28%
A partition with disproportionately high load is called a hot spot.
28%
The ranges of keys are not necessarily evenly spaced, because your data may not be evenly distributed.
28%
Consistent hashing, as defined by Karger et al. [7], is a way of evenly distributing load across an internet-wide system of caches such as a content delivery network (CDN).
28%
As we shall see in “Rebalancing Partitions”, this particular approach actually doesn’t work very well for databases [8], so it is rarely used in practice.
28%
Perhaps in the future, data systems will be able to automatically detect and compensate for skewed workloads; but for now, you need to think through the trade-offs for your own application.
28%
The problem with secondary indexes is that they don’t map neatly to partitions. There are two main approaches to partitioning a database with secondary indexes: document-based partitioning and term-based partitioning.
28%
For that reason, a document-partitioned index is also known as a local index (as opposed to a global index, described in the next section).
29%
A global index must also be partitioned, but it can be partitioned differently from the primary key index.
29%
red cars from all partitions appear under color:red in the index, but the index is partitioned so that colors starting with the letters a to r appear in partition 0 and colors starting with s to z appear in partition 1. The index on the make of car is partitioned similarly (with the partition boundary being between f and h).
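The letter-range assignment in this example can be expressed with a sorted list of boundary keys. The boundary values below are taken from the example (a–r vs s–z for colors, a split between f and h for makes); the helper itself is an illustrative sketch:

```python
import bisect

def partition_for_term(term, boundaries):
    """Assign an index term to a partition by range over the term itself
    (term partitioning). `boundaries` is a sorted list of split keys:
    terms before the first boundary go to partition 0, and so on."""
    return bisect.bisect_right(boundaries, term)
```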
29%
While rebalancing is happening, the database should continue accepting reads and writes.
29%
The problem with the mod N approach is that if the number of nodes N changes, most of the keys will need to be moved from one node to another. For example, say hash(key) = 123456. If you initially have 10 nodes, that key starts out on node 6 (because 123456 mod 10 = 6). When you grow to 11 nodes, the key needs to move to node 3 (123456 mod 11 = 3), and when you grow to 12 nodes, it needs to move to node 0 (123456 mod 12 = 0). Such frequent moves make rebalancing excessively expensive. We need an approach that doesn’t move data around more than necessary.
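The key movement described here is easy to quantify. The function below uses small key integers directly as stand-in hash values (an assumption for illustration); the specific figures from the text check out as plain arithmetic:

```python
def moved_fraction(num_keys, old_n, new_n):
    """Fraction of keys that change node when the cluster grows from
    old_n to new_n nodes under hash(key) mod N assignment. Key integers
    stand in for hash values."""
    moved = sum(1 for key in range(num_keys) if key % old_n != key % new_n)
    return moved / num_keys
```

Growing from 10 to 11 nodes moves roughly 10/11 of all keys, which is why mod N rebalancing is so expensive compared with schemes that move only ~1/N of the data.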
29%
Fixed number of partitions
Fortunately, there is a fairly simple solution: create many more partitions than there are nodes, and assign several partitions to each node.
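A sketch of this scheme with invented helpers: the key-to-partition mapping uses a stable checksum and never changes, while the partition-to-node assignment (here naive round-robin) is recomputed when nodes join, moving whole partitions rather than re-hashing individual keys.

```python
import zlib

def key_partition(key, num_partitions):
    """Map a key to a partition with a stable checksum. This mapping is
    fixed for the lifetime of the cluster; only partition-to-node
    assignments change during rebalancing."""
    return zlib.crc32(key.encode()) % num_partitions

def assign_partitions(num_partitions, nodes):
    """Illustrative round-robin assignment of partitions to nodes (a real
    system would also balance by size and minimize data movement)."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}
```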
30%
This requires choosing a partitioning scheme that is appropriate to your data, and rebalancing the partitions when nodes are added to or removed from the cluster.
30%
Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.
James Corbett et al., Spanner: Google’s Globally-Distributed Database (2012)
30%
If you have spent years working with transactions, they may seem obvious, but we shouldn’t take them for granted. Transactions are not a law of nature; they were created with a purpose, namely to simplify the programming model for applications accessing a database.
31%
The safety guarantees provided by transactions are often described by the well-known acronym ACID, which stands for Atomicity, Consistency, Isolation, and Durability.
31%
The classic database textbooks formalize isolation as serializability, which means that each transaction can pretend that it is the only transaction running on the entire database.
32%
Those issues would be incredibly confusing, so storage engines almost universally aim to provide atomicity and isolation on the level of a single object (such as a key-value pair) on one node.
32%
Some databases also provide more complex atomic operations, such as an increment operation, which removes the need for a read-modify-write cycle like that in Figure 7-1.
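A sketch of the difference, using a lock in place of the storage engine's internal synchronization (illustrative only, not a database API): the unsafe version separates the read from the write, so two concurrent callers can both read the same value and lose one update; the atomic version performs the whole cycle as one indivisible step.

```python
import threading

class Counter:
    """Contrast a read-modify-write cycle with an atomic increment."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read, then write: between these two lines another thread can
        # read the same value, so one of the two increments is lost.
        v = self._value
        self._value = v + 1

    def atomic_increment(self):
        # The whole read-modify-write happens under the lock, so no
        # concurrent increment can be lost.
        with self._lock:
            self._value += 1
```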
32%
A transaction is usually understood as a mechanism for grouping multiple operations on multiple objects into one unit of execution.
32%
These indexes are different database objects from a transaction point of view: for example, without transaction isolation, it’s possible for a record to appear in one index but not another, because the update to the second index hasn’t happened yet.
32%
Errors will inevitably happen, but many software developers prefer to think only about the happy path rather than the intricacies of error handling.
32%
This is a shame, because the whole point of aborts is to enable safe retries.
32%
If the error is due to overload, retrying the transaction will make the problem worse, not better. To avoid such feedback cycles, you can limit the number of retries, use exponential backoff, and handle overload-related errors differently from other errors (if possible).
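A sketch combining these three pieces of advice (bounded retries, exponential backoff, distinct handling of overload errors). The exception names and the injectable `sleep` parameter are assumptions made for illustration:

```python
import random

class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. deadlock abort)."""

class OverloadError(Exception):
    """Hypothetical marker for overload responses; retrying adds load."""

def retry_with_backoff(operation, max_retries=5, base_delay=0.1,
                       sleep=lambda seconds: None):
    """Run operation(), retrying transient failures with exponential
    backoff and jitter. `sleep` is injectable so the sketch runs without
    real delays; production code would pass time.sleep."""
    for attempt in range(max_retries):
        try:
            return operation()
        except OverloadError:
            raise   # do not feed a retry storm into an overloaded system
        except TransientError:
            if attempt == max_retries - 1:
                raise
            # Full jitter over an exponentially growing window:
            # up to 0.1s, then 0.2s, 0.4s, ...
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Note that this only makes retries *safe to attempt*; as the surrounding highlights point out, the transaction itself must still be safe to re-execute (no non-idempotent side effects outside the database).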
32%
If you want to make sure that several different systems either commit or abort together, two-phase commit can help (we will discuss this in “Atomic Commit and Two-Phase Commit (2PC)”).
32%
Such timing issues might occur very rarely, and are usually difficult to reproduce. Concurrency is also very difficult to reason about, especially in a large application where you don’t necessarily know which other pieces of code are accessing the database.
32%
Application development is difficult enough if you just have one user at a time; having many concurrent users makes it much harder still, because any piece of data could unexpectedly change at any time.
32%
Concurrency bugs caused by weak transaction isolation are not just a theoretical problem. They have caused substantial loss of money [24, 25], led to investigation by financial auditors [26], and caused customer data to be corrupted [27].
32%
Rather than blindly relying on tools, we need to develop a good understanding of the kinds of concurrency problems that exist, and how to prevent them.
33%
Because it maintains several versions of an object side by side, this technique is known as multi-version concurrency control (MVCC).
33%
Unfortunately, the SQL standard’s definition of isolation levels is flawed — it is ambiguous, imprecise, and not as implementation-independent as a standard should be.