Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
both serializability and linearizability, and this combination is known as strict serializability or strong one-copy serializability
which linearizability is an important requirement for making a system work correctly.
One way of electing a leader is to use a lock: every node that starts up tries to acquire the lock, and the one that succeeds becomes the leader
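A minimal sketch of lock-based leader election, assuming a hypothetical in-process `LockService` that stands in for a linearizable lock. The class, its methods, and the node names are illustrative only; a real deployment would rely on a fault-tolerant coordination service rather than a single-process mutex.

```python
import threading


class LockService:
    """Hypothetical stand-in for a linearizable lock (single process only)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._holder = None

    def try_acquire(self, node_id):
        # Atomic compare-and-set: the lock is granted only if nobody holds it.
        with self._mutex:
            if self._holder is None:
                self._holder = node_id
            return self._holder == node_id

    def holder(self):
        with self._mutex:
            return self._holder


def elect(lock, node_ids):
    """Every node that starts up tries to acquire the lock concurrently."""
    results = {}

    def start(node_id):
        results[node_id] = lock.try_acquire(node_id)

    threads = [threading.Thread(target=start, args=(n,)) for n in node_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


lock = LockService()
results = elect(lock, ["node-a", "node-b", "node-c"])
leaders = [node for node, won in results.items() if won]
print("leader:", leaders)
```

Which node wins depends on thread scheduling, but exactly one acquisition succeeds. That single agreed-upon winner is the property the lock must provide.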
They use consensus algorithms to implement linearizable operations in a fault-tolerant way
Uniqueness constraints are common in databases: for example, a username or email address must uniquely identify one user, and
These constraints all require there to be a single up-to-date value (the account balance, the stock level, the seat occupancy) that all nodes agree on.
the simplest answer would be to really only use a single copy of the data. However, that approach would not be able to tolerate faults:
The most common approach to making a system fault-tolerant is to use replication.
nodes. If you make reads from the leader, or from synchronously updated followers, they have the potential to be linearizable. However, not every single-leader database is actually linearizable, either
Using the leader for reads relies on the assumption that you know for sure who the leader is. As discussed in “The Truth Is Defined by the Majority”, it is quite possible for a node to think that it is the leader, when in fact it is not
requests, it is likely to violate linearizability [20].
With asynchronous replication, failover may even lose committed writes (see “Handling Node Outages”), which violates bo...
Systems with multi-leader replication are generally not linearizable, because they concurrently process writes on multiple nodes and asynchronously replicate them to other nodes.
Intuitively, it seems as though strict quorum reads and writes should be linearizable in a Dynamo-style model.
when we have variable network delays, it is possible to have race conditions, as demonstrated
client B reads from a different quorum of two nodes, and gets back the old value 0 from both.
The quorum condition is met (w + r > n), but this execution is nevertheless not linearizable:
summary, it is safest to assume that a leaderless system with Dynamo-style replication does not provide linearizability.
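The quorum race in the highlights above can be replayed as a small deterministic sketch. It assumes n = 3, w = 3, r = 2 and a register that changes from 0 to 1. The replica list, the `quorum_read` helper, and the choice of which replica receives the write first are illustrative assumptions, not details taken from the highlights.

```python
N, W, R = 3, 3, 2
assert W + R > N  # the strict quorum condition holds

replicas = [0, 0, 0]  # every replica starts with x = 0


def quorum_read(nodes):
    """Read x from the given replicas.

    This toy model has no version numbers, so the largest value is treated
    as the newest. That shortcut is only valid here, where x goes 0 -> 1.
    """
    assert len(nodes) >= R
    return max(replicas[i] for i in nodes)


# The writer sends x = 1 to all three replicas, but variable network delay
# means only replica 0 has applied it so far.
replicas[0] = 1

read_a = quorum_read([0, 1])  # reader A contacts replica 0 and sees 1
read_b = quorum_read([1, 2])  # reader B starts after A returns, yet sees 0

# The in-flight write finally reaches the remaining replicas.
replicas[1] = 1
replicas[2] = 1

print(read_a, read_b)
```

Reader B's request begins after reader A's has completed, yet B observes the older value. A linearizable register forbids exactly this ordering, even though every read contacted a valid quorum.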
Thus, applications that don’t require linearizability can be more tolerant of network problems.
CAP theorem
Although linearizability is a useful guarantee, surprisingly few systems are actually linearizable in practice. For example, even RAM on a modern multi-core CPU is not linearizable
The reason for this behavior is that every CPU core has its own memory cache and store buffer.
However, there are now several copies of the data (one in main memory, and perhaps several more in various caches), and these copies are asynchronously updated, so linearizability is lost.
The reason for dropping linearizability is performance, not fault tolerance.
that every operation appears to take effect atomically at one point in time. This definition implies that operations are executed in some well-defined order. We
we saw that the main purpose of the leader in single-leader replication is to determine the order of writes in the replication
If there is no single leader, conflicts can occur due to concurrent operations
Serializability, which we discussed in Chapter 7, is about ensuring that transactions behave as if they were ex...
achieved by literally executing transactions in that serial order, or by allowing concurrent execution while pr...
timestamps and clocks in distrib...
is another attempt to introduce order into a disorderly world,
why ordering keeps coming up, and one of the reasons is that it helps preserve causality.
conversation
that there is a causal dependency between the question and the answer.
This “happened before” relationship is another expression of causality:
a consistent snapshot. But what does “consistent” mean in this context? It means consistent with causality: if the snapshot contains an answer, it must also contain the question being answered
Serializable snapshot isolation (see “Serializable Snapshot Isolation (SSI)”) detects write skew by tracking the causal dependencies between transactions.
Causality imposes an ordering on events: cause comes before effect;
If a system obeys the ordering imposed by causality, we say that it is causally consistent.
other. We say they are incomparable, and therefore mathematical sets are partially ordered:
two events are ordered if they are causally related (one happened before the other), but they are incomparable if they are concurrent. This means that causality defines a partial order, not a total order:
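The difference between a partial and a total order can be seen with ordinary sets ordered by inclusion, a short sketch using Python's built-in subset operators. The sets `a`, `b`, `c` and the `compare` helper are illustrative names chosen for this example.

```python
def compare(x, y):
    """Classify two sets under the subset ordering."""
    if x == y:
        return "equal"
    if x < y:  # x is a proper subset of y
        return "less"
    if x > y:  # x is a proper superset of y
        return "greater"
    return "incomparable"  # neither contains the other


a = {1, 2}
b = {1, 2, 3}
c = {2, 4}

print(compare(a, b))  # a is contained in b
print(compare(a, c))  # neither set contains the other
```

Concurrent events play the same role as `a` and `c` here: neither comes before the other, so no answer to "which is greater?" exists. In a total order, every pair of elements could be compared.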
there are no concurrent operations in a linearizable datastore:
Concurrency would mean that the timeline branches and merges again
any system that is linearizable will preserve causality correctly [7].
making a system linearizable can harm its performance and availability,
In many cases, systems that appear to require linearizability in fact only really require causal consistency,
Capturing causal dependencies
you need to know which operation happened before which other operation.
In order to determine causal dependencies, we need some way of describing the “knowledge” of a node in the system.
If a node had already seen the value X when it issued the write Y, then X and Y may be causally related.
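One common way to record what a node had already seen when it issued a write is a version vector. The sketch below is an illustration under that assumption, not necessarily the mechanism the highlighted passage goes on to describe. The `Node` class, its methods, and the node names are hypothetical.

```python
class Node:
    """Tracks a node's "knowledge" as a version vector: {node name: counter}."""

    def __init__(self, name):
        self.name = name
        self.seen = {}

    def observe(self, version):
        # Merge another write's vector into this node's knowledge.
        for node, counter in version.items():
            self.seen[node] = max(self.seen.get(node, 0), counter)

    def write(self):
        # A new write carries everything the node has seen, plus its own tick.
        self.seen[self.name] = self.seen.get(self.name, 0) + 1
        return dict(self.seen)


def happened_before(x, y):
    """True if the write tagged x may have causally influenced the one tagged y."""
    nodes = x.keys() | y.keys()
    no_greater = all(x.get(n, 0) <= y.get(n, 0) for n in nodes)
    strictly_less = any(x.get(n, 0) < y.get(n, 0) for n in nodes)
    return no_greater and strictly_less


a, b, c = Node("A"), Node("B"), Node("C")

x = a.write()   # A issues write X
b.observe(x)    # B sees X ...
y = b.write()   # ... and then issues write Y
z = c.write()   # C issues write Z without having seen X or Y

print(happened_before(x, y))  # X may be a cause of Y
print(happened_before(x, z) or happened_before(z, x))  # X and Z are concurrent
```

The vectors only show that Y was issued with knowledge of X. Whether Y actually depended on X is something the system cannot tell, which is why the relationship is stated as "may be" causal.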