Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
48%
Causal consistency goes further: it needs to track causal dependencies across the entire database, not just for a single key. Version vectors can be generalized to do this
48%
actually keeping track of all causal dependencies can become impracticable.
48%
it is not clear whether the write is causally dependent on all or only some of those prior reads.
48%
a large ov...
48%
sequence numbers
48%
If there is not a single leader
48%
how to generate sequence numbers for operations.
48%
own independent set of sequence numbers.
48%
one node can generate only odd numbers and the other ...
48%
time-of-day clock
48%
You can preallocate blocks of sequence numbers.
48%
The causality problems occur because these sequence number generators do not correctly capture the ordering of operations across different nodes:
48%
there is actually a simple method for generating sequence numbers that is consistent with causality.
48%
Lamport timestamp,
48%
simply a pair of (counter, node ID).
48%
if the counter values are the same, the one with the greater node ID is the greater timestamp.
48%
every node and every client keeps track of the maximum counter value it has seen so far,
48%
When a node receives a request or response with a maximum counter value greater than its own counter value, it immediately increases its own counter to that maximum.
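The Lamport timestamp rules highlighted above can be sketched in a few lines. This is a minimal, single-process illustration, not code from the book: the class name `LamportClock` and the methods `tick` and `observe` are hypothetical names chosen for this example. Timestamps are modeled as `(counter, node_id)` tuples, so Python's built-in tuple comparison orders by counter first and breaks ties by node ID.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LamportClock:
    """Hypothetical helper: one instance per node or client."""
    node_id: int
    counter: int = 0

    def tick(self) -> Tuple[int, int]:
        # Stamp a local operation: increment the counter, return (counter, node ID).
        self.counter += 1
        return (self.counter, self.node_id)

    def observe(self, seen_counter: int) -> None:
        # On any request or response, jump forward to the largest counter seen.
        self.counter = max(self.counter, seen_counter)


a = LamportClock(node_id=1)
b = LamportClock(node_id=2)

t1 = a.tick()        # first operation on node 1
t2 = a.tick()        # second operation on node 1
b.observe(t2[0])     # node 2 learns of counter value 2
t3 = b.tick()        # causally after t2, so it must sort after t2
t4 = a.tick()        # concurrent with t3: same counter, lower node ID

ordered = sorted([t3, t1, t4, t2])
```

Because `t3` is produced after node 2 has observed `t2`, it is guaranteed to compare greater than `t2`. The pair `t3` and `t4` is concurrent, and the tie is broken by node ID alone.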
48%
they have a different purpose: version vectors can distinguish whether two operations are concurrent or whether one is causally dependent on the other,
48%
Lamport timestamps always enforce a total ordering.
48%
The advantage of Lamport timestamps over version vectors is that they are more compact.
48%
in order to implement something like a uniqueness constraint for usernames, it’s not sufficient to have a total ordering of operations
48%
This idea of knowing when your total order is finalized is captured in the topic of total order broadcast.
48%
total order broadcast or atomic broadcast
48%
Total ordering across all partitions is possible, but requires additional coordination
48%
Total order broadcast is usually described as a protocol for exchanging messages between nodes.
48%
Reliable delivery
48%
Totally ordered delivery
48%
This fact is a hint that there is a strong connection between total order broadcast and consensus,
48%
Total order broadcast is exactly what you need for database replication:
48%
total order broadcast can be used to implement serializable transactions:
48%
Another way of looking at total order broadcast is that it is a way of creating a log
48%
Total order broadcast is also useful for implementing a lock service that provides fencing tokens
49%
linearizable system there is a total order of operations.
49%
Total order broadcast is asynchronous: messages are guaranteed to be delivered reliably in a fixed order, but there is no guarantee about when a message will be delivered
49%
By contrast, linearizability is a recency guarantee: a read is guaranteed to see the latest value written.
49%
Imagine that for every possible username, you can have a linearizable register with an atomic compare-and-set operation. Every register initially has the value null
49%
If multiple users try to concurrently grab the same username, only one of the compare-and-set operations will succeed,
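The username-claiming idea above can be illustrated with a small in-process stand-in. This sketch is an assumption-laden simplification: a `threading.Lock` plays the role of the linearizable register, which only works inside one process and says nothing about how to build such a register across nodes. The names `UsernameRegisters`, `compare_and_set`, and `claim` are hypothetical.

```python
import threading
from typing import Dict, Optional


class UsernameRegisters:
    """Hypothetical stand-in: one register per username, initially None (null)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Dict[str, Optional[str]] = {}

    def compare_and_set(self, username: str,
                        expected: Optional[str], new: str) -> bool:
        # Atomically set the register to `new` only if it still holds `expected`.
        with self._lock:
            if self._owner.get(username) != expected:
                return False
            self._owner[username] = new
            return True


regs = UsernameRegisters()
results: Dict[str, bool] = {}


def claim(user_id: str) -> None:
    # Each user tries to move the register from None to their own ID.
    results[user_id] = regs.compare_and_set("alice", None, user_id)


threads = [threading.Thread(target=claim, args=(f"user-{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [user for user, ok in results.items() if ok]
```

Whichever thread reaches the register first wins. Every later attempt finds a non-null value and fails, so `winners` always contains exactly one entry, although which user it is may differ between runs.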
49%
such a linearizable compare-and-set operation as follows by using total order broadcast as an append-only log
49%
Read the log, and wait for the message you appended to be delivered back to you.
49%
Choosing the first of the conflicting writes as the winner and aborting later ones ensures that all nodes agree on whether a write was committed or aborted.
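The "first conflicting write wins" rule over an append-only log can be sketched as follows. This is a toy model under stated assumptions: a plain Python list stands in for the totally ordered log that total order broadcast would provide, and the helper names `append_claim` and `claim_succeeded` are hypothetical.

```python
from typing import List, Tuple

# Each log entry is (username, claimant). A list stands in for the shared log;
# in a real system every node would see these entries in the same order.
Log = List[Tuple[str, str]]


def append_claim(log: Log, username: str, claimant: str) -> int:
    # Tentatively claim the username; return the position assigned in the log.
    log.append((username, claimant))
    return len(log) - 1


def claim_succeeded(log: Log, position: int) -> bool:
    # Called once our own message at `position` has been delivered back to us.
    # The claim commits only if it is the first entry for that username.
    username, _ = log[position]
    first = next(i for i, (name, _) in enumerate(log) if name == username)
    return first == position


log: Log = []
p1 = append_claim(log, "alice", "user-1")
p2 = append_claim(log, "alice", "user-2")   # conflicts with p1
p3 = append_claim(log, "bob", "user-2")     # different username, no conflict

outcomes = [claim_succeeded(log, p) for p in (p1, p2, p3)]
```

Because the decision depends only on log order, any node replaying the same log reaches the same commit or abort verdict for every claim.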
49%
it doesn’t guarantee linearizable reads
49%
You can sequence reads through the log by appending a message, reading the log, and performing the actual read when the message is delivered back to you.
49%
simple: for every message you want to send through total order broadcast, you increment-and-get the linearizable integer, and then attach the value you got from the register as a sequence number to the message.
49%
Thus, if a node has delivered message 4 and receives an incoming message with a sequence number of 6, it knows that it must wait for message 5 before it can deliver message 6.
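The wait-for-message-5 behavior can be sketched as a small reordering buffer. This is an illustrative sketch that assumes sequence numbers start at 1 and have no gaps. The class name `InOrderDeliverer` and its fields are hypothetical.

```python
from typing import Dict, List


class InOrderDeliverer:
    """Hypothetical receiver that delivers messages strictly in sequence order."""

    def __init__(self, next_seq: int = 1) -> None:
        self.next_seq = next_seq            # next sequence number we may deliver
        self.pending: Dict[int, str] = {}   # messages that arrived too early
        self.delivered: List[str] = []

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.next_seq:
            return  # already delivered; ignore the duplicate
        self.pending[seq] = payload
        # Deliver as long as the next expected message is available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


node = InOrderDeliverer()
for seq in (1, 2, 3, 4):
    node.receive(seq, f"m{seq}")

node.receive(6, "m6")                   # arrives early: held back
held_back = list(node.delivered)        # still only m1..m4

node.receive(5, "m5")                   # fills the gap: m5, then m6 are delivered
```

This only works because the sequence has no gaps: a missing number always means a message is still in flight, never that it does not exist.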
49%
in fact, this is the key difference between total order broadcast and timestamp ordering.
49%
The problem lies in handling the situation when network connections to that node are interrupted, and restoring the value when that node fails
49%
you inevitably end up with a consensus algorithm.
49%
it can be proved that a linearizable compare-and-set (or increment-and-get) register and total order broadcast are both equivalent to consensus
49%
[That] is, if you can solve one of these problems, you can transform it into a solution for the others.