Kindle Notes & Highlights
Read between August 2 - December 28, 2020
Causal consistency goes further: it needs to track causal dependencies across the entire database, not just for a single key. Version vectors can be generalized to do this. However, actually keeping track of all causal dependencies can become impracticable: it is not clear whether a write is causally dependent on all or only some of the reads that preceded it, and explicitly tracking everything that has been read would mean a large overhead.
Sequence numbers
If there is not a single leader, it is less clear how to generate sequence numbers for operations. Various methods are used in practice:
- Each node can generate its own independent set of sequence numbers; for example, with two nodes, one node can generate only odd numbers and the other only even numbers (sketched below).
- Each operation can be tagged with a timestamp from a time-of-day clock.
- You can preallocate blocks of sequence numbers.
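As a minimal sketch of the first option (class and variable names are hypothetical), a per-node generator can interleave number ranges so that two nodes never hand out the same number:

```python
# Hypothetical sketch: per-node sequence numbers. With two nodes, node 0
# hands out even numbers and node 1 hands out odd numbers, so the numbers
# never collide, but their order says nothing about causality.
class NodeSequenceGenerator:
    def __init__(self, node_id: int, num_nodes: int = 2):
        self.node_id = node_id
        self.num_nodes = num_nodes
        self.count = 0

    def next(self) -> int:
        seq = self.count * self.num_nodes + self.node_id
        self.count += 1
        return seq

node_a = NodeSequenceGenerator(node_id=0)  # generates 0, 2, 4, ...
node_b = NodeSequenceGenerator(node_id=1)  # generates 1, 3, 5, ...
```

If one node processes more operations than the other, its counter races ahead, so a lower sequence number does not imply an earlier operation.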
The causality problems occur because these sequence number generators do not correctly capture the ordering of operations across different nodes. However, there is actually a simple method for generating sequence numbers that is consistent with causality: the Lamport timestamp, which is simply a pair of (counter, node ID).
Two Lamport timestamps are ordered by counter value, and if the counter values are the same, the one with the greater node ID is the greater timestamp. Every node and every client keeps track of the maximum counter value it has seen so far, and includes that maximum on every request. When a node receives a request or response with a maximum counter value greater than its own counter value, it immediately increases its own counter to that maximum.
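A minimal sketch of this scheme (class name hypothetical), where the tie-breaking rule falls out of ordinary tuple comparison:

```python
# Hypothetical sketch of a Lamport clock: timestamps are (counter, node_id)
# pairs, and ties on the counter are broken by node ID.
class LamportClock:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.counter = 0

    def tick(self) -> tuple[int, int]:
        """Stamp a local operation or an outgoing message."""
        self.counter += 1
        return (self.counter, self.node_id)

    def observe(self, seen_counter: int) -> None:
        """On receiving a request or response carrying a counter value,
        advance the local counter to at least that value."""
        self.counter = max(self.counter, seen_counter)

a, b = LamportClock(node_id=1), LamportClock(node_id=2)
t1 = a.tick()      # (1, 1)
b.observe(t1[0])   # b learns about a's counter
t2 = b.tick()      # (2, 2): causally after t1, and indeed t2 > t1
```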
Although Lamport timestamps sometimes look similar to version vectors, they have a different purpose: version vectors can distinguish whether two operations are concurrent or whether one is causally dependent on the other, whereas Lamport timestamps always enforce a total ordering, from which you cannot tell whether two operations are concurrent or causally dependent. The advantage of Lamport timestamps over version vectors is that they are more compact.
However, in order to implement something like a uniqueness constraint for usernames, it is not sufficient to have a total ordering of operations; you also need to know when that order is finalized, because another node may concurrently be generating an operation with a lower timestamp that you have not yet seen.
This idea of knowing when your total order is finalized is captured in the topic of total order broadcast, also known as atomic broadcast. In a partitioned database with a single leader per partition, each partition maintains an ordering only of its own operations; total ordering across all partitions is possible, but requires additional coordination.
Total order broadcast is usually described as a protocol for exchanging messages between nodes. It requires that two safety properties always be satisfied:
- Reliable delivery: no messages are lost; if a message is delivered to one node, it is delivered to all nodes.
- Totally ordered delivery: messages are delivered to every node in the same order.
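To make the shape of the protocol concrete, here is a hypothetical interface sketch (the names are not from any particular library): a node submits messages with broadcast(), and the protocol invokes a delivery callback on every node, reliably and in one agreed order.

```python
# Hypothetical interface for total order broadcast. The two safety properties
# live in the contract of the delivery callback: every broadcast message is
# eventually delivered on every node (reliable delivery), and all nodes see
# deliveries in the same order (totally ordered delivery).
from abc import ABC, abstractmethod
from typing import Callable

class TotalOrderBroadcast(ABC):
    @abstractmethod
    def broadcast(self, message: bytes) -> None:
        """Submit a message for delivery on all nodes."""

    @abstractmethod
    def on_deliver(self, callback: Callable[[bytes], None]) -> None:
        """Register a callback invoked once per delivered message,
        in the single order agreed by all nodes."""
```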
A correct algorithm for total order broadcast must ensure that these properties are satisfied even if a node or the network is faulty. This is a hint that there is a strong connection between total order broadcast and consensus.
Total order broadcast is exactly what you need for database replication: if every message represents a write to the database, and every replica processes the same writes in the same order, then the replicas remain consistent with each other (a principle known as state machine replication). Similarly, total order broadcast can be used to implement serializable transactions: if every message represents a deterministic transaction to be executed as a stored procedure, and every node processes those messages in the same order, then the partitions and replicas of the database are kept consistent with each other.
Another way of looking at total order broadcast is that it is a way of creating a log: delivering a message is like appending to the log, and all nodes read the log and see the same sequence of messages in the same order.
Total order broadcast is also useful for implementing a lock service that provides fencing tokens: every request to acquire the lock is appended as a message to the log, and the message's position in the log can serve as the fencing token.
In a linearizable system there is a total order of operations, but the two guarantees are not quite the same. Total order broadcast is asynchronous: messages are guaranteed to be delivered reliably in a fixed order, but there is no guarantee about when a message will be delivered (so one recipient may lag behind the others). By contrast, linearizability is a recency guarantee: a read is guaranteed to see the latest value written.
Imagine that for every possible username, you have a linearizable register with an atomic compare-and-set operation. Every register initially has the value null (indicating that the username is not taken). If multiple users try to concurrently grab the same username, only one of the compare-and-set operations will succeed, because the others will see a value other than null. You can implement such a linearizable compare-and-set operation as follows, using total order broadcast as an append-only log:
1. Append a message to the log, tentatively indicating the username you want to claim.
2. Read the log, and wait for the message you appended to be delivered back to you.
3. Check for any messages claiming the username. If the first such message is your own, you have claimed it successfully; if it came from another user, you abort the operation.
Because log entries are delivered to all nodes in the same order, choosing the first of the conflicting writes as the winner and aborting later ones ensures that all nodes agree on whether a write was committed or aborted.
While this procedure ensures linearizable writes, it does not guarantee linearizable reads: if you read from a store that is asynchronously updated from the log, the value may be stale.
You can sequence reads through the log by appending a message, reading the log, and performing the actual read when the message is delivered back to you.
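Assuming some total order broadcast primitive (such as the interface sketched earlier), the username scheme and this read-sequencing option might look roughly as follows. The names are hypothetical, and a plain in-process list stands in for the shared log purely for illustration:

```python
# Sketch (hypothetical names): claiming usernames via a total order broadcast
# log. The "log" here is an in-process list; in a real system it would be the
# distributed total order broadcast primitive.
import uuid

class UsernameService:
    def __init__(self, log: list):
        self.log = log        # shared, totally ordered append-only log
        self.claimed = {}     # username -> message id of the winning claim
        self.delivered = 0    # how far this node has read the log

    def _catch_up(self) -> None:
        """Deliver log entries in order, applying the first claim per username."""
        while self.delivered < len(self.log):
            kind, username, msg_id = self.log[self.delivered]
            if kind == "claim" and username not in self.claimed:
                self.claimed[username] = msg_id   # first claim wins
            self.delivered += 1

    def claim_username(self, username: str) -> bool:
        """Linearizable compare-and-set: succeed only if ours is the first claim."""
        msg_id = uuid.uuid4().hex
        self.log.append(("claim", username, msg_id))  # step 1: append tentatively
        self._catch_up()                              # step 2: wait for own message
        return self.claimed.get(username) == msg_id   # step 3: did we win?

    def is_taken(self, username: str) -> bool:
        """Linearizable read: sequence the read through the log before answering."""
        self.log.append(("read", username, uuid.uuid4().hex))
        self._catch_up()                # wait for the read marker to come back
        return username in self.claimed

log = []
alice, bob = UsernameService(log), UsernameService(log)
print(alice.claim_username("martin"))  # True
print(bob.claim_username("martin"))    # False: alice's claim appears first
print(bob.is_taken("martin"))          # True
```

Waiting for your own message to come back is what ties the result to a single point in the total order, which is why both the write path and the read path go through the log.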
Turning it around, you can also build total order broadcast from linearizable storage. Assume you have a linearizable register that stores an integer and supports an atomic increment-and-get operation. The algorithm is then simple: for every message you want to send through total order broadcast, you increment-and-get the linearizable integer, and then attach the value you got from the register as a sequence number to the message; recipients deliver messages consecutively by sequence number. Unlike Lamport timestamps, the numbers you get from incrementing the linearizable register form a sequence with no gaps. Thus, if a node has delivered message 4 and receives an incoming message with a sequence number of 6, it knows that it must wait for message 5 before it can deliver message 6.
In fact, this is the key difference between total order broadcast and timestamp ordering.
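A sketch of that construction (hypothetical names, with a single in-process counter standing in for the linearizable register): senders stamp each message via increment-and-get, and recipients buffer anything that arrives ahead of a gap.

```python
# Sketch (hypothetical names) of total order broadcast built from a
# linearizable increment-and-get register: senders stamp each message with
# the next integer, and recipients deliver strictly by sequence number,
# holding back any message that arrives before its predecessors.
import threading

class LinearizableCounter:
    """Stand-in for a linearizable register with atomic increment-and-get."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment_and_get(self) -> int:
        with self._lock:
            self._value += 1
            return self._value

class Recipient:
    def __init__(self):
        self.next_expected = 1
        self.buffer = {}       # seq -> message, held back until gaps are filled
        self.delivered = []

    def receive(self, seq: int, message: str) -> None:
        self.buffer[seq] = message
        # Deliver consecutively: having delivered 4, wait for 5 before 6.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1

counter = LinearizableCounter()
node = Recipient()
m1 = (counter.increment_and_get(), "first")
m2 = (counter.increment_and_get(), "second")
m3 = (counter.increment_and_get(), "third")
node.receive(*m3)      # arrives early: buffered, not yet delivered
node.receive(*m1)      # delivered immediately
node.receive(*m2)      # fills the gap, so "second" and then "third" are delivered
print(node.delivered)  # ['first', 'second', 'third']
```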
The problem lies in handling the situation when network connections to the node holding that register are interrupted, and in restoring the value when that node fails. In general, if you think hard enough about linearizable sequence number generators, you inevitably end up with a consensus algorithm.
This is no coincidence: it can be proved that a linearizable compare-and-set (or increment-and-get) register and total order broadcast are both equivalent to consensus. That is, if you can solve one of these problems, you can transform it into a solution for the others.