More on this book
Community
Kindle Notes & Highlights
Read between
August 2 - December 28, 2020
both serializability and linearizability, and this combination is known as strict serializability or strong one-copy serializability
which linearizability is an important requirement for making a system work correctly.
One way of electing a leader is to use a lock: every node that starts up tries to acquire the lock, and the one that succeeds becomes the leader
They use consensus algorithms to implement linearizable operations in a fault-tolerant way
Uniqueness constraints are common in databases: for example, a username or email address must uniquely identify one user, and
These constraints all require there to be a single up-to-date value (the account balance, the stock level, the seat occupancy) that all nodes agree on.
the simplest answer would be to really only use a single copy of the data. However, that approach would not be able to tolerate faults:
The most common approach to making a system fault-tolerant is to use replication.
nodes. If you make reads from the leader, or from synchronously updated followers, they have the potential to be linearizable.iv However, not every single-leader database is actually linearizable, either
Using the leader for reads relies on the assumption that you know for sure who the leader is. As discussed in “The Truth Is Defined by the Majority”, it is quite possible for a node to think that it is the leader, when in fact it is not
requests, it is likely to violate linearizability [20].
With asynchronous replication, failover may even lose committed writes (see “Handling Node Outages”), which violates bo...
This highlight has been truncated due to consecutive passage length restrictions.
Systems with multi-leader replication are generally not linearizable, because they concurrently process writes on multiple nodes and asynchronously replicate them to other nodes.
Intuitively, it seems as though strict quorum reads and writes should be linearizable in a Dynamo-style model.
when we have variable network delays, it is possible to have race conditions, as demonstrated
client B reads from a different quorum of two nodes, and gets back the old value 0 from both.
The quorum condition is met (w + r > n), but this execution is nevertheless not linearizable:
summary, it is safest to assume that a leaderless system with Dynamo-style replication does not provide linearizability.
Thus, applications that don’t require linearizability can be more tolerant of network problems.
CAP theorem
Although linearizability is a useful guarantee, surprisingly few systems are actually linearizable in practice. For example, even RAM on a modern multi-core CPU is not linearizable
The reason for this behavior is that every CPU core has its own memory cache and store buffer.
However, there are now several copies of the data (one in main memory, and perhaps several more in various caches), and these copies are asynchronously updated, so linearizability is lost.
The reason for dropping linearizability is performance, not fault tolerance.
that every operation appears to take effect atomically at one point in time. This definition implies that operations are executed in some well-defined order. We
we saw that the main purpose of the leader in single-leader replication is to determine the order of writes in the replication
If there is no single leader, conflicts can occur due to concurrent operations
Serializability, which we discussed in Chapter 7, is about ensuring that transactions behave as if they were ex...
This highlight has been truncated due to consecutive passage length restrictions.
achieved by literally executing transactions in that serial order, or by allowing concurrent execution while pr...
This highlight has been truncated due to consecutive passage length restrictions.
timestamps and clocks in distrib...
This highlight has been truncated due to consecutive passage length restrictions.
is another attempt to introduce order into a disorderly world,
why ordering keeps coming up, and one of the reasons is that it helps preserve causality.
conversation
that there is a causal dependency between the question and the answer.
This happened before relationship is another expression of causality:
a consistent snapshot. But what does “consistent” mean in this context? It means consistent with causality: if the snapshot contains an answer, it must also contain the question being answered
Serializable snapshot isolation (see “Serializable Snapshot Isolation (SSI)”) detects write skew by tracking the causal dependencies between transactions.
Causality imposes an ordering on events: cause comes before effect;
If a system obeys the ordering imposed by causality, we say that it is causally consistent.
other. We say they are incomparable, and therefore mathematical sets are partially ordered:
two events are ordered if they are causally related (one happened before the other), but they are incomparable if they are concurrent. This means that causality defines a partial order, not a total order:
there are no concurrent operations in a linearizable datastore:
Concurrency would mean that the timeline branches and merges again
any system that is linearizable will preserve causality correctly [7].
making a system linearizable can harm its performance and availability,
In many cases, systems that appear to require linearizability in fact only really require causal consistency,
Capturing causal dependencies
you need to know which operation happened before which other operation.
In order to determine causal dependencies, we need some way of describing the “knowledge” of a node in the system.
If a node had already seen the value X when it issued the write Y, then X and Y may be causally related.