Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
The whole point of a consistent snapshot is that it does not include writes that are more recent than the snapshot, and thus reads from the snapshot are not linearizable.
Coordination services like Apache ZooKeeper [15] and etcd [16] are often used to implement distributed locks and leader election.
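As a concrete illustration, the Python kazoo client exposes both primitives on top of ZooKeeper. A minimal sketch, assuming a local ZooKeeper at 127.0.0.1:2181; the znode paths and identifiers are illustrative:

```python
# Sketch using the kazoo ZooKeeper client (pip install kazoo).
# The host address, znode paths, and identifiers are placeholder assumptions.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Distributed lock: only one client at a time enters the critical section.
lock = zk.Lock("/locks/resource-42", identifier="worker-1")
with lock:
    pass  # critical section: access the shared resource

# Leader election: the elected client runs the given function.
election = zk.Election("/election/my-service", identifier="worker-1")
election.run(lambda: print("I am the leader now"))
```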
The web server doesn’t place the entire photo on the queue, since most message brokers are designed for small messages, and a photo may be several megabytes in size.
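In code, this pattern usually means writing the photo to blob storage and enqueueing only a small reference to it. A sketch assuming a RabbitMQ broker reached via pika; save_to_blob_store is a hypothetical stand-in for the blob-store upload, and the queue name is illustrative:

```python
# Sketch: enqueue a small reference to the photo, not the photo itself.
import json
import pika

def save_to_blob_store(photo_bytes: bytes) -> str:
    """Hypothetical stand-in for an upload to S3/GCS; returns a storage key."""
    return "photos/abc123.jpg"

def handle_upload(photo_bytes: bytes, user_id: str) -> None:
    # Store the multi-megabyte photo in the blob store ...
    blob_key = save_to_blob_store(photo_bytes)

    # ... and put only a small JSON message on the queue.
    message = json.dumps({"user_id": user_id, "blob_key": blob_key})
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="thumbnail-jobs")
    channel.basic_publish(exchange="", routing_key="thumbnail-jobs",
                          body=message)
    conn.close()
```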
The most common approach to making a system fault-tolerant is to use replication.
Using the leader for reads relies on the assumption that you know for sure who the leader is.
If the network between datacenters is interrupted in a single-leader setup, clients connected to follower datacenters cannot contact the leader, so they cannot make any writes to the database, nor any linearizable reads.
If the application requires linearizable reads and writes, the network interruption causes the application to become unavailable in the datacenters that cannot contact the leader.
CAP was originally proposed as a rule of thumb, without precise definitions, with the goal of starting a discussion about trade-offs in databases.
At times when the network is working correctly, a system can provide both consistency (linearizability) and total availability.
When a network fault occurs, you have to choose between either linearizability or total availability.
All in all, there is a lot of misunderstanding and confusion around CAP, and it does not help us understand systems better, so CAP is best avoided.
Although linearizability is a useful guarantee, surprisingly few systems are actually linearizable in practice.
Memory access first goes to the cache by default, and any changes are asynchronously written out to main memory.
The reason for dropping linearizability is performance, not fault tolerance.
Linearizability is slow—and this is true all the time, not only during a network fault.
If a system obeys the ordering imposed by causality, we say that it is causally consistent.
A total order allows any two elements to be compared, so if you have two elements, you can always say which one is greater and which one is smaller.
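A worked contrast: integers form a total order, while sets ordered by inclusion form only a partial order, because some pairs are incomparable. Python's set operators implement the inclusion ordering directly:

```python
# Integers are totally ordered: any two can always be compared.
assert (3 < 5) or (5 < 3) or (3 == 5)

# Sets under inclusion are only partially ordered: {1, 2} and {2, 3}
# are incomparable, since neither is a subset of the other.
a, b = {1, 2}, {2, 3}
assert not (a <= b) and not (b <= a) and a != b
```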
There might be several requests waiting to be handled, but the datastore ensures that every request is handled atomically at a single point in time, acting on a single copy of the data, along a single timeline, without any concurrency.
If you are familiar with distributed version control systems such as Git, their version histories are very much like the graph of causal dependencies.
Often one commit happens after another, in a straight line, but sometimes you get branches (when several people concurrently work on a project), and merges are created when those concurrently created commits are combined.
The fact that linearizability ensures causality is what makes linearizable systems simple to understand and appealing.
In many cases, systems that appear to require linearizability in fact only really require causal consistency, which can be implemented more efficiently.
In order to maintain causality, you need to know which operation happened before which other operation. This is a partial order: concurrent operations may be processed in any order, but if one operation happened before another, then they must be processed in that order on every replica.
Thus, when a replica processes an operation, it must ensure that all causally preceding operations (all operations that happened before) have already been processed; if some preceding operation is missing, the later operation must wait until the preceding operation has been processed.
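One way to implement that waiting, sketched below under the assumption that each operation carries explicit IDs of the operations it depends on: the replica buffers any operation whose dependencies it has not yet applied, and drains the buffer as dependencies arrive.

```python
# Sketch of causal delivery: an operation is applied only once all of its
# causally preceding operations have been applied; otherwise it is buffered.
# Operations carrying explicit dependency IDs is an assumed scheme.

class CausalReplica:
    def __init__(self):
        self.applied = set()   # IDs of operations already applied
        self.buffer = []       # operations waiting on missing dependencies

    def receive(self, op_id, deps, payload):
        self.buffer.append((op_id, frozenset(deps), payload))
        self._drain()

    def _drain(self):
        # Repeatedly apply any buffered operation whose dependencies are met.
        progress = True
        while progress:
            progress = False
            for op in list(self.buffer):
                op_id, deps, payload = op
                if deps <= self.applied:       # all preceding ops applied?
                    self.apply(op_id, payload)
                    self.applied.add(op_id)
                    self.buffer.remove(op)
                    progress = True

    def apply(self, op_id, payload):
        print(f"applying {op_id}: {payload}")

# Usage: op "b" depends on "a", so it waits until "a" has been processed.
r = CausalReplica()
r.receive("b", deps={"a"}, payload="set x = 2")  # buffered
r.receive("a", deps=set(), payload="set x = 1")  # applied, then "b" applied
```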
If a node had already seen the value X when it issued the write Y, then X and Y may be causally related.
In order to determine the causal ordering, the database needs to know which version of the data was read by the application.
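In practice this suggests an API in which a read returns a version token along with the value, and the client passes that token back with its write so the database can record the dependency. The following is a hypothetical sketch, not any particular database's interface:

```python
# Hypothetical read-version-tracking API; real systems differ in detail.

class VersionedStore:
    def __init__(self):
        self.data = {}    # key -> (version, value)
        self.deps = {}    # version -> version it causally depends on
        self.counter = 0

    def read(self, key):
        # Return the value together with the version that was read.
        version, value = self.data.get(key, (0, None))
        return version, value

    def write(self, key, value, read_version):
        # read_version tells the store which version this write was based on.
        self.counter += 1
        self.deps[self.counter] = read_version  # record the causal dependency
        self.data[key] = (self.counter, value)
        return self.counter

store = VersionedStore()
v, _ = store.read("cart")                      # client reads version v
store.write("cart", ["milk"], read_version=v)  # write depends on version v
```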
Although causality is an important theoretical concept, actually keeping track of all causal dependencies can become impracticable.
In many applications, clients read lots of data before writing something, and then it is not clear whether the write is causally dependent on all or only some of those prior reads.
Explicitly tracking all the data that has been read would mean a large overhead.
The leader can simply increment a counter for each operation, and thus assign a monotonically increasing sequence number to each operation in the replication log.
If a follower applies the writes in the order they appear in the replication log, the state of the follower is always causally consistent (even if it is lagging behind the leader).
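A sketch of both halves: the leader stamps each write with the next counter value, and a follower asserts that it applies entries strictly in sequence-number order:

```python
# Sketch: a single leader assigns monotonically increasing sequence
# numbers, and a follower applies entries in exactly that order.

class Leader:
    def __init__(self):
        self.seq = 0
        self.log = []

    def append(self, op):
        self.seq += 1                 # monotonically increasing counter
        entry = (self.seq, op)
        self.log.append(entry)
        return entry

class Follower:
    def __init__(self):
        self.next_seq = 1
        self.state = []

    def apply(self, entry):
        seq, op = entry
        assert seq == self.next_seq, "entries must be applied in log order"
        self.state.append(op)
        self.next_seq += 1

leader, follower = Leader(), Follower()
for op in ["set x = 1", "set y = 2"]:
    follower.apply(leader.append(op))
```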
If there is not a single leader (perhaps because you are using a multi-leader or leaderless database, or because the database is partitioned), it is less clear how to generate sequence numbers for operations.
Each node may process a different number of operations per second.
Thus, if one node generates even numbers and the other generates odd numbers, the counter for even numbers may lag behind the counter for odd numbers, or vice versa.
Timestamps from physical clocks are subject to clock skew, which can make them inconsistent with causality.
Each node has a unique identifier, and each node keeps a counter of the number of operations it has processed. The Lamport timestamp is then simply a pair of (counter, node ID).
Two nodes may sometimes have the same counter value, but by including the node ID in the timestamp, each timestamp is made unique.
Lamport timestamps provide a total ordering consistent with causality.
The key idea about Lamport timestamps, which makes them consistent with causality, is the following: every node and every client keeps track of the maximum counter value it has seen so far, and includes that maximum on every request.
When a node receives a request or response with a maximum counter value greater than its own counter value, it immediately increases its own counter to that maximum.
As long as the maximum counter value is carried along with every operation, this scheme ensures that the ordering from the Lamport timestamps is consistent with causality, because every causal dependency results in an increased timestamp.
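Putting those rules together: increment on every local event or send, take the maximum and then increment on every receive, and break ties between equal counters by node ID. A minimal sketch, where Python's tuple comparison supplies the tie-breaking total order:

```python
# Minimal Lamport clock sketch. Timestamps are (counter, node_id) pairs;
# tuple comparison gives the tie-breaking total order for free.

class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        # Local event or message send: increment and stamp.
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, timestamp):
        # On receive: jump to the maximum counter seen so far, then tick.
        counter, _ = timestamp
        self.counter = max(self.counter, counter)
        return self.tick()

a, b = LamportClock("A"), LamportClock("B")
t1 = a.tick()        # (1, "A")
t2 = b.receive(t1)   # (2, "B") -- causally after t1, so t2 > t1
assert t2 > t1
```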
The advantage of Lamport timestamps over version vectors is that they are more compact.
If another node has generated some operations, but you don’t yet know what they are, you cannot construct the final ordering of operations: the unknown operations from the other node may need to be inserted at various positions in the total order.
If you have an operation to create a username, and you are sure that no other node can insert a claim for the same username ahead of your operation in the total order, then you can safely declare the operation successful.
Total order broadcast is usually described as a protocol for exchanging messages between nodes.
Consensus services such as ZooKeeper and etcd actually implement total order broadcast.
Since all nodes must deliver the same messages in the same order, all nodes can read the log and see the same sequence of messages.
Every request to acquire the lock is appended as a message to the log, and all messages are sequentially numbered in the order they appear in the log. The sequence number can then serve as a fencing token, because it is monotonically increasing.
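The enforcement lives on the resource side: the protected service remembers the highest token it has seen and rejects anything older, so a paused client holding a stale lock cannot clobber newer writes. A minimal sketch with illustrative token values:

```python
# Sketch: the protected resource rejects requests whose fencing token is
# older than one it has already seen.

class FencedStorage:
    def __init__(self):
        self.max_token = -1
        self.value = None

    def write(self, token, value):
        if token <= self.max_token:
            raise PermissionError(f"stale fencing token {token}")
        self.max_token = token
        self.value = value

storage = FencedStorage()
storage.write(token=33, value="from client A")
storage.write(token=34, value="from client B")
# Client A wakes up from a long pause and retries with its old token:
try:
    storage.write(token=33, value="late write from client A")
except PermissionError as e:
    print(e)  # stale fencing token 33
```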
Because log entries are delivered to all nodes in the same order, if there are several concurrent writes, all nodes will agree on which one came first.
Choosing the first of the conflicting writes as the winner and aborting later ones ensures that all nodes agree on whether a write was committed or aborted.
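Applied to the username example: every node appends its claim to the log, reads claims back in the agreed order, and treats only the first claim for each name as committed. A sketch in which a plain list stands in for a real total order broadcast log (such as one provided by ZooKeeper or etcd):

```python
# Sketch: implementing a uniqueness constraint with total order broadcast.
# A plain list stands in for the totally ordered log of claim messages.

log = []  # totally ordered log of (node, username) claims

def claim(node, username):
    log.append((node, username))  # append = broadcast the claim

def winners():
    # Every node reads the log in the same order, so every node agrees
    # on which claim came first and which later ones must be aborted.
    taken = {}
    for node, username in log:
        if username not in taken:
            taken[username] = node  # first claim wins (committed)
        # later claims for the same name are aborted
    return taken

claim("node1", "maria")
claim("node2", "maria")  # conflicting claim, delivered second
assert winners() == {"maria": "node1"}  # all nodes reach the same verdict
```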