Kindle Notes & Highlights
Read between April 5 - December 1, 2020
A key feature of a transaction is that it can be aborted and safely retried if an error occurred.
object-relational mapping (ORM) frameworks such as Rails’s ActiveRecord and Django don’t retry aborted transactions — the error usually results in an exception bubbling up the stack, so any user input is thrown away and the user gets an error message.
If the error is due to overload, retrying the transaction will make the problem worse, not better. To avoid such feedback cycles, you can limit the number of retries, use exponential backoff, and handle overload-related errors differently
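A minimal sketch of what such retry logic might look like in application code, assuming a hypothetical run_transaction callable and a hypothetical TransientTransactionError raised by the driver on retryable aborts (overload-related errors should not be retried this way):

```python
import random
import time

class TransientTransactionError(Exception):
    """Placeholder for whatever exception your driver raises on a retryable
    abort (e.g., a deadlock or serialization failure) - not an overload error."""

def run_with_retries(run_transaction, max_retries=3, base_delay=0.1):
    """Retry an aborted transaction a limited number of times, with exponential
    backoff plus jitter so that retries don't amplify an overloaded database."""
    for attempt in range(max_retries + 1):
        try:
            return run_transaction()   # one attempt at the whole transaction
        except TransientTransactionError:
            if attempt == max_retries:
                raise                  # give up and surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```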
serializable isolation means that the database guarantees that transactions have the same effect as if they ran serially (i.e., one at a time, without any concurrency).
databases prevent dirty writes by using row-level locks: when a transaction wants to modify a particular object (row or document), it must first acquire a lock on that object. It must then hold that lock until the transaction is committed or aborted.
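The same locking mechanism can be requested explicitly from the application; a sketch assuming PostgreSQL via psycopg2, with a hypothetical accounts table and connection string:

```python
import psycopg2

# Hypothetical connection string and table; adjust for your environment.
conn = psycopg2.connect("dbname=bank user=app")

with conn:                      # commits on success, rolls back on exception
    with conn.cursor() as cur:
        # Acquire a row-level lock on the account; a concurrent transaction that
        # tries to modify the same row must wait until this transaction ends.
        cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (42,))
        (balance,) = cur.fetchone()
        cur.execute("UPDATE accounts SET balance = %s WHERE id = %s",
                    (balance - 100, 42))
# The row lock is released when the transaction commits or aborts.
```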
Read skew is considered acceptable under read committed isolation:
a key principle of snapshot isolation is readers never block writers, and writers never block readers. This allows a database to handle long-running read queries on a consistent snapshot at the same time as processing writes normally, without any lock contention between the two.
In a supercomputer, a job typically checkpoints the state of its computation to durable storage from time to time. If one node fails, a common solution is to simply stop the entire cluster workload. After the faulty node is repaired, the computation is restarted from the last checkpoint [7, 8]. Thus, a supercomputer is more like a single-node computer than a distributed system: it deals with partial failure by letting it escalate into total failure
Even if TCP acknowledges that a packet was delivered, the application may have crashed before handling it. If you want to be sure that a request was successful, you need a positive response from the application itself [24].
Conversely, if something has gone wrong, you may get an error response at some level of the stack, but in general you have to assume that you will get no response at all.
UDP is a good choice in situations where delayed data is worthless.
a circuit is a fixed amount of reserved bandwidth which nobody else can use while the circuit is established, whereas the packets of a TCP connection opportunistically use whatever network bandwidth is available.
multi-tenancy with dynamic resource partitioning provides better utilization, so it is cheaper, but it has the downside of variable delays. Variable delays in networks are not a law of nature, but simply the result of a cost/benefit trade-off.
it makes no sense to compare monotonic clock values from two different computers, because they don’t mean the same thing.
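For example, in Python's standard library the two clocks are separate APIs; only differences between monotonic readings taken on the same machine are meaningful:

```python
import time

wall_start = time.time()        # time-of-day clock: may jump if NTP steps it
mono_start = time.monotonic()   # monotonic clock: absolute value is meaningless

time.sleep(0.1)                 # stand-in for some work

elapsed = time.monotonic() - mono_start   # safe: a duration on one machine
# Comparing mono_start with a monotonic reading from another machine would be
# meaningless; the clocks may count from arbitrary, unrelated starting points.
```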
The quartz clock in a computer is not very accurate: it drifts (runs faster or slower than it should). Clock drift varies depending on the temperature of the machine.
NTP synchronization can only be as good as the network delay, so there is a limit to its accuracy when you’re on a congested network with variable packet delays.
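The effect of network delay is visible in the standard NTP/SNTP offset calculation; the half-round-trip term is the irreducible uncertainty when the outbound and return paths may be asymmetric:

```python
def ntp_offset_and_uncertainty(t0, t1, t2, t3):
    """Classic NTP/SNTP calculation.

    t0: client's clock when the request was sent
    t1: server's clock when the request arrived
    t2: server's clock when the reply was sent
    t3: client's clock when the reply arrived
    """
    round_trip = (t3 - t0) - (t2 - t1)    # time the messages spent on the network
    offset = ((t1 - t0) + (t2 - t3)) / 2  # estimated clock offset, assuming the
                                          # delay was split symmetrically
    # If the two directions had different delays, the true offset can be wrong
    # by up to half the round trip - so accuracy is bounded by network delay.
    return offset, round_trip / 2
```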
When a CPU core is shared between virtual machines, each VM is paused for tens of milliseconds while another VM is running. From an application’s point of view, this pause manifests itself as the clock suddenly jumping forward
Additional causality tracking mechanisms, such as version vectors, are needed in order to prevent violations of causality (see “Detecting Concurrent Writes”).
NTP’s synchronization accuracy is itself limited by the network round-trip time,
if you have two confidence intervals, each consisting of an earliest and latest possible timestamp (A = [A_earliest, A_latest] and B = [B_earliest, B_latest]), and those two intervals do not overlap (i.e., A_earliest < A_latest < B_earliest < B_latest), then B definitely happened after A — there can be no doubt. Only if the intervals overlap are we unsure in which order A and B happened.
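A small helper makes the rule concrete (intervals are (earliest, latest) pairs; any overlap means the order is unknown):

```python
def definitely_before(a, b):
    """True if event A definitely happened before event B, given clock
    confidence intervals a = (a_earliest, a_latest), b = (b_earliest, b_latest)."""
    a_earliest, a_latest = a
    b_earliest, b_latest = b
    return a_latest < b_earliest   # the intervals don't overlap: A precedes B

def ordering(a, b):
    if definitely_before(a, b):
        return "A before B"
    if definitely_before(b, a):
        return "B before A"
    return "uncertain"             # the intervals overlap
```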
Spanner deliberately waits for the length of the confidence interval before committing a read-write transaction. By doing so, it ensures that any transaction that may read the data is at a sufficiently later time, so their confidence intervals do not overlap.
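A sketch of that commit-wait idea, assuming a hypothetical clock object with a TrueTime-style now() returning an (earliest, latest) interval and a hypothetical apply_commit callback:

```python
import time

def commit_with_wait(clock, apply_commit):
    """Wait out the clock uncertainty before committing (commit wait)."""
    earliest, latest = clock.now()
    commit_ts = latest                 # pick the latest possible current time
    # Once even the earliest possible current time has passed commit_ts, any
    # later transaction's confidence interval cannot overlap with this one.
    while clock.now()[0] <= commit_ts:
        time.sleep(0.001)
    apply_commit(commit_ts)
```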
when a node obtains a lease, it knows that it is the leader for some amount of time, until the lease expires. In order to remain leader, the node must periodically renew the lease before it expires. If the node fails, it stops renewing the lease,
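A sketch of lease-based leadership, assuming a hypothetical coordination-service client exposing acquire_lease and renew_lease calls; note that the expiry is tracked on the local monotonic clock:

```python
import time

LEASE_DURATION = 10.0   # seconds; hypothetical value

class LeaseHolder:
    """Sketch assuming a hypothetical service client with acquire_lease(duration)
    and renew_lease(duration), both returning True on success."""

    def __init__(self, service):
        self.service = service
        self.expires_at = 0.0                 # tracked on the local monotonic clock

    def is_leader(self):
        # Only act as leader while the lease has not (locally) expired.
        return time.monotonic() < self.expires_at

    def run(self):
        while True:
            ok = (self.service.renew_lease(LEASE_DURATION) if self.is_leader()
                  else self.service.acquire_lease(LEASE_DURATION))
            if ok:
                self.expires_at = time.monotonic() + LEASE_DURATION
            time.sleep(LEASE_DURATION / 3)    # renew well before expiry
```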
In the case of a virtual machine, the CPU time spent in other virtual machines is known as steal time.
real-time systems may have lower throughput, since they have to prioritize timely responses above all else
If ZooKeeper is used as a lock service, the transaction ID zxid or the node version cversion can be used as a fencing token. Since they are guaranteed to be monotonically increasing, they have the required properties.
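On the storage side, fencing then amounts to remembering the highest token seen and rejecting anything older; a minimal sketch (the token could be the zxid or cversion obtained from ZooKeeper):

```python
class FencedStorage:
    """Sketch of fencing: the storage service remembers the highest fencing
    token it has processed and rejects writes that carry an older one."""

    def __init__(self):
        self.max_token_seen = -1
        self.data = {}

    def write(self, token, key, value):
        if token < self.max_token_seen:
            raise PermissionError(f"stale fencing token {token}; "
                                  f"already saw {self.max_token_seen}")
        self.max_token_seen = token
        self.data[key] = value
```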
A system is Byzantine fault-tolerant if it continues to operate correctly even if some of the nodes are malfunctioning and not obeying the protocol, or if malicious attackers are interfering with the network.
Most Byzantine fault-tolerant algorithms require a supermajority of more than two-thirds of the nodes to be functioning correctly (i.e., if you have n nodes, fewer than n/3 may be faulty).
Using multiple servers makes NTP more robust than relying on just a single server.
The synchronous model assumes bounded network delay, bounded process pauses, and bounded clock error.
uniqueness and monotonic sequence are safety properties, but availability is a liveness property.
scalability is not the only reason for wanting to use a distributed system. Fault tolerance and low latency (by placing data geographically close to users) are equally important goals,
The best way of building fault-tolerant systems is to find some general-purpose abstractions with useful guarantees, implement them once, and then let applications rely on those guarantees.
If two nodes both believe that they are the leader, that situation is called split brain, and it often leads to data loss.
The edge cases of eventual consistency only become apparent when there is a fault in the system (e.g., a network interruption) or at high concurrency.
In a linearizable system, as soon as one client successfully completes a write, all clients reading from the database must be able to see the value just written.
A database may provide both serializability and linearizability, and this combination is known as strict serializability
Coordination services like Apache ZooKeeper [15] and etcd [16] are often used to implement distributed locks and leader election.
libraries like Apache Curator [17] help by providing higher-level recipes on top of ZooKeeper.
hard uniqueness constraint, such as the one you typically find in relational databases, requires linearizability.
Using the leader for reads relies on the assumption that you know for sure who the leader is.
consensus algorithms can implement linearizable storage safely. This is how ZooKeeper [21] and etcd [22] work, for example.
Systems with multi-leader replication are generally not linearizable, because they concurrently process writes on multiple nodes and asynchronously replicate them to other nodes.
it is safest to assume that a leaderless system with Dynamo-style replication does not provide linearizability.
If your application does not require linearizability, then it can be written in a way that each replica can process requests independently, even if it is disconnected from other replicas (e.g., multi-leader). In this case, the application can remain available in the face of a network problem, but its behavior is not linearizable.
CAP is sometimes presented as Consistency, Availability, Partition tolerance: pick 2 out of 3.
if you want linearizability, the response time of read and write requests is at least proportional to the uncertainty of delays in the network.
there are no concurrent operations in a linearizable datastore: there must be a single timeline along which all operations are totally ordered.
linearizability implies causality: any system that is linearizable will preserve causality correctly
causal consistency is the strongest possible consistency model that does not slow down due to network delays, and remains available in the face of network failures
Unlike linearizability, which puts all operations in a single, totally ordered timeline, causality provides us with a weaker consistency model: some things can be concurrent, so the version history is like a timeline with branching and merging. Causal consistency does not have the coordination overhead of linearizability and is much less sensitive to network problems.