Kindle Notes & Highlights
Read between October 21 and November 26, 2024
Optimistic in this context means that instead of blocking if something potentially dangerous happens, transactions continue anyway, in the hope that everything will turn out all right.
When a transaction wants to commit, the database checks whether anything bad happened (i.e., whether isolation was violated); if so, the trans...
Only transactions that executed serializably are a...
On top of snapshot isolation, SSI adds an algorithm for detecting serialization conflicts among writes and determining which transactions to abort.
In order to provide serializable isolation, the database must detect situations in which a transaction may have acted on an outdated premise and abort the transaction in that case.
When a transaction reads from a consistent snapshot in an MVCC database, it ignores writes that were made by any other transactions that hadn’t yet committed at the time when the snapshot was taken.
When the transaction wants to commit, the database checks whether any of the ignored writes have now been committed.
When a transaction writes to the database, it must look in the indexes for any other transactions that have recently read the affected data.
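The two checks above can be illustrated with a toy, single-threaded, in-memory store. This is a deliberately simplified, hypothetical sketch: the class and method names are invented, and the abort rule is coarser than real SSI implementations such as PostgreSQL's, which track read-write dependencies more precisely and abort fewer transactions. Reads come from a snapshot and remember any writes they ignored. Writes consult a `readers` index to warn transactions that recently read the same key. At commit, a transaction aborts if any write it ignored or was warned about has since been committed.

```python
import itertools


class SerializationFailure(Exception):
    """Raised at commit when a transaction may have acted on an outdated premise."""


class Database:
    """Toy single-threaded MVCC store with SSI-style conflict checks (hypothetical)."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value, writer_txn), committed only
        self.pending = {}    # key -> {txn: value}, writes not yet committed
        self.readers = {}    # key -> set of in-flight txns that read it (the "index")
        self.clock = itertools.count(1)

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, db):
        self.db = db
        self.snapshot_ts = next(db.clock)
        self.committed = False
        self.reads, self.writes = set(), {}
        self.conflicts = set()   # txns whose writes may invalidate something we read

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        db = self.db
        self.reads.add(key)
        db.readers.setdefault(key, set()).add(self)
        # Writes by others that are uncommitted, or committed after our snapshot,
        # are ignored by the snapshot read but remembered for the commit check.
        self.conflicts.update(t for t in db.pending.get(key, {}) if t is not self)
        self.conflicts.update(t for ts, _, t in db.versions.get(key, []) if ts > self.snapshot_ts)
        visible = [v for ts, v, _ in db.versions.get(key, []) if ts <= self.snapshot_ts]
        return visible[-1] if visible else None

    def write(self, key, value):
        db = self.db
        self.writes[key] = value
        db.pending.setdefault(key, {})[self] = value
        # Look in the index for transactions that recently read this key:
        # their premise may now be outdated.
        for reader in db.readers.get(key, set()):
            if reader is not self:
                reader.conflicts.add(self)

    def commit(self):
        # Abort if a write we ignored, or one that overwrote our reads, has committed.
        if any(t.committed for t in self.conflicts):
            self._finish()
            raise SerializationFailure("outdated premise; retry the transaction")
        commit_ts = next(self.db.clock)
        for key, value in self.writes.items():
            self.db.versions.setdefault(key, []).append((commit_ts, value, self))
        self.committed = True
        self._finish()

    def _finish(self):
        for key in self.writes:
            self.db.pending[key].pop(self, None)
        for key in self.reads:
            self.db.readers[key].discard(self)
```

For example, if two concurrent transactions both read keys `x` and `y` and each then writes a different one of them, the first to commit succeeds and the second raises `SerializationFailure`, so the application must retry it.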
Compared to two-phase locking, the big advantage of serializable snapshot isolation is that one transaction doesn’t need to block waiting for locks held by another transaction.
Transactions are an abstraction layer that allows an application to pretend that certain concurrency problems and certain kinds of hardware and software faults don’t exist.
A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database.
When a transaction wants to commit, it is checked, and it is aborted if the execution was not serializable.
If the interruption happens after the client has requested a commit but before the server acknowledges that the commit happened, the client doesn’t know whether the transaction was committed or not.
To solve this issue, a transaction manager can group operations by a unique transaction identifier that is not bound to a particular TCP connection.
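A minimal sketch of that idea, with all names hypothetical: the client generates a unique transaction ID once and reuses it on every retry, and the server remembers the outcome per ID. If the acknowledgment of a commit is lost, the client can reconnect and retry safely, because a repeated commit with the same ID is acknowledged again rather than applied twice. A real system would also need the outcome record to be durable and the check-then-apply step to be atomic, which this in-memory sketch does not attempt.

```python
import uuid


class TransactionManager:
    """Hypothetical server-side component that records commit outcomes by transaction ID."""

    def __init__(self):
        self.data = {}
        self.outcomes = {}   # txn_id -> outcome; would need to be durable in a real system

    def commit(self, txn_id, increments):
        # A retry of a transaction that already committed is acknowledged again,
        # but its writes are not applied a second time.
        if txn_id in self.outcomes:
            return self.outcomes[txn_id]
        for key, delta in increments.items():
            self.data[key] = self.data.get(key, 0) + delta
        self.outcomes[txn_id] = "committed"
        return "committed"


server = TransactionManager()
txn_id = str(uuid.uuid4())                      # chosen by the client, not tied to a TCP connection
server.commit(txn_id, {"counter": 1})           # suppose the connection drops before the ack arrives
status = server.commit(txn_id, {"counter": 1})  # client reconnects and retries with the same ID
```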
As we come to understand various edge cases that can occur in real systems, we get better at handling them.
This chapter is a thoroughly pessimistic and depressing overview of things that may go wrong in a distributed system.
An individual computer with good software is usually either fully functional or entirely broken, but not something in between.
When you are writing software that runs on several computers, connected by a network, the situation is fundamentally different.
In a supercomputer, a job typically checkpoints the state of its computation to durable storage from time to time.
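A minimal single-process sketch of checkpointing, assuming a toy job (summing integers) and a JSON checkpoint file; the function names, file format, and checkpoint interval are illustrative assumptions. If the job dies, rerunning it resumes from the last checkpoint instead of from the beginning. The checkpoint is written to a temporary file, flushed with `fsync`, and then moved into place with `os.replace`, so a crash mid-write does not leave a half-written checkpoint under the real name.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    # Write to a temp file in the same directory, force it to disk, then rename atomically.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def run_job(total_steps, checkpoint_path, every=100, crash_at=None):
    """Sum 0..total_steps-1, checkpointing every `every` steps; resume if a checkpoint exists."""
    state = {"step": 0, "acc": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)          # resume from the last checkpoint
    while state["step"] < total_steps:
        if state["step"] == crash_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, checkpoint_path)
    return state["acc"]
```

Work done since the last checkpoint is lost and recomputed, so the checkpoint interval trades checkpointing overhead against recomputation after a failure.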
The bigger a system gets, the more likely it is that one of its components is broken.
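A quick way to see why: if each of n components is broken at a given moment with probability p, independently of the others, the chance that at least one is broken is 1 - (1 - p)^n. Independence is a simplifying assumption; real failures are often correlated, and the value of p below is purely illustrative.

```python
def prob_at_least_one_broken(n, p):
    """Probability that at least one of n components is broken at a given moment,
    assuming each is broken independently with probability p."""
    return 1 - (1 - p) ** n


for n in (1, 10, 100, 1000, 10_000):
    print(f"{n:>6} components: {prob_at_least_one_broken(n, 0.001):.1%}")
```

With p = 0.001, a single component is almost always fine, but with 1,000 components the probability that something is broken is already about 63%.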
The fault handling must be part of the software design, and you (as operator of the software) need to know what behavior to expect from the software in the case of a fault.
In distributed systems, suspicion, pessimism, and paranoia pay off.
Although the system can be more reliable than its underlying parts, there is always a limit to how much more reliable it can be.
If you send a request and don’t get a response, it’s not possible to distinguish whether (a) the request was lost, (b) the remote node is down, or (c) the response was lost.
Rapid feedback about a remote node being down is useful, but you can’t count on it.
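In practice the only signal a caller gets is a timeout. A small asyncio sketch, where `slow_remote` is a hypothetical stand-in for a remote node: `asyncio.wait_for` gives up after `timeout_s`, and the result is reported as "unknown" rather than "failed", because the caller cannot tell which of the three cases occurred. The remote node may even still process the request later.

```python
import asyncio


async def call_with_timeout(request_coro, timeout_s):
    """Return ("ok", response), or ("unknown", None) if no response arrives in time.

    A timeout cannot distinguish a lost request, a crashed node, or a lost
    response, so the outcome is reported as unknown rather than as a failure.
    """
    try:
        return "ok", await asyncio.wait_for(request_coro, timeout_s)
    except asyncio.TimeoutError:
        return "unknown", None


async def slow_remote(delay_s, reply):
    # Hypothetical remote node: the delay could be network queueing, a busy CPU, or a crash.
    await asyncio.sleep(delay_s)
    return reply
```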
When driving a car, travel times on road networks often vary most due to traffic congestion.
When a packet reaches the destination machine, if all CPU cores are currently busy, the incoming request from the network is queued by the operating system until the application is ready to handle it.
TCP performs flow control (also known as congestion avoidance or backpressure), in which a node limits its own rate of sending in order to avoid overloading a network link or the receiving node.
This means additional queueing at the sender before the data even enters the network.
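The effect of backpressure can be imitated with a bounded buffer between a fast sender and a slow receiver. This is only a loose analogy, assuming a fixed-size buffer stands in for the receiver's capacity; TCP itself limits the amount of unacknowledged data in flight, which this sketch does not model. When the buffer is full, `put` blocks, so messages wait at the sender before they ever "enter the network".

```python
import queue
import threading
import time


def run(n_messages=20, buffer_size=4, consumer_delay_s=0.001):
    link = queue.Queue(maxsize=buffer_size)   # bounded buffer: the receiver's capacity
    received = []

    def consumer():
        while True:
            msg = link.get()
            if msg is None:                   # sentinel: sender is done
                break
            time.sleep(consumer_delay_s)      # slow receiver
            received.append(msg)

    worker = threading.Thread(target=consumer)
    worker.start()
    max_depth = 0
    for i in range(n_messages):
        link.put(i)   # blocks while the buffer is full: the sender is forced to wait
        max_depth = max(max_depth, link.qsize())
    link.put(None)
    worker.join()
    return received, max_depth
```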
Some latency-sensitive applications, such as videoconferencing and Voice over IP (VoIP), use UDP rather than TCP.
UDP is a good choice in situations where delayed data is worthless.
When you make a call over the telephone network, it establishes a circuit: a fixed, guaranteed amount of bandwidth is allocated for the call, along the entire route between the two callers.
Note that a circuit in a telephone network is very different from a TCP connection: a circuit is a fixed amount of reserved bandwidth which nobody else can use while the circuit is established, whereas the packets of a TCP connection opportunistically use whatever network bandwidth is available.
A circuit is good for an audio or video call, which needs to transfer a fairly constant number of bits per second for the duration of the call.
The wire has a fixed cost, so if you utilize it better, each byte you send over the wire is cheaper.
Better hardware utilization is also a significant motivation for using virtual machines.
The time when a message is received is always later than the time when it is sent, but due to variable delays in the network, we don’t know how much later.
Time-of-day clocks are usually synchronized with NTP, which means that a timestamp from one machine (ideally) means the same as a timestamp on another machine.
The name "monotonic clock" comes from the fact that such clocks are guaranteed to always move forward (whereas a time-of-day clock may jump back in time).
NTP may adjust the frequency at which the monotonic clock moves forward (this is known as slewing the clock) if it detects that the computer’s local quartz is moving faster or slower than the NTP server.
By default, NTP allows the clock rate to be sped up or slowed down by up to 0.05%, but NTP cannot cause the monotonic clock to jump forward or backward.
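To put the 0.05% limit in perspective, a small worked example, assuming an illustrative offset of Δ = 100 ms (this figure is not from the text): slewing can correct at most 0.5 ms per second of real time, so removing the offset takes at least 200 seconds.

```latex
\begin{aligned}
r_{\max} &= 0.05\% = 5 \times 10^{-4} = 500\ \text{ppm} \\
\text{correction per second} &\le r_{\max} \times 1\,\text{s} = 0.5\,\text{ms} \\
t_{\text{slew}} &\ge \frac{\Delta}{r_{\max}} = \frac{100\,\text{ms}}{5 \times 10^{-4}} = 200\,\text{s}
\end{aligned}
```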
Monotonic clocks don’t need synchronization, but time-of-day clocks need to be set according to an NTP server or other external time source in order to be useful.
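In Python, for example, `time.time()` reads a time-of-day clock (seconds since the Unix epoch) and `time.monotonic()` reads a monotonic clock. A minimal sketch of the usual rule: use the monotonic clock to measure elapsed time, because only differences between its readings are meaningful (its reference point is undefined), and use the time-of-day clock when you need a timestamp to compare with other machines. The helper name `timed` is illustrative.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and measure its duration with the monotonic clock.

    time.monotonic() only moves forward, so the difference is a valid duration;
    time.time() may jump if NTP resets the time-of-day clock.
    """
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    return result, elapsed


wall_clock_timestamp = time.time()   # time of day; only comparable across machines if clocks are synced
result, elapsed = timed(sum, range(1_000_000))
```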
Clock drift varies depending on the temperature of the machine.
NTP clients are quite robust, because they query several servers and ignore outliers.
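A much-simplified, hypothetical sketch of "query several servers and ignore outliers": given clock offsets measured against several servers, discard samples far from the median and average the rest. Real NTP uses considerably more elaborate selection and clustering algorithms; the threshold and function name here are assumptions for illustration only.

```python
from statistics import median_low


def combine_offsets(offsets, max_deviation_s=0.05):
    """Estimate the local clock offset (seconds) from several servers' measurements.

    Samples more than max_deviation_s away from the median are treated as outliers.
    median_low always returns one of the samples, so at least one sample is kept.
    """
    if not offsets:
        raise ValueError("need at least one offset measurement")
    mid = median_low(offsets)
    kept = [o for o in offsets if abs(o - mid) <= max_deviation_s]
    return sum(kept) / len(kept)
```

For instance, if three servers report offsets near 11 ms and a fourth, misbehaving server reports 3.5 s, the fourth sample is discarded and the estimate stays near 11 ms.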
When a CPU core is shared between virtual machines, each VM is paused for tens of milliseconds while another VM is running.
Even though networks are well behaved most of the time, software must be designed on the assumption that the network will occasionally be faulty, and the software must handle such faults gracefully.
If a machine’s CPU is defective or its network is misconfigured, it most likely won’t work at all, so it will quickly be noticed and fixed.
On the other hand, if its quartz clock is defective or its NTP client is misconfigured, most things will seem to work fine, even though its clock gradually drifts further and further away from reality.
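Because such drift is silent, one option is to compare each machine's clock against its peers and flag outliers. A hypothetical sketch, assuming clock readings from all nodes were collected at approximately the same moment; network delay makes such readings imprecise, so the threshold should be generous, and both the function name and the 0.5-second default are illustrative.

```python
from statistics import median


def drifting_nodes(clock_readings, max_offset_s=0.5):
    """Return nodes whose time-of-day reading differs from the cluster median
    by more than max_offset_s seconds.

    clock_readings maps node name -> that node's clock reading in seconds.
    """
    reference = median(clock_readings.values())
    return sorted(node for node, t in clock_readings.items()
                  if abs(t - reference) > max_offset_s)
```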