Kindle Notes & Highlights
Read between August 2 and December 28, 2020
Some applications, such as instant messaging and online games, already have such a “real-time” architecture
The challenge is that the assumption of stateless clients and request/response interactions is very deeply ingrained in our databases, libraries, frameworks, and protocols. Many datastores support read and write operations
In order to extend the write path all the way to the end user, we would need to fundamentally rethink the way we build many of these systems: moving away from request/response interaction and toward publish/subscribe dataflow
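As a rough sketch of what that shift means in code (an in-process toy, not any particular broker's API): instead of the client issuing a request and receiving a single response, it subscribes once and has every subsequent write pushed to it.

```python
# Minimal in-process stand-in for publish/subscribe dataflow.
# In a request/response world the client would poll for the latest value;
# here the write path is extended to the client: every write is pushed to it.
from collections import defaultdict

class Hub:
    def __init__(self):
        self.state = {}                       # latest value per key
        self.subscribers = defaultdict(list)  # key -> list of callbacks

    def write(self, key, value):
        self.state[key] = value
        for notify in self.subscribers[key]:
            notify(key, value)                # push the change downstream

    def subscribe(self, key, callback):
        self.subscribers[key].append(callback)
        if key in self.state:                 # deliver the current state first
            callback(key, self.state[key])

hub = Hub()
hub.subscribe("room:42", lambda k, v: print(f"update for {k}: {v}"))
hub.write("room:42", "hello")                 # the subscriber sees this immediately
```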
This correspondence between serving requests and performing joins is quite fundamental [47]. A one-off read request just passes the request through the join operator and then immediately forgets it; a subscribe request is a persistent join with past and future events on the other side of the join.
Recording a log of read events potentially also has benefits with regard to tracking causal dependencies and data provenance across a system:
However, it incurs additional storage and I/O cost. Optimizing such systems to reduce the overhead is still an open research problem [2]. But if you already log read requests for operational purposes, as a side effect of request processing, it is not such a great change to make the log the source of the requests instead.
Databases are designed to remember things forever (more or less), so if something goes wrong, the effects also potentially last forever — which means they require more careful thought.
Two-phase commit (see “Atomic Commit and Two-Phase Commit (2PC)”) protocols break the 1:1 mapping between a TCP connection and a transaction, since they must allow a transaction coordinator to reconnect to a database after a network fault and decide the fate of an in-doubt transaction.
Even if we can suppress duplicate transactions between the database client and server, we still need to worry about the network between the end-user device and the application server.
For example, you could generate a unique identifier for a request (such as a UUID) and include it as a hidden form field in the client application, or calculate a hash of all the relevant form fields to derive the request ID
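Either variant is only a few lines; a hedged sketch in Python (the field names are invented for illustration):

```python
# Two ways to derive a request ID for end-to-end duplicate suppression.
# Field names ("payee", "amount") are illustrative only.
import hashlib
import json
import uuid

def request_id_random() -> str:
    # A fresh UUID generated once when the form is rendered; resubmitting
    # the same form resends the same ID.
    return str(uuid.uuid4())

def request_id_from_fields(fields: dict) -> str:
    # Deterministic ID: the same field values always hash to the same ID,
    # so an accidental duplicate submission is recognizable.
    canonical = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

print(request_id_random())
print(request_id_from_fields({"payee": "B", "amount": 11_00}))
```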
The updates to the account balances don’t actually have to happen in the same transaction as the insertion of the event, since they are redundant and could be derived from the request event in a downstream consumer — as long as the event is processed exactly once, which can again be enforced using the request ID.
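A minimal sketch of that separation, with an in-memory consumer standing in for the downstream processor (event fields and variable names are assumptions, not the book's code):

```python
# The event is the single atomic write; balance updates are derived from it.
processed_ids = set()          # which request IDs have already been applied
balances = {"A": 100_00, "B": 0}

def apply_transfer_event(event: dict) -> None:
    rid = event["request_id"]
    if rid in processed_ids:   # duplicate delivery: safe to ignore
        return
    balances[event["debit_account"]] -= event["amount"]
    balances[event["credit_account"]] += event["amount"]
    processed_ids.add(rid)

event = {"request_id": "req-1", "debit_account": "A",
         "credit_account": "B", "amount": 11_00}
apply_transfer_event(event)
apply_transfer_event(event)    # redelivery has no further effect
print(balances)                # {'A': 8900, 'B': 1100}
```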
This scenario of suppressing duplicate transactions is just one example of a more general principle called the end-to-end argument, which was articulated by Saltzer, Reed, and Clark in 1984
Solving the problem requires an end-to-end solution: a transaction identifier that is passed all the way from the end-user client to the database.
The end-to-end argument also applies to checking the integrity of data: checksums built into Ethernet, TCP, and TLS can detect corruption of packets in the network, but they cannot detect corruption caused by bugs in the software at either end of the connection or by faulty storage; only an end-to-end check over the data itself covers the whole path.
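As a small illustration of such an end-to-end integrity check, a checksum computed by the writer and verified by the final reader covers every hop in between (a sketch; the record shape is made up):

```python
# End-to-end integrity check: the checksum travels with the data and is
# verified by the final consumer, not by any intermediate layer.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

payload = b"important record"
stored = {"data": payload, "sha256": checksum(payload)}   # written once

# ... the record may pass through queues, caches, and disks here ...

if checksum(stored["data"]) != stored["sha256"]:
    raise ValueError("corruption detected end to end")
```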
It would be really nice to wrap up the remaining high-level fault-tolerance machinery in an abstraction so that application code needn’t worry about it — but I fear that we have not yet found the right abstraction.
Uniqueness checking can be scaled out by partitioning based on the value that needs to be unique.
However, asynchronous multi-master replication is ruled out, because it could happen that different masters concurrently accept conflicting writes, and thus the values are no longer unique (see “Implementing Linearizable Systems”). If you want to be able to immediately reject any writes that would violate the constraint, synchronous coordination is unavoidable [56].
Uniqueness in log-based messaging
Every request for a username is encoded as a message, and appended to a partition determined by the hash of the username.
A stream processor sequentially reads the requests in the log, using a local database to keep track of which usernames are taken. For every request for a username that is available, it records the name as taken and emits a success message to an output stream.
This algorithm is basically the same as in “Implementing linearizable storage using total order broadcast”. It scales easily to a large request throughput by increasing the number of partitions, as each partition can be processed independently.
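One partition of such a processor might look roughly like this (the message fields, and the local set standing in for the local database, are illustrative assumptions):

```python
# One partition of a username-uniqueness processor. All requests for a given
# username are appended to the same partition (e.g. hash(username) % N), so
# this loop sees them in a single total order.
def process_partition(requests, taken):
    """requests: messages read sequentially from one log partition."""
    for req in requests:
        name = req["username"]
        if name in taken:
            yield {"request_id": req["request_id"], "status": "rejected"}
        else:
            taken.add(name)  # local state standing in for the local database
            yield {"request_id": req["request_id"], "status": "registered"}

log = [
    {"request_id": "r1", "username": "alice"},
    {"request_id": "r2", "username": "alice"},  # concurrent duplicate request
]
for decision in process_partition(log, taken=set()):
    print(decision)  # r1 registered, r2 rejected
```

A client that needs a synchronous answer can simply wait for the output message carrying its own request ID.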
Multi-partition request processing
The request to transfer money from account A to account B is given a unique request ID by the client, and appended to a log partition based on the request ID. A stream processor reads the log of requests. For each request message it emits two messages to output streams: a debit instruction to the payer account A (partitioned by A), and a credit instruction to the payee account B (partitioned by B). The original request ID is included in those emitted messages. Further processors consume the streams of credit and debit instructions, deduplicate by request ID, and apply the changes to the account balances.
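A sketch of the two stages under the same assumptions as the earlier examples (dict-shaped messages, invented field names): the request is fanned out into a debit and a credit instruction that both carry the request ID, and per-account consumers apply each instruction at most once.

```python
# Stage 1: split one transfer request into per-account instructions that all
# carry the original request ID.
def split_transfer(request):
    rid = request["request_id"]
    debit = {"request_id": rid, "account": request["from"], "delta": -request["amount"]}
    credit = {"request_id": rid, "account": request["to"], "delta": request["amount"]}
    return debit, credit  # each appended to a stream partitioned by account

# Stage 2: per-account consumers apply each instruction at most once.
def apply_instruction(instr, balances, seen):
    key = (instr["request_id"], instr["account"])
    if key in seen:
        return  # duplicate delivery is ignored (idempotence via request ID)
    balances[instr["account"]] = balances.get(instr["account"], 0) + instr["delta"]
    seen.add(key)

balances, seen = {"A": 50_00, "B": 0}, set()
for instr in split_transfer({"request_id": "t1", "from": "A", "to": "B", "amount": 20_00}):
    apply_instruction(instr, balances, seen)
print(balances)  # {'A': 3000, 'B': 2000}
```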
A convenient property of transactions is that they are typically linearizable (see “Linearizability”): that is, a writer waits until a transaction is committed, and thereafter its writes are immediately visible to all readers.
Asynchronous event streams do not give such a guarantee by default; however, it is possible for a client to wait for a message to appear on an output stream. This is what we did in the uniqueness-checking example above, where the client waits for the success or rejection message carrying its request ID.
More generally, I think the term consistency conflates two different requirements that are worth considering separately:
Timeliness
Timeliness means ensuring that users observe the system in an up-to-date state.
The CAP theorem (see “The Cost of Linearizability”) uses consistency in the sense of linearizability, which is a strong way of achieving timeliness.
Integrity
Integrity means absence of corruption; i.e., no data loss, and no contradictory or false data (in particular, no false derived data).
If integrity is violated, the inconsistency is permanent: waiting and trying again is not going to fix database corruption in most cases. Instead, explicit checking and repair is needed.
In slogan form: violations of timeliness are “eventual consistency,” whereas violations of integrity are “perpetual inconsistency.”
I am going to assert that in most applications, integrity is much more important than timeliness. Violations of timeliness can be annoying and confusing, but violations of integrity can be catastrophic.
ACID transactions usually provide both timeliness (e.g., linearizability) and integrity (e.g., atomic commit) guarantees.
On the other hand, an interesting property of the event-based dataflow systems that we have discussed in this chapter is that they decouple timeliness and integrity. When processing event streams asynchronously, there is no guarantee of timeliness, unless you explicitly build consumers that wait for a message to arrive before returning.
Exactly-once or effectively-once semantics (see “Fault Tolerance”) is a mechanism for preserving integrity.
As we saw in the last section, reliable stream processing systems can preserve integrity without requiring distributed transactions and an atomic commit protocol, but through a combination of mechanisms:
Representing the content of the write operation as a single message, which can easily be written atomically — an approach that fits very well with event sourcing (see “Event Sourcing”)
Deriving all other state updates from that single message using deterministic derivation functions, similarly to stored procedures (see “Actual Serial Execution” and “Application code as a derivation function”)
Passing a client-generated request ID through all these levels of processing, enabling end-to-end duplicate suppression and idempotence
Making messages immutable and allowing derived data to be reprocessed from time to time, which makes it easier to recover from bugs (see “Advantages of immutable events”)
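The second and fourth points fit together: if the derived state is a pure, deterministic function of an immutable event log, it can be rebuilt at any time by replaying the log. A toy sketch (the event shape is assumed):

```python
# Derived state as a deterministic, pure function of an immutable event log.
from functools import reduce

def apply_event(balances, event):
    updated = dict(balances)
    updated[event["account"]] = updated.get(event["account"], 0) + event["delta"]
    return updated

event_log = [  # immutable, append-only
    {"account": "A", "delta": 100_00},
    {"account": "A", "delta": -11_00},
    {"account": "B", "delta": 11_00},
]

balances = reduce(apply_event, event_log, {})
print(balances)  # replaying the same log always yields the same state

# After fixing a bug in apply_event, the corrected derived state can be
# recovered simply by replaying the log again.
```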
If two people concurrently register the same username or book the same seat, you can send one of them a message to apologize, and ask them to choose a different one. This kind of change to correct a mistake is called a compensating transaction [59, 60].
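A toy sketch of that pattern (field names invented): writes are accepted optimistically, and a later check emits an apology message to whichever request lost, instead of coordinating up front.

```python
# Optimistic writes plus a compensating action instead of up-front coordination.
registrations = [
    {"request_id": "r1", "username": "alice", "ts": 1},
    {"request_id": "r2", "username": "alice", "ts": 2},  # concurrent duplicate
]

def compensate_duplicates(regs):
    winners = {}
    for reg in sorted(regs, key=lambda r: r["ts"]):
        name = reg["username"]
        if name in winners:
            # the later registration loses: apologize and ask for another name
            yield {"request_id": reg["request_id"], "action": "apologize",
                   "reason": f"username {name!r} is already taken"}
        else:
            winners[name] = reg["request_id"]

for message in compensate_duplicates(registrations):
    print(message)
```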
Similarly, many airlines overbook airplanes in the expectation that some passengers will miss their flight, and many hotels overbook rooms, expecting that some guests will cancel.
Even if there was no overbooking, apology and compensation processes would be needed in order to deal with flights being cancelled due to bad weather or staff on strike — recovering from such issues is just a normal part of business [3].
In many business contexts, it is actually acceptable to temporarily violate a constraint and fix it up later by apologizing. The cost of the apology (in terms of money or reputation) varies, but it is often quite low:
Whether the cost of the apology is acceptable is a business decision. If it is acceptable, the traditional model of checking all constraints before even writing the data is unnecessarily restrictive, and a linearizable constraint is not needed.
Such coordination-avoiding data systems have a lot of appeal: they can achieve better performance and fault tolerance than systems that need to perform synchronous coordination [56].
But there is no need for everything to pay the cost of coordination if only a small part of an application needs it.
For example, large-scale storage systems such as HDFS and Amazon S3 do not fully trust disks: they run background processes that continually read back files, compare them to other replicas, and move files from one disk to another, in order to mitigate the risk of silent corruption
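A toy version of such a background check, comparing checksums across replicas (the file layout is a made-up assumption, not how HDFS or S3 actually store blocks):

```python
# Background scrubbing: re-read replicas and compare checksums, so that silent
# disk corruption is noticed while intact copies still exist.
import hashlib
from pathlib import Path

def sha256_of(path):
    return hashlib.sha256(path.read_bytes()).hexdigest()

def scrub(replicas):
    """Return the replicas whose checksum disagrees with the majority."""
    sums = {p: sha256_of(p) for p in replicas if p.exists()}
    if not sums:
        return []
    majority = max(set(sums.values()), key=list(sums.values()).count)
    return [p for p, s in sums.items() if s != majority]

# Hypothetical layout: the same block stored on three different disks.
for path in scrub([Path("/disk1/block42"), Path("/disk2/block42"), Path("/disk3/block42")]):
    print(f"re-replicate {path}: checksum mismatch")
```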
By the same argument, it is important to try restoring from your backups from time to time — otherwise you may only find out that your backup is broken when it is too late and you have already lost data. Don’t just blindly trust that it is all working.
I hope that in the future we will see more self-validating or self-auditing systems that continually check their own integrity, rather than relying on blind trust
Even if you capture the transaction logs (see “Change Data Capture”), the insertions, updates, and deletions in various tables do not necessarily give a clear picture of why those mutations were performed.