Kindle Notes & Highlights
Read between August 2 and December 28, 2020
consensus problem.
There are a number of situations in which it is important for nodes to agree.
Leader el...
Atomic commit
FLP result
If the algorithm is allowed to use timeouts, or some other way of identifying suspected crashed nodes (even if the suspicion is sometimes wrong), then consensus becomes solvable
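As a rough illustration of the highlight above, a timeout-based failure detector might look like the following minimal sketch. The class name, the heartbeat mechanism, and the injectable clock are assumptions made for illustration; they are not taken from the book. The suspicion can be wrong: a slow node looks the same as a crashed one.

```python
import time


class TimeoutFailureDetector:
    """Suspects a node has crashed if no heartbeat arrived within `timeout` seconds.

    Hypothetical sketch: the suspicion may be wrong (the node may only be slow).
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock            # injectable so the sketch can be tested
        self.last_heartbeat = {}      # node name -> time of last heartbeat

    def heartbeat(self, node):
        # Record that we heard from the node just now.
        self.last_heartbeat[node] = self.clock()

    def suspected(self, node):
        # A node we never heard from, or not recently enough, is suspected.
        last = self.last_heartbeat.get(node)
        if last is None:
            return True
        return self.clock() - last > self.timeout
```

A node that resumes sending heartbeats stops being suspected, which is why such a detector is imperfect but still useful.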
Atomicity prevents failed transactions from littering the database with half-finished results and half-updated state.
especially important for multi-object transactions (see
maintain secondary indexes.
secondary index is a separate data structure from the primary data
record: before that moment, it is still possible to abort (due to a crash),
have a multi-object transaction in a partitioned database, or a term-partitioned secondary index
it is not sufficient to simply send a commit request to all of the nodes and independently commit the transaction on each one.
A transaction commit must be irrevocable — you are not allowed to change your mind and retroactively abort a transaction after it has been committed.
Introduction to two-phase commit
2PC uses a new component that does not normally appear in single-node transactions: a coordinator
The coordinator is often implemented as a library within the same application process that is requesting the transaction
Surely the prepare and commit requests can just as easily be lost in the two-phase case. What makes 2PC different?
transaction, it requests a transaction ID from the coordinator. This transaction ID is globally unique.
participants. If this request fails or times out, the coordinator must retry forever until it succeeds. There is no more going back: if the decision was to commit, that decision must be enforced, no matter how many retries it takes. If a participant has crashed in the meantime, the transaction will be committed when it recovers
If the coordinator fails before sending the prepare requests, a participant can safely abort the transaction. But once the participant has received a prepare request and voted “yes,” it can no longer abort unilaterally — it must wait to hear back from the coordinator whether the transaction was committed or aborted.
in doubt or uncertain.
The only way 2PC can complete is by waiting for the coordinator to recover.
This is why the coordinator must write its commit or abort decision to a transaction log on disk before sending commit or abort requests to participants:
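The flow described in the highlights above (prepare, log the decision, then retry until every participant has applied it) can be sketched in memory as follows. This is a simplified illustration under stated assumptions: `Participant`, `Coordinator`, `fail_deliveries`, and the list standing in for the on-disk transaction log are all hypothetical names, and real systems would persist state and communicate over a network.

```python
class Participant:
    def __init__(self, name, vote_yes=True, fail_deliveries=0):
        self.name = name
        self.vote_yes = vote_yes
        self.fail_deliveries = fail_deliveries  # simulated lost commit/abort requests
        self.state = "active"

    def prepare(self, txid):
        if self.vote_yes:
            # Voting "yes" is a promise: no unilateral abort from here on.
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def finish(self, txid, decision):
        if self.fail_deliveries > 0:
            self.fail_deliveries -= 1
            raise ConnectionError(f"{self.name} unreachable")
        self.state = decision


class Coordinator:
    def __init__(self):
        self.log = []  # stands in for a durable transaction log on disk

    def run(self, txid, participants):
        # Phase 1: ask every participant whether it can commit.
        votes = [p.prepare(txid) for p in participants]
        decision = "committed" if all(votes) else "aborted"
        # Commit point: record the decision before telling anyone.
        self.log.append((txid, decision))
        # Phase 2: deliver the decision, retrying until it succeeds.
        for p in participants:
            while True:
                try:
                    p.finish(txid, decision)
                    break
                except ConnectionError:
                    continue
        return decision
```

If the coordinator crashed between logging the decision and finishing phase 2, a prepared participant would stay in doubt until the coordinator recovered and replayed its log; the sketch does not model that recovery step.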
Three-phase commit
Two-phase commit is called a blocking atomic commit protocol due to the fact that 2PC can become stuck waiting for the coordinator to recover.
3PC assumes a network with bounded delay and nodes with bounded response times; in most practical systems with unbounded network delay and process pauses
it cannot guarantee atomicity.
nonblocking atomic commit requires a perfect failure detector [67, 71] — i.e., a reliable mechanism for telling whether a node has crashed
distributed transactions in MySQL are reported to be over 10 times slower than single-node transactions
additional disk forcing (fsync) that is required for crash recovery [88], and the additional network round-trips.
Database-internal distributed transactions
all the nodes participating in the transaction are running the same database software.
Heterogeneous distributed transactions
the participants are two or more different technologies: for example, two databases from different vendors, or even non-database systems such as message brokers.
Exactly-once message processing
For example, say a side effect of processing a message is to send an email, and the email server does not support two-phase commit: it could happen that the email is sent two or more times if message processing fails and is retried.
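The duplicate-email scenario in the highlight above can be reproduced with a small simulation. Everything here is hypothetical (`send_email`, `process`, `deliver_with_retry`, and the simulated crash); the sketch only shows why a side effect outside the transaction can be repeated when processing is retried.

```python
sent_emails = []


def send_email(to, body):
    # Side effect outside any transaction: it cannot be rolled back.
    sent_emails.append((to, body))


def process(message, crash_after_send):
    send_email(message["to"], message["body"])
    if crash_after_send:
        # Simulated failure after the email went out but before the
        # message was acknowledged to the broker.
        raise RuntimeError("crashed before acknowledging the message")


def deliver_with_retry(message, crashes):
    """Redeliver until processing succeeds; return the number of attempts."""
    attempts = 0
    while True:
        try:
            process(message, crash_after_send=attempts < crashes)
            return attempts + 1
        except RuntimeError:
            attempts += 1
```

With one simulated crash, the message is processed twice and the email is sent twice, since the email server takes no part in the commit or abort of the message acknowledgment.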
XA transactions
X/Open XA (short for eXtended Architecture) is a standard for implementing two-phase commit across heterogeneous technologies
XA is not a network protocol — it is merely a C API for interfacing with a transaction coordinator.
in turn is supported by many drivers for databases using Java Database Connectivity (JDBC) and drivers for message brokers using the Java Message Service (JMS) APIs.
If the driver supports XA, that means it calls the XA API to find out whether an operation should be part of a distributed transaction
Since the coordinator’s log is on the application server’s local disk, that server must be restarted,
in practice, orphaned in-doubt transactions do occur [89, 90] — that is, transactions for which the coordinator cannot decide the outcome for whatever reason
Many XA implementations have an emergency escape hatch called heuristic decisions: allowing a participant to unilaterally decide to abort or commit an in-doubt transaction without a definitive decision from the coordinator
heuristic here is a euphemism for probably breaking atomicity, since the heuristic decision violates the system of promises in two-phase commit.
the key realization is that the transaction coordinator is itself a kind of database
If the coordinator is not replicated
Many server-side applications are developed in a stateless model (as
However, when the coordinator is part of the application server, it changes the nature of the deployment.