Kindle Notes & Highlights
Read between August 2 – December 28, 2020
Traditional relational databases don’t limit the duration of a transaction, because they are designed for interactive applications that wait for human input.
Databases running 2PL can have quite unstable latencies, and they can be very slow at high percentiles.
Although deadlocks can happen with the lock-based read committed isolation level, they occur much more frequently under 2PL serializable isolation. If deadlocks are frequent, this can mean significant wasted effort.
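To make the deadlock scenario concrete, here is a minimal sketch of a wait-for graph check; the LockManager class and its methods are hypothetical illustrations, not any real database's lock manager. Two transactions acquire locks on two objects in opposite order, the wait-for graph forms a cycle, and the manager breaks the deadlock by aborting one of them (which the application must then retry).

```python
# Minimal sketch of deadlock detection via a wait-for graph.
# The LockManager class and its API are illustrative assumptions only.

class DeadlockError(Exception):
    pass

class LockManager:
    def __init__(self):
        self.owner = {}      # object -> transaction currently holding its lock
        self.waits_for = {}  # transaction -> transaction it is blocked on

    def acquire(self, txn, obj):
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn          # lock is free: grant it
            self.waits_for.pop(txn, None)
            return
        self.waits_for[txn] = holder       # record that txn is blocked on holder
        if self._cycle_from(txn):          # walk the wait-for graph
            del self.waits_for[txn]
            raise DeadlockError(f"{txn} aborted to break a deadlock on {obj!r}")
        # A real lock manager would now block until the holder releases the lock.

    def _cycle_from(self, start):
        seen, current = set(), start
        while current in self.waits_for:
            current = self.waits_for[current]
            if current == start:
                return True
            if current in seen:
                return False
            seen.add(current)
        return False

lm = LockManager()
lm.acquire("T1", "A")       # T1 locks A
lm.acquire("T2", "B")       # T2 locks B
try:
    lm.acquire("T1", "B")   # T1 now waits for T2
    lm.acquire("T2", "A")   # T2 waits for T1 -> cycle -> deadlock
except DeadlockError as e:
    print(e)
```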
A predicate lock belongs not to a particular object, but to all objects that match some search condition. If transaction A wants to insert, update, or delete any object, it must first check whether either the old or the new value matches any existing predicate lock. Crucially, a predicate lock applies even to objects that do not yet exist in the database, but which might be added in the future (phantoms). Unfortunately, if there are many locks held by active transactions, checking for matching locks becomes time-consuming.
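To show why this checking is expensive, here is a small sketch in which each predicate lock is just a Python function over a row; the class names and booking-example details are assumptions for illustration. Every write must be compared against every predicate lock held by other active transactions, for both the old and the new value, even when the matching row does not exist yet.

```python
# Sketch of predicate-lock checking; the classes and names are illustrative
# assumptions, not a real lock manager.

class PredicateLock:
    def __init__(self, owner, condition):
        self.owner = owner          # transaction holding the lock
        self.condition = condition  # search condition the lock covers

class ConflictError(Exception):
    pass

class PredicateLockTable:
    def __init__(self):
        self.locks = []

    def lock_query(self, txn, condition):
        # A reading transaction attaches a lock to its search condition,
        # covering even rows that do not exist yet (phantoms).
        self.locks.append(PredicateLock(txn, condition))

    def check_write(self, txn, old_row, new_row):
        # Before insert/update/delete, check old and new values against
        # every predicate lock held by *other* transactions.
        for lock in self.locks:
            if lock.owner == txn:
                continue
            for row in (old_row, new_row):
                if row is not None and lock.condition(row):
                    raise ConflictError(f"{txn} conflicts with lock held by {lock.owner}")

table = PredicateLockTable()
# Transaction 42 reads all bookings for room 123 between 12:00 and 13:00.
table.lock_query("T42", lambda r: r["room"] == 123 and 12 <= r["hour"] < 13)
# Transaction 43 tries to insert a booking matching that predicate -> conflict,
# even though no such row existed when T42 ran its query.
try:
    table.check_write("T43", old_row=None, new_row={"room": 123, "hour": 12})
except ConflictError as e:
    print(e)
```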
For that reason, most databases with 2PL actually implement index-range locking (also known as next-key locking), which is a simplified approximation of predicate locking. It’s safe to simplify a predicate by making it match a greater set of objects. If there is no suitable index where a range lock can be attached, the database can fall back to a shared lock on the entire table.
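The following sketch (with assumed data structures, not a real lock manager) shows the coarsening: instead of evaluating the exact predicate, the reader's lock is attached to an index entry such as room_id = 123, which covers a greater set of objects but makes the writer's conflict check a single lookup.

```python
# Sketch of index-range (next-key style) locking as a coarse approximation
# of a predicate lock; the data structures are illustrative assumptions.
from collections import defaultdict

class RangeLockTable:
    def __init__(self):
        # index entry value -> set of transactions holding a shared lock on it
        self.shared = defaultdict(set)

    def lock_range(self, txn, index_entry):
        # The reader's predicate "room 123, 12:00-13:00" is coarsened to
        # the whole index entry room_id=123 (a greater set of objects).
        self.shared[index_entry].add(txn)

    def conflicts(self, txn, index_entry):
        # A writer only needs one lookup instead of evaluating every
        # active predicate against the old and new row values.
        return {t for t in self.shared[index_entry] if t != txn}

locks = RangeLockTable()
locks.lock_range("T42", ("room_id", 123))
# T43 wants to insert a booking for room 123 at any time of day:
print(locks.conflicts("T43", ("room_id", 123)))   # {'T42'} -> must wait
print(locks.conflicts("T43", ("room_id", 999)))   # set()   -> no conflict
```

The price of the approximation is false conflicts: in this sketch a booking for room 123 at midnight now conflicts with a reader who only cared about lunchtime.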
Serializable Snapshot Isolation (SSI)
Are serializable isolation and good performance fundamentally at odds with each other?
Serializable snapshot isolation (SSI) is very promising. Today SSI is used both in single-node databases and in distributed databases.
Two-phase locking is a so-called pessimistic concurrency control mechanism: it is based on the principle that if anything might possibly go wrong, it’s better to wait until the situation is safe again before doing anything.
By contrast, serializable snapshot isolation is an optimistic concurrency control technique. When a transaction wants to commit, the database checks whether anything bad happened; if so, the transaction is aborted and has to be retried. Only transactions that executed serializably are allowed to commit.
It performs badly if there is high contention. However, if there is enough spare capacity, and if contention between transactions is not too high, optimistic concurrency control techniques tend to perform better than pessimistic ones. Contention can be reduced with commutative atomic operations: for example, if several transactions concurrently want to increment a counter, it doesn’t matter in which order the increments are applied.
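As a rough sketch of the optimistic approach (the in-memory store, version numbers, and function names are assumptions, not any particular database's API): each transaction works without taking locks, remembers the version of what it read, validates that version at commit time, and retries from scratch if validation fails. The retries are cheap when contention is low and pure wasted work when it is high; a database that applies counter increments as commutative atomic operations would avoid this particular conflict altogether.

```python
# Sketch of optimistic concurrency control with commit-time validation.
# The in-memory "database" and version numbers are illustrative assumptions.

class AbortError(Exception):
    pass

db = {"counter": (0, 0)}   # key -> (value, version)

def read(key):
    return db[key]

def commit(key, new_value, expected_version):
    value, version = db[key]
    if version != expected_version:
        # Someone else committed in the meantime: validation fails.
        raise AbortError(f"version changed for {key!r}, transaction aborted")
    db[key] = (new_value, version + 1)

def increment_counter():
    # Optimistic transaction: no locks held while working; retry on abort.
    while True:
        value, version = read("counter")
        try:
            commit("counter", value + 1, version)
            return
        except AbortError:
            continue   # wasted work; frequent under high contention

for _ in range(5):
    increment_counter()
print(db["counter"])   # (5, 5)
```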
SSI is based on snapshot isolation. On top of snapshot isolation, SSI adds an algorithm for detecting serialization conflicts among writes and determining which transactions to abort.
Under snapshot isolation, the result from the original query may no longer be up-to-date by the time the transaction commits; in other words, the transaction is taking an action based on a premise (a fact that was true at the start of the transaction).
Later, when the transaction wants to commit, the original data may have changed — the premise may no longer be true.
To be safe, the database needs to assume that any change in the query result (the premise) means that writes in that transaction may be invalid. In other words, there may be a causal dependency between the queries and the writes in the transaction.
To prevent this anomaly, SSI must detect two cases: reads of a stale MVCC object version (an uncommitted write occurred before the read), and writes that affect prior reads (the write occurs after the read).

For stale MVCC reads, when the transaction wants to commit, the database checks whether any of the ignored writes have now been committed. If so, the transaction must be aborted.

For writes that affect prior reads, the technique is like index-range locking without blocking: if there is an index on shift_id, the database can use the index entry 1234 to record the fact that transactions 42 and 43 read this data. When a transaction writes to the database, it must look in the indexes for any other transactions that have recently read the affected data. Rather than blocking until those readers have committed, it simply notifies the transactions that the data they read may no longer be up to date.
Less detailed tracking is faster, but may lead to more transactions being aborted than strictly necessary.
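Here is a rough sketch of that tripwire idea under assumed names (it is not the implementation of PostgreSQL or any other database): readers register on the index entries they touch, writers flag the readers instead of blocking, and a flagged transaction that has itself written something is aborted at commit if the conflicting writer has already committed.

```python
# Sketch of SSI-style detection of writes that affect prior reads.
# Class and method names are illustrative assumptions.
from collections import defaultdict

class SerializationFailure(Exception):
    pass

class SSITracker:
    def __init__(self):
        self.readers = defaultdict(set)      # index entry -> txns that read it
        self.flagged_by = defaultdict(set)   # txn -> writers that hit its reads
        self.wrote = set()                   # txns that performed writes
        self.committed = set()               # txns that committed successfully

    def record_read(self, txn, index_entry):
        self.readers[index_entry].add(txn)

    def record_write(self, txn, index_entry):
        self.wrote.add(txn)
        # Tripwire: don't block, just note that the readers' premise is stale.
        for reader in self.readers[index_entry]:
            if reader != txn:
                self.flagged_by[reader].add(txn)

    def commit(self, txn):
        # Abort only if some writer that invalidated our reads has already
        # committed, and we ourselves wrote something based on that premise.
        if txn in self.wrote and any(w in self.committed for w in self.flagged_by[txn]):
            raise SerializationFailure(f"transaction {txn} aborted: premise changed")
        self.committed.add(txn)

tracker = SSITracker()
# Transactions 42 and 43 both read who is on call for shift 1234.
tracker.record_read(42, ("shift_id", 1234))
tracker.record_read(43, ("shift_id", 1234))
# Each then updates the shift, invalidating the other's premise.
tracker.record_write(42, ("shift_id", 1234))
tracker.record_write(43, ("shift_id", 1234))
tracker.commit(42)        # succeeds: the conflicting writer (43) hasn't committed
try:
    tracker.commit(43)    # 42 has committed, so 43's premise is invalid
except SerializationFailure as e:
    print(e)
```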
Depending on what else happened, it’s sometimes possible to prove that the result of the execution is nevertheless serializable.
Serializable snapshot isolation is not limited to the throughput of a single CPU core: even though data may be partitioned across multiple machines, transactions can read and write data in multiple partitions while ensuring serializable isolation.
Because long-running transactions are likely to run into conflicts and abort, SSI requires that read-write transactions be fairly short (long-running read-only transactions may be okay).
A large class of errors is reduced to a simple transaction abort, and the application just needs to try again.
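At the application level, "try again" can be as simple as a retry wrapper like the sketch below; SerializationFailure and the backoff policy are assumptions and would depend on the actual database driver's error type.

```python
# Sketch of an application-level retry loop for serialization failures.
# SerializationFailure and run_transaction are assumed names, not a real
# driver API; substitute whatever your database client raises/exposes.
import random
import time

class SerializationFailure(Exception):
    pass

def retry_transaction(run_transaction, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Back off briefly (with jitter) so retries don't collide again.
            time.sleep(random.uniform(0, 0.01 * attempt))

# Usage: a transaction body that fails once and then succeeds.
attempts = {"n": 0}
def book_room():
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise SerializationFailure("aborted, please retry")
    return "booked"

print(retry_transaction(book_room))   # "booked" on the second attempt
```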
Weak isolation levels protect against some of those anomalies but leave you, the application developer, to handle others manually (e.g., using explicit locking).
In distributed systems, we are no longer operating in an idealized system model.