Kindle Notes & Highlights (read August 2 to December 28, 2020)
abandoned multi-object transactions because they are difficult to implement across partitions,
In a relational data model, a row in one table often has a foreign key reference to a row in another table.
Multi-object transactions allow you to ensure that these references remain valid:
document data model,
no multi-object transactions are needed when
However, document databases lacking join functionality also encourage denormalization
particular, datastores with leaderless replication (see “Leaderless Replication”) work much more on a “best effort” basis, which could be summarized as “the database will do as much as it can, and if it runs into an error, it won’t undo something it has already done”
the happy path rather than the intricacies of error handling. For example, popular object-relational mapping (ORM) frameworks such as Rails’s ActiveRecord and Django don’t retry aborted transactions
This is a shame, because the whole point of aborts is to enable safe retries.
If the transaction actually succeeded, but the network failed while the server tried to acknowledge the successful commit to the client (so the client thinks it failed), then retrying the transaction causes it to be performed twice — unless
you have an additional application-level deduplication mechanism in place.
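The application-level deduplication mentioned above can be sketched in a few lines of Python. This is a hypothetical in-memory sketch (a real system would persist the request IDs transactionally alongside the data): the client attaches a unique request ID, and the server refuses to apply the same request twice.

```python
import uuid

class DedupStore:
    """Hypothetical server-side record of request IDs already applied
    (in-memory here; a real system would persist this)."""
    def __init__(self):
        self.applied = {}

    def apply_once(self, request_id, operation):
        # If this request ID was already applied, return the stored result
        # instead of executing the operation a second time.
        if request_id in self.applied:
            return self.applied[request_id]
        result = operation()
        self.applied[request_id] = result
        return result

store = DedupStore()
req_id = str(uuid.uuid4())

counter = {"value": 0}
def increment():
    counter["value"] += 1
    return counter["value"]

store.apply_once(req_id, increment)
store.apply_once(req_id, increment)  # retry after a lost acknowledgment
# counter["value"] is still 1: the retry was deduplicated
```

The key point is that the retry is harmless only because the server can recognize it as a duplicate; without the request ID, the lost acknowledgment would cause the increment to run twice.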
If the error is due to overload, retrying the transaction will ma...

If the transaction also has side effects outside of the database, those side effects may happen even if the transaction is aborted. For example, if you’re sending an email, you wouldn’t want to send the email again every time you retry the transaction.
the client process fails while retrying, any data it was trying to write to the database is lost.
In theory, isolation should make your life easier by letting you pretend that no concurrency is happening:
serializable isolation means that the database guarantees that transactions have the same effect as if they ran serially (i.e., one at a time, without any concurrency).
It’s therefore common for systems to use weaker levels of isolation, which protect against some concurrency issues, but not all.
Even many popular relational database systems (which are usually considered “ACID”) use weak isolation,
we need to develop a good understanding of the kinds of concurrency problems that exist, and how to prevent them.
read committed.
(no dirty reads).
you will only overwrite data that has been committed (no dirty writes).
read committed does not prevent the race condition between two counter increments in Figure 7-1. In this case, the second write happens after the first transaction has committed, so it’s not a dirty write.
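The race between two counter increments can be sketched as a straight-line interleaving (a simplified stand-in for the concurrent transactions in Figure 7-1): both transactions read the same committed value before either writes, so the second write silently clobbers the first, even though neither write is dirty.

```python
counter = 42

# Both transactions read the committed value before either one writes.
tx1_read = counter
tx2_read = counter

counter = tx1_read + 1  # transaction 1 commits 43
counter = tx2_read + 1  # transaction 2 commits 43, overwriting tx1's update

# counter == 43, not 44: one increment is lost, and read committed
# does not prevent this (neither write was a dirty write).
```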
Read committed is a very popular isolation level. It is the default setting in Oracle 11g, PostgreSQL, SQL Server 2012, MemSQL, and many other databases
How do we prevent dirty reads? One option would be to use the same lock, and to require any transaction that wants to read an object to briefly acquire the lock and then release it again immediately after reading.
because one long-running write transaction can force many other transactions to wait until the long-running transaction has completed,
While the transaction is ongoing, any other transactions that read the object are simply given the old value.
This anomaly is called read skew, and it is an example of a nonrepeatable read:
Read skew is considered acceptable under read committed isolation:
During the time that the backup process is running, writes will continue to be made to the database.
If you need to restore from such a backup, the inconsistencies (such as disappearing money) become permanent.
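The "disappearing money" anomaly can be sketched as follows (hypothetical balances; two accounts holding $500 each, so the true total is $1,000). The reader queries the accounts one at a time, and a transfer commits in between the two queries:

```python
accounts = {1: 500, 2: 500}   # true total: $1000

seen_1 = accounts[1]          # reader queries account 1: sees 500

# A transfer of $100 from account 2 to account 1 commits in between:
accounts[1] += 100
accounts[2] -= 100

seen_2 = accounts[2]          # reader queries account 2: sees 400

total_seen = seen_1 + seen_2  # 900: $100 appears to have vanished
```

The database itself is never inconsistent (the total is $1,000 at every committed point), but the reader's two queries observed different committed states; this is exactly the read skew that a backup taken under read committed can bake in permanently.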
Snapshot isolation
The idea is that each transaction reads from a consistent snapshot of the database — that is, the transaction sees all the data that was committed in the database at the start of the transaction.
Snapshot isolation is a boon for long-running, read-only queries such as backups and analytics.
performance point of view, a key principle of snapshot isolation is readers never block writers, and writers never block readers.
The database must potentially keep several different committed versions of an object, because various in-progress transactions may need to see the state of the database at different points in time.
multi-version concurrency control (MVCC).
read committed is...
be sufficient to keep two versions ...
that support snapshot isolation typically use MVCC for their read committed isolation level as well.
read committed uses a separate snapshot for each query, while snapshot isolation uses the same snapshot for an entire transaction.
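This difference in snapshot granularity can be sketched with a toy versioned store (hypothetical, not a real database API): under read committed, each query re-reads the latest committed value, while under snapshot isolation the transaction pins one value at its start.

```python
history = [100]          # committed versions of one object, newest last

def latest_committed():
    return history[-1]

# Read committed: each query takes a fresh snapshot, so two reads inside
# the same transaction can see different values if another transaction
# commits in between.
rc_first = latest_committed()      # 100
history.append(200)                # concurrent transaction commits
rc_second = latest_committed()     # 200: a nonrepeatable read

# Snapshot isolation: the snapshot is pinned once, at transaction start,
# and every read in the transaction returns the value from that snapshot.
history[:] = [100]
snapshot = latest_committed()      # taken once, at transaction start
si_first = snapshot                # 100
history.append(200)                # concurrent commit is invisible
si_second = snapshot               # still 100
```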
MVCC-based snapshot isolation is implemented in PostgreSQL
When a transaction is started, it is given a unique, always-increasing transaction ID (txid).
Each row in a table has a created_by field, containing the ID of the transaction that inserted this row into the table.
each row has a deleted_by field, which is initially empty.
a transaction deletes a row, the row isn’t actually deleted from the database, but it is marked for deletion by setti...
data, a garbage collection process in the database removes any rows marked for deletion
delete and a create.
The accounts table now actually contains two rows for account 2: a row with a balance of $500 which was marked as deleted by transaction 13, and a row with a balance of $400 which was created by transaction 13.
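The two-row state of account 2 can be sketched together with a simplified visibility check, assuming the created_by/deleted_by scheme described above (this ignores in-progress and aborted transactions, which a real MVCC implementation must also track):

```python
# Account 2's version history after transaction 13 subtracts $100:
rows = [
    {"account": 2, "balance": 500, "created_by": 5,  "deleted_by": 13},
    {"account": 2, "balance": 400, "created_by": 13, "deleted_by": None},
]

def visible(row, snapshot_txid):
    """A row is visible if it was created by a transaction at or before
    our snapshot and had not yet been deleted as of that snapshot.
    (Simplified: assumes all transactions up to snapshot_txid committed.)"""
    created = row["created_by"] <= snapshot_txid
    deleted = row["deleted_by"] is not None and row["deleted_by"] <= snapshot_txid
    return created and not deleted

# A transaction that started before tx 13 still sees the $500 version:
[r["balance"] for r in rows if visible(r, 12)]   # -> [500]
# A later transaction sees the $400 version:
[r["balance"] for r in rows if visible(r, 14)]   # -> [400]
```

Once no transaction can any longer see the $500 version, garbage collection is free to remove it.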
Indexes and snapshot isolation