Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
At some later time, when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes any row...
An update is internally translated into a dele...
When a transaction reads from the database, transaction IDs are used to decide which objects it can see and which are invisible.
By never updating values in place but instead creating a new version every time a value is changed, the database can provide a consistent snapshot while incurring only a small overhead.
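A minimal sketch of the visibility rule this implies, assuming (as an illustration, not PostgreSQL's actual code) that each row version records the transaction ID that created it and, if any, the one that deleted it, and that the reader knows which transactions were still in progress when its snapshot was taken:

from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    created_by: int               # txid of the transaction that inserted this version
    deleted_by: Optional[int]     # txid that deleted/overwrote it, or None

def visible(v: RowVersion, my_txid: int, in_progress: set) -> bool:
    # The creator must be us, or must have committed before our snapshot was taken.
    created_visible = (v.created_by == my_txid or
                       (v.created_by < my_txid and v.created_by not in in_progress))
    # Any deletion must not yet be visible to our snapshot.
    deletion_visible = (v.deleted_by is not None and
                        (v.deleted_by == my_txid or
                         (v.deleted_by < my_txid and v.deleted_by not in in_progress)))
    return created_visible and not deletion_visible

versions = [RowVersion("old", created_by=3, deleted_by=13),
            RowVersion("new", created_by=13, deleted_by=None)]
# Transaction 12 took its snapshot while transaction 13 was still running,
# so it keeps seeing the old version:
print([v.value for v in versions if visible(v, my_txid=12, in_progress={13})])   # ['old']
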
PostgreSQL and MySQL call their snapshot isolation level repeatable read because it meets the requirements of the standard, and so they can claim standards compliance.
As a result, nobody really knows what repeatable read means.
The lost update problem can occur if an application reads some value from the database, modifies it, and writes back the modified value (a read-modify-write cycle).
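A minimal illustration of how the race loses an update, with the two clients' steps interleaved by hand rather than run on real threads (a dict stands in for a database row):

counter = {"likes": 42}

a = counter["likes"]          # client A reads 42
b = counter["likes"]          # client B reads 42, before A has written back
counter["likes"] = a + 1      # A writes 43
counter["likes"] = b + 1      # B also writes 43, silently clobbering A's write

print(counter["likes"])       # 43, even though two increments happened: one update was lost
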
Atomic operations are usually implemented by taking an exclusive lock on the object when it is read so that no other transaction can read it until the update has been applied.
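From the application's point of view, using such an atomic operation means expressing the whole read-modify-write as a single statement rather than doing it in separate steps. A sketch, assuming a psycopg2-style connection and a hypothetical counters table:

import psycopg2

conn = psycopg2.connect("dbname=app")      # placeholder connection details
with conn, conn.cursor() as cur:
    # The read, the increment, and the write all happen inside the database,
    # under the lock it takes on the row.
    cur.execute("UPDATE counters SET value = value + 1 WHERE key = %s", ("page_hits",))
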
Another option for preventing lost updates, if the database’s built-in atomic operations don’t provide the necessary functionality, is for the application to explicitly lock objects that are going to be updated.
The FOR UPDATE clause indicates that the database should take a lock on all rows returned by this query.
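A sketch of explicit locking around a read-modify-write cycle, reusing the wiki_pages table from the compare-and-set example below; the connection details are placeholders:

import psycopg2

conn = psycopg2.connect("dbname=wiki")     # placeholder
with conn, conn.cursor() as cur:
    # FOR UPDATE locks the returned row until this transaction commits or aborts,
    # so a concurrent read-modify-write on the same page has to wait.
    cur.execute("SELECT content FROM wiki_pages WHERE id = %s FOR UPDATE", (1234,))
    (content,) = cur.fetchone()
    cur.execute("UPDATE wiki_pages SET content = %s WHERE id = %s",
                (content + " (edited)", 1234))
# leaving the with-block commits the transaction and releases the lock
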
Atomic operations and locks are ways of preventing lost updates by forcing the read-modify-write cycles to happen sequentially.
An alternative is to allow them to execute in parallel and, if the transaction manager detects a lost update, abort the transaction and force it to retry its read-modify-write cycle.
If the current value does not match what you previously read, the update has no effect, and the read-modify-write cycle must be retried.
UPDATE wiki_pages SET content = 'new content'
  WHERE id = 1234 AND content = 'old content';
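The UPDATE only takes effect if the content is still what was read earlier. A sketch of the retry loop an application might wrap around it (psycopg2-style connection assumed; whether the WHERE clause really observes concurrent writes depends on the database's isolation behavior):

import psycopg2

conn = psycopg2.connect("dbname=wiki")     # placeholder

def update_page(page_id, edit):
    # Retry the read-modify-write cycle until no concurrent writer changed the
    # row between our read and our conditional write.
    while True:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT content FROM wiki_pages WHERE id = %s", (page_id,))
            (old_content,) = cur.fetchone()
            cur.execute(
                "UPDATE wiki_pages SET content = %s WHERE id = %s AND content = %s",
                (edit(old_content), page_id, old_content))
            if cur.rowcount == 1:          # old value still matched: update applied
                return
            # otherwise we lost the race; loop and retry from the read

update_page(1234, lambda old: old + " (edited)")
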
Locks and compare-and-set operations assume that there is a single up-to-date copy of the data.
However, databases with multi-leader or leaderless replication usually allow several writes to happen concurrently and replicate them asynchronously, so they cannot guarantee that there is a single up-to-date copy of the data.
Write skew can occur if two transactions read the same objects, and then update some of those objects (different transactions may update different objects).
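A sketch of the pattern, loosely following the on-call doctors scenario used in this chapter (the table and column names here are assumptions): both transactions read the same rows, pass the same check, and then update different rows, so neither sees the other's write and the invariant can be violated.

import psycopg2

conn = psycopg2.connect("dbname=hospital")     # placeholder

def request_leave(doctor, shift_id):
    with conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM doctors WHERE on_call AND shift_id = %s",
                    (shift_id,))
        (on_call,) = cur.fetchone()
        if on_call >= 2:
            # Two doctors can both reach this point concurrently: each snapshot
            # still shows two doctors on call, and each UPDATE touches a different
            # row, so no lost update is detected -- yet nobody is left on call.
            cur.execute("UPDATE doctors SET on_call = false "
                        "WHERE name = %s AND shift_id = %s", (doctor, shift_id))
            return True
        return False
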
If the query in step 1 (the read that checks whether the precondition still holds) doesn't return any rows, SELECT FOR UPDATE can't attach locks to anything.
If the problem of phantoms is that there is no object to which we can attach the locks, perhaps we can artificially introduce a lock object into the database?
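A sketch of that approach (the book calls it materializing conflicts) for a meeting-room booking system; the schema is illustrative. A table of room/time-slot rows exists only so that a transaction has something to lock before it checks for overlapping bookings:

import psycopg2

conn = psycopg2.connect("dbname=bookings")     # placeholder

def book_room(room_id, slot_id, user):
    with conn, conn.cursor() as cur:
        # room_time_slots is pre-populated with one row per room and time slot;
        # locking that row serializes all bookings for the same room and slot.
        cur.execute("SELECT 1 FROM room_time_slots "
                    "WHERE room_id = %s AND slot_id = %s FOR UPDATE",
                    (room_id, slot_id))
        cur.execute("SELECT count(*) FROM bookings WHERE room_id = %s AND slot_id = %s",
                    (room_id, slot_id))
        (existing,) = cur.fetchone()
        if existing == 0:
            cur.execute("INSERT INTO bookings (room_id, slot_id, booked_by) "
                        "VALUES (%s, %s, %s)", (room_id, slot_id, user))
            return True
        return False
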
Serializable isolation is usually regarded as the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency.
When all data that a transaction needs to access is in memory, transactions can execute much faster than if they have to wait for data to be loaded from disk.
A system designed for single-threaded execution can sometimes perform better than a system that supports concurrency, because it can avoid the coordination overhead of locking.
In the early days of databases, the intention was that a database transaction could encompass an entire flow of user activity.
If a database transaction needs to wait for input from a user, the database needs to support a potentially huge number of concurrent transactions, most of them idle.
A new HTTP request starts a new transaction.
A database is often much more performance-sensitive than an application server, because a single database instance is often shared by many application servers.
With stored procedures and in-memory data, executing all transactions on a single thread becomes feasible.
Executing all transactions serially makes concurrency control much simpler, but limits the transaction throughput of the database to the speed of a single CPU core on a single machine.
Read-only transactions may execute elsewhere, using snapshot isolation, but for applications with high write throughput, the single-threaded transaction processor can become a serious bottleneck.
If you can find a way of partitioning your dataset so that each transaction only needs to read and write data within a single partition, then each partition can have its own transaction processing thread running independently from the others.
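A minimal sketch of that arrangement: one in-memory store and one worker thread per partition, transactions routed by key, and no locking needed because each partition executes its transactions one at a time. Partitioning by hash of the key is an assumption made for the illustration.

import queue
import threading
import time

NUM_PARTITIONS = 4
stores = [dict() for _ in range(NUM_PARTITIONS)]           # one in-memory store per partition
inboxes = [queue.Queue() for _ in range(NUM_PARTITIONS)]   # one transaction queue per partition

def worker(partition):
    while True:
        txn = inboxes[partition].get()     # transactions run one at a time per partition,
        txn(stores[partition])             # so no locks are needed within a partition

for p in range(NUM_PARTITIONS):
    threading.Thread(target=worker, args=(p,), daemon=True).start()

def submit(key, txn):
    inboxes[hash(key) % NUM_PARTITIONS].put(txn)   # route to the key's partition

submit("alice", lambda store: store.update(alice=store.get("alice", 0) + 1))
time.sleep(0.1)        # give the worker a moment (demo only)
print(stores)
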
Since cross-partition transactions have additional coordination overhead, they are vastly slower than single-partition transactions.
Whether transactions can be single-partition depends very much on the structure of the data used by the application.
Several transactions are allowed to concurrently read the same object as long as nobody is writing to it.
If transaction A has read an object and transaction B wants to write to that object, B must wait until A commits or aborts before it can continue.
If transaction A has written an object and transaction B wants to read that object, B must wait until A commits or aborts before it can continue.
The blocking of readers and writers is implemented by having a lock on each object in the database.
If a transaction wants to read an object, it must first acquire the lock in shared mode.
If a transaction wants to write to an object, it must first acquire the lock in exclusive mode.
The database automatically detects deadlocks between transactions and aborts one of them so that the others can make progress.
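A minimal sketch of the per-object lock these rules describe: shared mode for readers, exclusive mode for writers, released only when the transaction finishes (lock upgrades and deadlock detection are left out):

import threading

class ObjectLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0                 # transactions holding the lock in shared mode
        self._writer = False              # a transaction holds the lock in exclusive mode

    def acquire_shared(self):             # taken before a transaction reads the object
        with self._cond:
            while self._writer:           # wait for any writer to commit or abort
                self._cond.wait()
            self._readers += 1

    def acquire_exclusive(self):          # taken before a transaction writes the object
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()         # wait until no readers and no writer remain
            self._writer = True

    def release(self, shared):            # under 2PL, only called at commit or abort
        with self._cond:
            if shared:
                self._readers -= 1
            else:
                self._writer = False
            self._cond.notify_all()
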
Traditional relational databases don’t limit the duration of a transaction, because they are designed for interactive applications that wait for human input.
It may take just one slow transaction, or one transaction that accesses a lot of data and acquires many locks, to cause the rest of the system to grind to a halt.
A database with serializable isolation must prevent phantoms.
The key idea here is that a predicate lock applies even to objects that do not yet exist in the database, but which might be added in the future (phantoms).
It’s safe to simplify a predicate by making it match a greater set of objects.
This is safe, because any write that matches the original predicate will definitely also match the approximations.
Index-range locks are not as precise as predicate locks would be (they may lock a bigger range of objects than is strictly necessary to maintain serializability), but since they have much lower overheads, they are a good compromise.
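A small illustration of the approximation, using a room-booking predicate as the example (the schema is made up): the precise predicate matches only bookings for room 123 that overlap noon to 1pm, while an index-range lock on the room_id index conservatively conflicts with every write for room 123.

from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:                 # the precise condition a query's WHERE clause expressed
    room_id: int
    start: int                   # hours, for simplicity
    end: int

    def matches(self, booking):
        return (booking["room_id"] == self.room_id
                and booking["start"] < self.end
                and booking["end"] > self.start)

precise = Predicate(room_id=123, start=12, end=13)

def index_range_conflicts(write):
    # The index-range lock keeps only the indexed column: it conflicts with any
    # write for room 123, a superset of what the precise predicate matches.
    return write["room_id"] == 123

write = {"room_id": 123, "start": 15, "end": 16}
print(precise.matches(write))           # False: outside the noon-1pm window
print(index_range_conflicts(write))     # True: the coarser lock still blocks it
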
If there is no suitable index where a range lock can be attached, the database can fall back to a shared lock on the entire table.
Two-phase locking is a so-called pessimistic concurrency control mechanism: it is based on the principle that if anything might possibly go wrong (as indicated by a lock held by another transaction), it’s better to wait until the situation is safe again before doing anything.
Serial execution is, in a sense, pessimistic to the extreme: it is essentially equivalent to each transaction having an exclusive lock on the entire database (or one partition of the database) for the duration of the transaction.
By contrast, serializable snapshot isolation is an optimistic concurrency control technique.