Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
At some later time, when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes any row...
An update is internally translated into a dele...
When a transaction reads from the database, transaction IDs are used to decide which objects it can see and which are invisible.
By never updating values in place but instead creating a new version every time a value is changed, the database can provide a consistent snapshot while incurring only a small overhead.
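A minimal sketch of the visibility rule this implies, assuming (as an illustration, not PostgreSQL's actual code) that each row version records the transaction ID that created it and, if any, the one that deleted it, and that the reader knows which transactions were still in progress when its snapshot was taken:

from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    created_by: int               # txid of the transaction that inserted this version
    deleted_by: Optional[int]     # txid that deleted/overwrote it, or None

def visible(v: RowVersion, my_txid: int, in_progress: set) -> bool:
    # The creator must be us, or must have committed before our snapshot was taken.
    created_visible = (v.created_by == my_txid or
                       (v.created_by < my_txid and v.created_by not in in_progress))
    # Any deletion must not yet be visible to our snapshot.
    deletion_visible = (v.deleted_by is not None and
                        (v.deleted_by == my_txid or
                         (v.deleted_by < my_txid and v.deleted_by not in in_progress)))
    return created_visible and not deletion_visible

versions = [RowVersion("old", created_by=3, deleted_by=13),
            RowVersion("new", created_by=13, deleted_by=None)]
# Transaction 12 took its snapshot while transaction 13 was still running,
# so it keeps seeing the old version:
print([v.value for v in versions if visible(v, my_txid=12, in_progress={13})])   # ['old']
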
PostgreSQL and MySQL call their snapshot isolation level repeatable read because it meets the requirements of the standard, and so they can claim standards compliance.
As a result, nobody really knows what repeatable read means.
The lost update problem can occur if an application reads some value from the database, modifies it, and writes back the modified value (a read-modify-write cycle).
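A minimal illustration of how the race loses an update, with the two clients' steps interleaved by hand rather than run on real threads (a dict stands in for a database row):

counter = {"likes": 42}

a = counter["likes"]          # client A reads 42
b = counter["likes"]          # client B reads 42, before A has written back
counter["likes"] = a + 1      # A writes 43
counter["likes"] = b + 1      # B also writes 43, silently clobbering A's write

print(counter["likes"])       # 43, even though two increments happened: one update was lost
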
Atomic operations are usually implemented by taking an exclusive lock on the object when it is read so that no other transaction can read it until the update has been applied.
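From the application's point of view, using such an atomic operation means expressing the whole read-modify-write as a single statement rather than doing it in separate steps. A sketch, assuming a psycopg2-style connection and a hypothetical counters table:

import psycopg2

conn = psycopg2.connect("dbname=app")      # placeholder connection details
with conn, conn.cursor() as cur:
    # The read, the increment, and the write all happen inside the database,
    # under the lock it takes on the row.
    cur.execute("UPDATE counters SET value = value + 1 WHERE key = %s", ("page_hits",))
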
Another option for preventing lost updates, if the database’s built-in atomic operations don’t provide the necessary functionality, is for the application to explicitly lock objects that are going to be updated.
The FOR UPDATE clause indicates that the database should take a lock on all rows returned by this query.
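A sketch of explicit locking around a read-modify-write cycle, reusing the wiki_pages table from the compare-and-set example below; the connection details are placeholders:

import psycopg2

conn = psycopg2.connect("dbname=wiki")     # placeholder
with conn, conn.cursor() as cur:
    # FOR UPDATE locks the returned row until this transaction commits or aborts,
    # so a concurrent read-modify-write on the same page has to wait.
    cur.execute("SELECT content FROM wiki_pages WHERE id = %s FOR UPDATE", (1234,))
    (content,) = cur.fetchone()
    cur.execute("UPDATE wiki_pages SET content = %s WHERE id = %s",
                (content + " (edited)", 1234))
# leaving the with-block commits the transaction and releases the lock
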
Atomic operations and locks are ways of preventing lost updates by forcing the read-modify-write cycles to happen sequentially.
An alternative is to allow them to execute in parallel and, if the transaction manager detects a lost update, abort the transaction and force it to retry its read-modify-write cycle.
If the current value does not match what you previously read, the update has no effect, and the read-modify-write cycle must be retried.
UPDATE wiki_pages SET content = 'new content'
  WHERE id = 1234 AND content = 'old content';
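The UPDATE only takes effect if the content is still what was read earlier. A sketch of the retry loop an application might wrap around it (psycopg2-style connection assumed; whether the WHERE clause really observes concurrent writes depends on the database's isolation behavior):

import psycopg2

conn = psycopg2.connect("dbname=wiki")     # placeholder

def update_page(page_id, edit):
    # Retry the read-modify-write cycle until no concurrent writer changed the
    # row between our read and our conditional write.
    while True:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT content FROM wiki_pages WHERE id = %s", (page_id,))
            (old_content,) = cur.fetchone()
            cur.execute(
                "UPDATE wiki_pages SET content = %s WHERE id = %s AND content = %s",
                (edit(old_content), page_id, old_content))
            if cur.rowcount == 1:          # old value still matched: update applied
                return
            # otherwise we lost the race; loop and retry from the read

update_page(1234, lambda old: old + " (edited)")
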
Locks and compare-and-set operations assume that there is a single up-to-date copy of the data.
However, databases with multi-leader or leaderless replication usually allow several writes to happen concurrently and replicate them asynchronously, so they cannot guarantee that there is a single up-to-date copy of the data.
Write skew can occur if two transactions read the same objects, and then update some of those objects (different transactions may update different objects).
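A sketch of the pattern, loosely following the on-call doctors scenario used in this chapter (the table and column names here are assumptions): both transactions read the same rows, pass the same check, and then update different rows, so neither sees the other's write and the invariant can be violated.

import psycopg2

conn = psycopg2.connect("dbname=hospital")     # placeholder

def request_leave(doctor, shift_id):
    with conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM doctors WHERE on_call AND shift_id = %s",
                    (shift_id,))
        (on_call,) = cur.fetchone()
        if on_call >= 2:
            # Two doctors can both reach this point concurrently: each snapshot
            # still shows two doctors on call, and each UPDATE touches a different
            # row, so no lost update is detected -- yet nobody is left on call.
            cur.execute("UPDATE doctors SET on_call = false "
                        "WHERE name = %s AND shift_id = %s", (doctor, shift_id))
            return True
        return False
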
If the query in step 1 (the read that checks whether the precondition still holds) doesn't return any rows, SELECT FOR UPDATE can't attach locks to anything.
If the problem of phantoms is that there is no object to which we can attach the locks, perhaps we can artificially introduce a lock object into the database?
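A sketch of that approach (the book calls it materializing conflicts) for a meeting-room booking system; the schema is illustrative. A table of room/time-slot rows exists only so that a transaction has something to lock before it checks for overlapping bookings:

import psycopg2

conn = psycopg2.connect("dbname=bookings")     # placeholder

def book_room(room_id, slot_id, user):
    with conn, conn.cursor() as cur:
        # room_time_slots is pre-populated with one row per room and time slot;
        # locking that row serializes all bookings for the same room and slot.
        cur.execute("SELECT 1 FROM room_time_slots "
                    "WHERE room_id = %s AND slot_id = %s FOR UPDATE",
                    (room_id, slot_id))
        cur.execute("SELECT count(*) FROM bookings WHERE room_id = %s AND slot_id = %s",
                    (room_id, slot_id))
        (existing,) = cur.fetchone()
        if existing == 0:
            cur.execute("INSERT INTO bookings (room_id, slot_id, booked_by) "
                        "VALUES (%s, %s, %s)", (room_id, slot_id, user))
            return True
        return False
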
Serializable isolation is usually regarded as the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency.
When all data that a transaction needs to access is in memory, transactions can execute much faster than if they have to wait for data to be loaded from disk.
A system designed for single-threaded execution can sometimes perform better than a system that supports concurrency, because it can avoid the coordination overhead of locking.
In the early days of databases, the intention was that a database transaction could encompass an entire flow of user activity.
If a database transaction needs to wait for input from a user, the database needs to support a potentially huge number of concurrent transactions, most of them idle.
A new HTTP request starts a new transaction.
A database is often much more performance-sensitive than an application server, because a single database instance is often shared by many application servers.
With stored procedures and in-memory data, executing all transactions on a single thread becomes feasible.
Executing all transactions serially makes concurrency control much simpler, but limits the transaction throughput of the database to the speed of a single CPU core on a single machine.
Read-only transactions may execute elsewhere, using snapshot isolation, but for applications with high write throughput, the single-threaded transaction processor can become a serious bottleneck.
If you can find a way of partitioning your dataset so that each transaction only needs to read and write data within a single partition, then each partition can have its own transaction processing thread running independently from the others.
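A minimal sketch of that arrangement: one in-memory store and one worker thread per partition, transactions routed by key, and no locking needed because each partition executes its transactions one at a time. Partitioning by hash of the key is an assumption made for the illustration.

import queue
import threading
import time

NUM_PARTITIONS = 4
stores = [dict() for _ in range(NUM_PARTITIONS)]           # one in-memory store per partition
inboxes = [queue.Queue() for _ in range(NUM_PARTITIONS)]   # one transaction queue per partition

def worker(partition):
    while True:
        txn = inboxes[partition].get()     # transactions run one at a time per partition,
        txn(stores[partition])             # so no locks are needed within a partition

for p in range(NUM_PARTITIONS):
    threading.Thread(target=worker, args=(p,), daemon=True).start()

def submit(key, txn):
    inboxes[hash(key) % NUM_PARTITIONS].put(txn)   # route to the key's partition

submit("alice", lambda store: store.update(alice=store.get("alice", 0) + 1))
time.sleep(0.1)        # give the worker a moment (demo only)
print(stores)
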
Since cross-partition transactions have additional coordination overhead, they are vastly slower than single-partition transactions.
Whether transactions can be single-partition depends very much on the structure of the data used by the application.
Several transactions are allowed to concurrently read the same object as long as nobody is writing to it.
If transaction A has read an object and transaction B wants to write to that object, B must wait until A commits or aborts before it can continue.
If transaction A has written an object and transaction B wants to read that object, B must wait until A commits or aborts before it can continue.
The blocking of readers and writers is implemented by having a lock on each object in the database.
If a transaction wants to read an object, it must first acquire the lock in shared mode.
If a transaction wants to write to an object, it must first acquire the lock in exclusive mode.
The database automatically detects deadlocks between transactions and aborts one of them so that the others can make progress.
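A minimal sketch of the per-object lock these rules describe: shared mode for readers, exclusive mode for writers, released only when the transaction finishes (lock upgrades and deadlock detection are left out):

import threading

class ObjectLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0                 # transactions holding the lock in shared mode
        self._writer = False              # a transaction holds the lock in exclusive mode

    def acquire_shared(self):             # taken before a transaction reads the object
        with self._cond:
            while self._writer:           # wait for any writer to commit or abort
                self._cond.wait()
            self._readers += 1

    def acquire_exclusive(self):          # taken before a transaction writes the object
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()         # wait until no readers and no writer remain
            self._writer = True

    def release(self, shared):            # under 2PL, only called at commit or abort
        with self._cond:
            if shared:
                self._readers -= 1
            else:
                self._writer = False
            self._cond.notify_all()
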
Traditional relational databases don’t limit the duration of a transaction, because they are designed for interactive applications that wait for human input.
It may take just one slow transaction, or one transaction that accesses a lot of data and acquires many locks, to cause the rest of the system to grind to a halt.
A database with serializable isolation must prevent phantoms.
The key idea here is that a predicate lock applies even to objects that do not yet exist in the database, but which might be added in the future (phantoms).
It’s safe to simplify a predicate by making it match a greater set of objects.
This is safe, because any write that matches the original predicate will definitely also match the approximations.
Index-range locks are not as precise as predicate locks would be (they may lock a bigger range of objects than is strictly necessary to maintain serializability), but since they have much lower overheads, they are a good compromise.
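A small illustration of the approximation, using a room-booking predicate as the example (the schema is made up): the precise predicate matches only bookings for room 123 that overlap noon to 1pm, while an index-range lock on the room_id index conservatively conflicts with every write for room 123.

from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:                 # the precise condition a query's WHERE clause expressed
    room_id: int
    start: int                   # hours, for simplicity
    end: int

    def matches(self, booking):
        return (booking["room_id"] == self.room_id
                and booking["start"] < self.end
                and booking["end"] > self.start)

precise = Predicate(room_id=123, start=12, end=13)

def index_range_conflicts(write):
    # The index-range lock keeps only the indexed column: it conflicts with any
    # write for room 123, a superset of what the precise predicate matches.
    return write["room_id"] == 123

write = {"room_id": 123, "start": 15, "end": 16}
print(precise.matches(write))           # False: outside the noon-1pm window
print(index_range_conflicts(write))     # True: the coarser lock still blocks it
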
If there is no suitable index where a range lock can be attached, the database can fall back to a shared lock on the entire table.
Two-phase locking is a so-called pessimistic concurrency control mechanism: it is based on the principle that if anything might possibly go wrong (as indicated by a lock held by another transaction), it’s better to wait until the situation is safe again before doing anything.
Serial execution is, in a sense, pessimistic to the extreme: it is essentially equivalent to each transaction having an exclusive lock on the entire database (or one partition of the database) for the duration of the transaction.
By contrast, serializable snapshot isolation is an optimistic concurrency control technique.