Kindle Notes & Highlights
Read between August 2 – December 28, 2020
Traditional relational databases don’t limit the duration of a transaction, because they are designed for interactive applications that wait for human input.
Databases running 2PL can have quite unstable latencies, and they can be very slow at high percentiles.
Although deadlocks can happen with the lock-based read committed isolation level, they occur much more frequently under 2PL serializable isolation. If deadlocks are frequent, this can mean significant wasted effort.
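To make the deadlock scenario concrete, here is a minimal sketch of a wait-for graph check; the LockManager class and its methods are hypothetical illustrations, not any real database's lock manager. Two transactions acquire locks on two objects in opposite order, the wait-for graph forms a cycle, and the manager breaks the deadlock by aborting one of them (which the application must then retry).

```python
# Minimal sketch of deadlock detection via a wait-for graph.
# The LockManager class and its API are illustrative assumptions only.

class DeadlockError(Exception):
    pass

class LockManager:
    def __init__(self):
        self.owner = {}      # object -> transaction currently holding its lock
        self.waits_for = {}  # transaction -> transaction it is blocked on

    def acquire(self, txn, obj):
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn          # lock is free: grant it
            self.waits_for.pop(txn, None)
            return
        self.waits_for[txn] = holder       # record that txn is blocked on holder
        if self._cycle_from(txn):          # walk the wait-for graph
            del self.waits_for[txn]
            raise DeadlockError(f"{txn} aborted to break a deadlock on {obj!r}")
        # A real lock manager would now block until the holder releases the lock.

    def _cycle_from(self, start):
        seen, current = set(), start
        while current in self.waits_for:
            current = self.waits_for[current]
            if current == start:
                return True
            if current in seen:
                return False
            seen.add(current)
        return False

lm = LockManager()
lm.acquire("T1", "A")       # T1 locks A
lm.acquire("T2", "B")       # T2 locks B
try:
    lm.acquire("T1", "B")   # T1 now waits for T2
    lm.acquire("T2", "A")   # T2 waits for T1 -> cycle -> deadlock
except DeadlockError as e:
    print(e)
```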
A predicate lock belongs not to a particular object, but to all objects that match some search condition. If transaction A wants to insert, update, or delete any object, it must first check whether either the old or the new value matches any existing predicate lock. Crucially, a predicate lock applies even to objects that do not yet exist in the database, but which might be added in the future (phantoms). Unfortunately, if there are many locks held by active transactions, checking for matching locks becomes time-consuming.
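To show why this checking is expensive, here is a small sketch in which each predicate lock is just a Python function over a row; the class names and booking-example details are assumptions for illustration. Every write must be compared against every predicate lock held by other active transactions, for both the old and the new value, even when the matching row does not exist yet.

```python
# Sketch of predicate-lock checking; the classes and names are illustrative
# assumptions, not a real lock manager.

class PredicateLock:
    def __init__(self, owner, condition):
        self.owner = owner          # transaction holding the lock
        self.condition = condition  # search condition the lock covers

class ConflictError(Exception):
    pass

class PredicateLockTable:
    def __init__(self):
        self.locks = []

    def lock_query(self, txn, condition):
        # A reading transaction attaches a lock to its search condition,
        # covering even rows that do not exist yet (phantoms).
        self.locks.append(PredicateLock(txn, condition))

    def check_write(self, txn, old_row, new_row):
        # Before insert/update/delete, check old and new values against
        # every predicate lock held by *other* transactions.
        for lock in self.locks:
            if lock.owner == txn:
                continue
            for row in (old_row, new_row):
                if row is not None and lock.condition(row):
                    raise ConflictError(f"{txn} conflicts with lock held by {lock.owner}")

table = PredicateLockTable()
# Transaction 42 reads all bookings for room 123 between 12:00 and 13:00.
table.lock_query("T42", lambda r: r["room"] == 123 and 12 <= r["hour"] < 13)
# Transaction 43 tries to insert a booking matching that predicate -> conflict,
# even though no such row existed when T42 ran its query.
try:
    table.check_write("T43", old_row=None, new_row={"room": 123, "hour": 12})
except ConflictError as e:
    print(e)
```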
For that reason, most databases with 2PL actually implement index-range locking (also known as next-key locking), which is a simplified approximation of predicate locking. It’s safe to simplify a predicate by making it match a greater set of objects. If there is no suitable index where a range lock can be attached, the database can fall back to a shared lock on the entire table.
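The following sketch (with assumed data structures, not a real lock manager) shows the coarsening: instead of evaluating the exact predicate, the reader's lock is attached to an index entry such as room_id = 123, which covers a greater set of objects but makes the writer's conflict check a single lookup.

```python
# Sketch of index-range (next-key style) locking as a coarse approximation
# of a predicate lock; the data structures are illustrative assumptions.
from collections import defaultdict

class RangeLockTable:
    def __init__(self):
        # index entry value -> set of transactions holding a shared lock on it
        self.shared = defaultdict(set)

    def lock_range(self, txn, index_entry):
        # The reader's predicate "room 123, 12:00-13:00" is coarsened to
        # the whole index entry room_id=123 (a greater set of objects).
        self.shared[index_entry].add(txn)

    def conflicts(self, txn, index_entry):
        # A writer only needs one lookup instead of evaluating every
        # active predicate against the old and new row values.
        return {t for t in self.shared[index_entry] if t != txn}

locks = RangeLockTable()
locks.lock_range("T42", ("room_id", 123))
# T43 wants to insert a booking for room 123 at any time of day:
print(locks.conflicts("T43", ("room_id", 123)))   # {'T42'} -> must wait
print(locks.conflicts("T43", ("room_id", 999)))   # set()   -> no conflict
```

The price of the approximation is false conflicts: in this sketch a booking for room 123 at midnight now conflicts with a reader who only cared about lunchtime.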
Serializable Snapshot Isolation (SSI)
Are serializable isolation and good performance fundamentally at odds with each other?
Serializable snapshot isolation (SSI) is very promising. Today SSI is used both in single-node databases and in distributed databases.
Two-phase locking is a so-called pessimistic concurrency control mechanism: it is based on the principle that if anything might possibly go wrong, it’s better to wait until the situation is safe again before doing anything.
By contrast, serializable snapshot isolation is an optimistic concurrency control technique. When a transaction wants to commit, the database checks whether anything bad happened; if so, the transaction is aborted and has to be retried. Only transactions that executed serializably are allowed to commit.
It performs badly if there is high contention. However, if there is enough spare capacity, and if contention between transactions is not too high, optimistic concurrency control techniques tend to perform better than pessimistic ones. Contention can be reduced with commutative atomic operations: for example, if several transactions concurrently want to increment a counter, it doesn’t matter in which order the increments are applied.
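As a rough sketch of the optimistic approach (the in-memory store, version numbers, and function names are assumptions, not any particular database's API): each transaction works without taking locks, remembers the version of what it read, validates that version at commit time, and retries from scratch if validation fails. The retries are cheap when contention is low and pure wasted work when it is high; a database that applies counter increments as commutative atomic operations would avoid this particular conflict altogether.

```python
# Sketch of optimistic concurrency control with commit-time validation.
# The in-memory "database" and version numbers are illustrative assumptions.

class AbortError(Exception):
    pass

db = {"counter": (0, 0)}   # key -> (value, version)

def read(key):
    return db[key]

def commit(key, new_value, expected_version):
    value, version = db[key]
    if version != expected_version:
        # Someone else committed in the meantime: validation fails.
        raise AbortError(f"version changed for {key!r}, transaction aborted")
    db[key] = (new_value, version + 1)

def increment_counter():
    # Optimistic transaction: no locks held while working; retry on abort.
    while True:
        value, version = read("counter")
        try:
            commit("counter", value + 1, version)
            return
        except AbortError:
            continue   # wasted work; frequent under high contention

for _ in range(5):
    increment_counter()
print(db["counter"])   # (5, 5)
```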
SSI is based on snapshot isolation. On top of snapshot isolation, SSI adds an algorithm for detecting serialization conflicts among writes and determining which transactions to abort.
Under snapshot isolation, the result from the original query may no longer be up-to-date by the time the transaction commits; in other words, the transaction is taking an action based on a premise (a fact that was true at the start of the transaction).
Later, when the transaction wants to commit, the original data may have changed — the premise may no longer be true.
To be safe, the database needs to assume that any change in the query result (the premise) means that writes in that transaction may be invalid. In other words, there may be a causal dependency between the queries and the writes in the transaction.
To prevent this anomaly, SSI must detect two cases: reads of a stale MVCC object version (an uncommitted write occurred before the read), and writes that affect prior reads (the write occurs after the read).

For stale MVCC reads, when the transaction wants to commit, the database checks whether any of the ignored writes have now been committed. If so, the transaction must be aborted.

For writes that affect prior reads, the technique is like index-range locking without blocking: if there is an index on shift_id, the database can use the index entry 1234 to record the fact that transactions 42 and 43 read this data. When a transaction writes to the database, it must look in the indexes for any other transactions that have recently read the affected data. Rather than blocking until those readers have committed, it simply notifies the transactions that the data they read may no longer be up to date.
Less detailed tracking is faster, but may lead to more transactions being aborted than strictly necessary.
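Here is a rough sketch of that tripwire idea under assumed names (it is not the implementation of PostgreSQL or any other database): readers register on the index entries they touch, writers flag the readers instead of blocking, and a flagged transaction that has itself written something is aborted at commit if the conflicting writer has already committed.

```python
# Sketch of SSI-style detection of writes that affect prior reads.
# Class and method names are illustrative assumptions.
from collections import defaultdict

class SerializationFailure(Exception):
    pass

class SSITracker:
    def __init__(self):
        self.readers = defaultdict(set)      # index entry -> txns that read it
        self.flagged_by = defaultdict(set)   # txn -> writers that hit its reads
        self.wrote = set()                   # txns that performed writes
        self.committed = set()               # txns that committed successfully

    def record_read(self, txn, index_entry):
        self.readers[index_entry].add(txn)

    def record_write(self, txn, index_entry):
        self.wrote.add(txn)
        # Tripwire: don't block, just note that the readers' premise is stale.
        for reader in self.readers[index_entry]:
            if reader != txn:
                self.flagged_by[reader].add(txn)

    def commit(self, txn):
        # Abort only if some writer that invalidated our reads has already
        # committed, and we ourselves wrote something based on that premise.
        if txn in self.wrote and any(w in self.committed for w in self.flagged_by[txn]):
            raise SerializationFailure(f"transaction {txn} aborted: premise changed")
        self.committed.add(txn)

tracker = SSITracker()
# Transactions 42 and 43 both read who is on call for shift 1234.
tracker.record_read(42, ("shift_id", 1234))
tracker.record_read(43, ("shift_id", 1234))
# Each then updates the shift, invalidating the other's premise.
tracker.record_write(42, ("shift_id", 1234))
tracker.record_write(43, ("shift_id", 1234))
tracker.commit(42)        # succeeds: the conflicting writer (43) hasn't committed
try:
    tracker.commit(43)    # 42 has committed, so 43's premise is invalid
except SerializationFailure as e:
    print(e)
```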
Depending on what else happened, it’s sometimes possible to prove that the result of the execution is nevertheless serializable.
Serializable snapshot isolation is not limited to the throughput of a single CPU core: even though data may be partitioned across multiple machines, transactions can read and write data in multiple partitions while ensuring serializable isolation.
Because long-running transactions are likely to run into conflicts and abort, SSI requires that read-write transactions be fairly short (long-running read-only transactions may be okay).
A large class of errors is reduced to a simple transaction abort, and the application just needs to try again.
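At the application level, "try again" can be as simple as a retry wrapper like the sketch below; SerializationFailure and the backoff policy are assumptions and would depend on the actual database driver's error type.

```python
# Sketch of an application-level retry loop for serialization failures.
# SerializationFailure and run_transaction are assumed names, not a real
# driver API; substitute whatever your database client raises/exposes.
import random
import time

class SerializationFailure(Exception):
    pass

def retry_transaction(run_transaction, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Back off briefly (with jitter) so retries don't collide again.
            time.sleep(random.uniform(0, 0.01 * attempt))

# Usage: a transaction body that fails once and then succeeds.
attempts = {"n": 0}
def book_room():
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise SerializationFailure("aborted, please retry")
    return "booked"

print(retry_transaction(book_room))   # "booked" on the second attempt
```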
Weak isolation levels protect against some of those anomalies but leave you, the application developer, to handle others manually (e.g., using explicit locking).
In distributed systems, we are no longer operating in an idealized system model.