Chena Lee’s Kindle Notes & Highlights for Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

by Martin Kleppmann

Read between August 2 - December 28, 2020

46%

both serializability and linearizability, and this combination is known as strict serializability or strong one-copy serializability

46%

which linearizability is an important requirement for making a system work correctly.

46%

One way of electing a leader is to use a lock: every node that starts up tries to acquire the lock, and the one that succeeds becomes the leader
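A minimal sketch of lock-based leader election follows. It assumes a hypothetical in-process `LockService` whose `try_acquire` is an atomic compare-and-set. A real system would need a linearizable, consensus-backed lock service shared across machines. Here a `threading.Lock` only simulates that guarantee inside one Python process, and threads stand in for nodes.

```python
import threading
from typing import List, Optional


class LockService:
    """Hypothetical stand-in for a linearizable lock service.

    The internal mutex makes try_acquire() atomic, so at most one caller
    can ever own the lock. This is a toy model of what a real
    consensus-backed service provides across machines.
    """

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._owner: Optional[str] = None

    def try_acquire(self, node_id: str) -> bool:
        # Atomic compare-and-set: take the lock only if nobody holds it yet.
        with self._mutex:
            if self._owner is None:
                self._owner = node_id
                return True
            return False

    def owner(self) -> Optional[str]:
        with self._mutex:
            return self._owner


def start_node(service: LockService, node_id: str, leaders: List[str]) -> None:
    # Every node tries to grab the lock on startup; the winner becomes leader.
    if service.try_acquire(node_id):
        leaders.append(node_id)


service = LockService()
leaders: List[str] = []
threads = [
    threading.Thread(target=start_node, args=(service, f"node-{i}", leaders))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(leaders, service.owner())
```

Which thread wins varies from run to run. What matters is that exactly one can win, and that property holds only because `try_acquire` is atomic.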

46%

They use consensus algorithms to implement linearizable operations in a fault-tolerant way

46%

Uniqueness constraints are common in databases: for example, a username or email address must uniquely identify one user, and...

46%

These constraints all require there to be a single up-to-date value (the account balance, the stock level, the seat occupancy) that all nodes agree on.
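The sketch below makes the "single up-to-date value" idea concrete with a toy seat inventory. It assumes a single copy of the count guarded by a mutex, and `SeatInventory` is a hypothetical class. Ten concurrent buyers compete for three seats. Because the check and the decrement happen in one atomic step, the count can never go negative.

```python
import threading
from typing import List


class SeatInventory:
    """Hypothetical single-copy seat counter used for illustration."""

    def __init__(self, seats: int) -> None:
        self._lock = threading.Lock()
        self.remaining = seats

    def try_book(self) -> bool:
        # Check and decrement in one atomic step, so two buyers can never
        # both see the last seat as free.
        with self._lock:
            if self.remaining > 0:
                self.remaining -= 1
                return True
            return False


inventory = SeatInventory(seats=3)
results: List[bool] = []


def buyer() -> None:
    # list.append is atomic in CPython, so results needs no extra locking.
    results.append(inventory.try_book())


threads = [threading.Thread(target=buyer) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results), inventory.remaining)  # → 3 0
```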

46%

the simplest answer would be to really only use a single copy of the data. However, that approach would not be able to tolerate faults:

46%

The most common approach to making a system fault-tolerant is to use replication.

46%

If you make reads from the leader, or from synchronously updated followers, they have the potential to be linearizable. However, not every single-leader database is actually linearizable, either

46%

Using the leader for reads relies on the assumption that you know for sure who the leader is. As discussed in “The Truth Is Defined by the Majority”, it is quite possible for a node to think that it is the leader, when in fact it is not

46%

requests, it is likely to violate linearizability [20].

46%

With asynchronous replication, failover may even lose committed writes (see “Handling Node Outages”), which violates bo...

46%

Systems with multi-leader replication are generally not linearizable, because they concurrently process writes on multiple nodes and asynchronously replicate them to other nodes.

46%

Intuitively, it seems as though strict quorum reads and writes should be linearizable in a Dynamo-style model.

46%

when we have variable network delays, it is possible to have race conditions, as demonstrated

46%

client B reads from a different quorum of two nodes, and gets back the old value 0 from both.

46%

The quorum condition is met (w + r > n), but this execution is nevertheless not linearizable:
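The race described in these highlights can be replayed deterministically. This sketch assumes n = 3, w = 3, r = 2 and a hand-picked message timing; both are illustrative assumptions. Each replica stores a value with a version number, and a quorum read returns the highest-versioned value it sees. Client A finishes its read before client B starts, yet B sees the older value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    value: int = 0
    version: int = 0


def quorum_read(replicas: List[Replica], contacted: List[int]) -> int:
    # Return the value carrying the highest version among the contacted replicas.
    newest = max((replicas[i] for i in contacted), key=lambda rep: rep.version)
    return newest.value


n, w, r = 3, 3, 2
assert w + r > n  # the strict quorum condition holds

replicas = [Replica() for _ in range(n)]

# The writer sends x = 1 (version 1) to all three replicas, but variable
# network delay means the messages arrive at different times.
# Step 1: the write reaches replica 0 only.
replicas[0] = Replica(value=1, version=1)

# Step 2: client A reads from replicas {0, 1} and sees the new value.
read_a = quorum_read(replicas, [0, 1])

# Step 3: client B starts after A has finished and reads from {1, 2}.
read_b = quorum_read(replicas, [1, 2])

# Step 4: the write finally reaches replicas 1 and 2 and completes.
replicas[1] = Replica(value=1, version=1)
replicas[2] = Replica(value=1, version=1)

print(read_a, read_b)  # → 1 0
```

B's read starts after A's read has returned the new value, yet B returns the old one. A linearizable system forbids exactly this.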

46%

In summary, it is safest to assume that a leaderless system with Dynamo-style replication does not provide linearizability.

47%

Thus, applications that don’t require linearizability can be more tolerant of network problems.

47%

CAP theorem

47%

Although linearizability is a useful guarantee, surprisingly few systems are actually linearizable in practice. For example, even RAM on a modern multi-core CPU is not linearizable

47%

The reason for this behavior is that every CPU core has its own memory cache and store buffer.

47%

However, there are now several copies of the data (one in main memory, and perhaps several more in various caches), and these copies are asynchronously updated, so linearizability is lost.

47%

The reason for dropping linearizability is performance, not fault tolerance.

47%

that every operation appears to take effect atomically at one point in time. This definition implies that operations are executed in some well-defined order.

47%

we saw that the main purpose of the leader in single-leader replication is to determine the order of writes in the replication
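The following is a minimal sketch of how a single leader fixes the order of writes. It assumes a hypothetical in-memory `Leader` that numbers log entries and `Follower` objects that apply them strictly in sequence. Even a follower that temporarily lags behind ends up in the same state, because there is only one order to replay.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LogEntry = Tuple[int, str, int]  # (sequence number, key, value)


@dataclass
class Leader:
    log: List[LogEntry] = field(default_factory=list)

    def write(self, key: str, value: int) -> int:
        # Only the leader assigns each write's position in the log,
        # so every follower sees the same order.
        seq = len(self.log)
        self.log.append((seq, key, value))
        return seq


@dataclass
class Follower:
    applied: int = 0
    data: Dict[str, int] = field(default_factory=dict)

    def catch_up(self, log: List[LogEntry]) -> None:
        # Apply entries strictly in log order, resuming where we left off.
        for seq, key, value in log[self.applied:]:
            self.data[key] = value
            self.applied = seq + 1


leader = Leader()
leader.write("x", 1)
leader.write("x", 2)  # whichever write the leader orders last wins everywhere

f1, f2 = Follower(), Follower()
f1.catch_up(leader.log[:1])  # f1 has only received the first entry so far
f1.catch_up(leader.log)      # later it catches up with the rest
f2.catch_up(leader.log)

print(f1.data, f2.data)  # → {'x': 2} {'x': 2}
```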

47%

If there is no single leader, conflicts can occur due to concurrent operations

47%

Serializability, which we discussed in Chapter 7, is about ensuring that transactions behave as if they were ex...

47%

achieved by literally executing transactions in that serial order, or by allowing concurrent execution while pr...

47%

timestamps and clocks in distrib...

47%

is another attempt to introduce order into a disorderly world,

47%

why ordering keeps coming up, and one of the reasons is that it helps preserve causality.

47%

conversation

47%

that there is a causal dependency between the question and the answer.

47%

This "happened before" relationship is another expression of causality:

47%

a consistent snapshot. But what does “consistent” mean in this context? It means consistent with causality: if the snapshot contains an answer, it must also contain the question being answered
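A snapshot can be checked for consistency with causality mechanically. The sketch below assumes a hypothetical `DEPENDS_ON` map from each message to the messages it directly depends on. Checking only the direct dependencies of every item in the snapshot is enough: a missing transitive dependency always shows up as a missing direct dependency of some item that is present.

```python
from typing import Dict, Set

# Hypothetical causal dependencies: each message id maps to the ids it
# directly depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "q1": set(),    # a question
    "a1": {"q1"},   # an answer to q1
    "q2": set(),    # an unrelated question
}


def is_causally_consistent(snapshot: Set[str]) -> bool:
    # Consistent if every item in the snapshot brings along everything it
    # directly depends on (which, applied to all items, covers transitive deps).
    return all(DEPENDS_ON[item] <= snapshot for item in snapshot)


print(is_causally_consistent({"q1", "a1"}))  # True
print(is_causally_consistent({"a1", "q2"}))  # False: answer without its question
print(is_causally_consistent({"q1"}))        # True: a question alone is fine
```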

47%

Serializable snapshot isolation (see “Serializable Snapshot Isolation (SSI)”) detects write skew by tracking the causal dependencies between transactions.

47%

Causality imposes an ordering on events: cause comes before effect;

47%

If a system obeys the ordering imposed by causality, we say that it is causally consistent.

47%

We say they are incomparable, and therefore mathematical sets are partially ordered:
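Python's built-in sets show this partial order directly. When sets are ordered by inclusion (the `<=` subset operator), some pairs are comparable and others are not. The particular sets below are made up for illustration.

```python
a = {"alpha", "beta"}
b = {"beta", "gamma"}
c = {"alpha", "beta", "gamma"}

print(a <= c, b <= c)  # True True: both are subsets of c
print(a <= b, b <= a)  # False False: a and b are incomparable
```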

47%

two events are ordered if they are causally related (one happened before the other), but they are incomparable if they are concurrent. This means that causality defines a partial order, not a total order:
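A vector clock is one common way to represent this partial order in code. It maps each node name to a counter. The technique is not named in these highlights, and the node names below are hypothetical. The comparison rule is the standard one. A happened before B if every counter in A is less than or equal to the matching counter in B, with at least one strictly less. Two events are concurrent if neither happened before the other.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> counter; missing entries count as 0


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    nodes = a.keys() | b.keys()
    all_le = all(a.get(node, 0) <= b.get(node, 0) for node in nodes)
    some_lt = any(a.get(node, 0) < b.get(node, 0) for node in nodes)
    return all_le and some_lt


def compare(a: VectorClock, b: VectorClock) -> str:
    if happened_before(a, b):
        return "a happened before b"
    if happened_before(b, a):
        return "b happened before a"
    if all(a.get(node, 0) == b.get(node, 0) for node in a.keys() | b.keys()):
        return "same event"
    return "concurrent"  # incomparable: neither is causally before the other


question = {"alice": 1}            # alice asks a question
answer = {"alice": 1, "bob": 1}    # bob answers after seeing it
unrelated = {"carol": 1}           # carol acts without seeing either

print(compare(question, answer))     # a happened before b
print(compare(answer, unrelated))    # concurrent
```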