Kindle Notes & Highlights
Read between March 20 - June 11, 2019
Unfortunately, object-relational mapping frameworks make it easy to accidentally write code that performs unsafe read-modify-write cycles instead of using atomic operations provided by the database [38]. That’s not a problem if you know what you are doing, but it is potentially a source of subtle bugs that are difficult to find by testing.
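For illustration, a minimal Python sketch of the difference, using sqlite3 and a hypothetical posts table with id and likes columns (not the ORM code the highlight refers to):

import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database file

# Unsafe read-modify-write cycle: two concurrent transactions can both read
# the same old value, and one of the increments is silently lost.
row = conn.execute("SELECT likes FROM posts WHERE id = ?", (42,)).fetchone()
conn.execute("UPDATE posts SET likes = ? WHERE id = ?", (row[0] + 1, 42))
conn.commit()

# Atomic operation: the database applies the increment itself, so concurrent
# increments cannot overwrite each other.
conn.execute("UPDATE posts SET likes = likes + 1 WHERE id = ?", (42,))
conn.commit()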
Atomic operations can work well in a replicated context, especially if they are commutative (i.e., you can apply them in a different order on different replicas, and still get the same result).
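A toy Python example of why commutativity matters, assuming two hypothetical replicas that receive the same set of increments in different orders:

ops = [+3, -1, +7]              # hypothetical increment/decrement operations

replica_a = 10
for op in ops:                  # delivery order: +3, -1, +7
    replica_a += op

replica_b = 10
for op in reversed(ops):        # delivery order: +7, -1, +3
    replica_b += op

# Addition is commutative, so both replicas converge to the same value.
assert replica_a == replica_b == 19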
In order to make the most of that single thread, transactions need to be structured differently from their traditional form.
However, for any transaction that needs to access multiple partitions, the database must coordinate the transaction across all the partitions that it touches.
Serial execution of transactions has become a viable way of achieving serializable isolation within certain constraints.
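As a rough sketch (not any particular database's implementation), the idea is that transactions are submitted as whole units of work, akin to stored procedures, and executed one after another on a single thread over in-memory data:

import queue
import threading

store = {"alice": 100, "bob": 50}     # in-memory data, as serial execution assumes
tx_queue = queue.Queue()

def transfer(src, dst, amount):
    # Package the whole transaction as a callable, like a stored procedure.
    def tx():
        if store[src] >= amount:
            store[src] -= amount
            store[dst] += amount
    return tx

def run_serially():
    while True:
        tx = tx_queue.get()
        if tx is None:
            break
        tx()                          # transactions run strictly one at a time

worker = threading.Thread(target=run_serially)
worker.start()
tx_queue.put(transfer("alice", "bob", 30))
tx_queue.put(transfer("bob", "alice", 10))
tx_queue.put(None)
worker.join()
print(store)                          # {'alice': 80, 'bob': 70}

Because only one transaction runs at a time there is no concurrency inside the data store and no need for locking, but every transaction must be short and the active dataset must fit in memory for a single thread to keep up.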
In 2PL, writers don’t just block other writers; they also block readers and vice versa.
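A minimal Python sketch of the shared/exclusive lock behind that blocking (simplified and hypothetical; a real 2PL lock manager also handles lock upgrades and deadlock detection, and holds locks until commit or abort):

import threading

class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:                  # readers are blocked by a writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers: # a writer is blocked by readers and writers
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()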
Working with distributed systems is fundamentally different from writing software on a single computer — and the main difference is that there are lots of new and exciting ways for things to go wrong.
This chapter is a thoroughly pessimistic and depressing overview of things that may go wrong in a distributed system.
Although the system can be more reliable than its underlying parts, there is always a limit to how much more reliable it can be.
In particular, it could happen that the node actually wasn’t dead but only slow to respond due to overload; transferring its load to other nodes can cause a cascading failure (in the extreme case, all nodes declare each other dead, and everything stops working).
Spanner implements snapshot isolation across datacenters in this way [59, 60]. It uses the clock’s confidence interval as reported by the TrueTime API, and is based on the following observation:
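Roughly, the observation is that a TrueTime-style clock returns an uncertainty interval rather than a single timestamp, and if two intervals do not overlap, the order of the two events is unambiguous. A minimal Python sketch (the Interval type and timestamps are illustrative, not the actual TrueTime API):

from collections import namedtuple

Interval = namedtuple("Interval", ["earliest", "latest"])

def definitely_before(a, b):
    # A definitely happened before B only if A's latest possible time is
    # earlier than B's earliest possible time.
    return a.latest < b.earliest

a = Interval(earliest=100.0, latest=107.0)   # uncertainty of a few milliseconds
b = Interval(earliest=110.0, latest=115.0)

assert definitely_before(a, b)       # intervals don't overlap: order is certain
assert not definitely_before(b, a)

Spanner deliberately waits out the length of the confidence interval before committing, so that the intervals of causally related transactions cannot overlap.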
Using clock synchronization for distributed transaction semantics is an area of active research [57, 61, 62]. These ideas are interesting, but they have not yet been implemented in mainstream databases outside of Google.
This trick hides GC pauses from clients and reduces the high percentiles of response time [70, 71]. Some latency-sensitive financial trading systems [72] use this approach.
Fortunately, we don’t need to go as far as figuring out the meaning of life. In a distributed system, we can state the assumptions we are making about the behavior (the system model) and design the actual system in such a way that it meets those assumptions. Algorithms can be proved to function correctly within a certain system model. This means that reliable behavior is achievable, even if the underlying system model provides very few guarantees.
The moral of these stories is that a node cannot necessarily trust its own judgment of a situation. A distributed system cannot exclusively rely on a single node, because a node may fail at any time, potentially leaving the system stuck and unable to recover. Instead, many distributed algorithms rely on a quorum, that is, voting among the nodes (see “Quorums for reading and writing”): decisions require some minimum number of votes from several nodes in order to reduce the dependence on any one particular node.
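As a small illustration (an assumed majority quorum, not a full consensus protocol):

def has_quorum(votes_in_favor, cluster_size):
    # A strict majority is required, so no single node can decide alone and
    # any two quorums overlap in at least one node.
    return votes_in_favor >= cluster_size // 2 + 1

assert has_quorum(3, 5)         # 3 of 5 nodes agree: quorum reached
assert not has_quorum(2, 5)     # 2 of 5 is not enough
assert not has_quorum(2, 4)     # with 4 nodes, a majority needs 3 votes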
Distributed systems problems become much harder if there is a risk that nodes may "lie" (send arbitrary faulty or corrupted responses).
However, a real implementation may still have to include code to handle the case where something happens that was assumed to be impossible, even if that handling boils down to printf("Sucks to be you") and exit(666) — i.e., letting a human operator clean up the mess [93]. (This is arguably the difference between computer science and software engineering.)
Theoretical analysis and empirical testing are equally important.
We need to understand the scope of what can and cannot be done: in some situations, it’s possible for the system to tolerate faults and continue working; in other situations, that is not possible.
A better name for eventual consistency may be convergence, as we expect all replicas to eventually converge to the same value.

