Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Kindle Notes & Highlights
9%
Instead of writing a triple as (subject, predicate, object), we write it as predicate(subject, object).
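A minimal Python sketch of that rewriting, using a made-up triple purely for illustration:

```python
# Illustrative only: the same hypothetical triple expressed first as a
# (subject, predicate, object) tuple, then in Datalog's predicate(subject, object) form.
triple = ("lucy", "married_to", "alain")

def to_datalog_fact(subject, predicate, obj):
    """Render a triple as a Datalog-style fact string."""
    return f"{predicate}({subject}, {obj})"

print(to_datalog_fact(*triple))  # -> married_to(lucy, alain)
```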
9%
Datalog is a subset of Prolog,
9%
less convenient for simple one-off queries, but it can cope better if your data is complex.
9%
One thing that document and graph databases have in common is that they typically don’t enforce a schema
10%
researchers have written specialized genome database software like GenBank
10%
there is a big difference between storage engines that are optimized for transactional workloads and those that are optimized for analytics.
10%
In this book, log is used in the more general sense: an append-only sequence of records.
11%
An index is an additional structure that is derived from the primary data. Many databases allow you to add and remove indexes, and this doesn’t affect the contents of the database; it only affects the performance of queries.
11%
Any kind of index usually slows down writes, because the index also needs to be updated every time data is written.
11%
Compaction means throwing away duplicate keys in the log, and keeping only the most recent update for each key.
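A minimal sketch of that idea, with an in-memory list of (key, value) records standing in for a log segment:

```python
# Illustrative only, not any engine's actual code: scan an append-only
# segment in write order and keep only the most recent value for each key.
def compact(segment):
    """segment: list of (key, value) records in append order."""
    latest = {}
    for key, value in segment:
        latest[key] = value          # later records overwrite earlier ones
    # Write the surviving records out as a new, smaller segment.
    return list(latest.items())

segment = [("cat", 1), ("dog", 2), ("cat", 3), ("dog", 4), ("cat", 5)]
print(compact(segment))  # [('cat', 5), ('dog', 4)]
```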
11%
it is difficult to make an on-disk hash map perform well. It requires a lot of random access I/O, it is expensive to grow when it becomes full, and hash collisions require fiddly logic
11%
Storage engines that are based on this principle of merging and compacting sorted files are often called LSM storage engines.
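A toy illustration (not any real engine's code) of merging sorted segments, where for duplicate keys the value from the newest segment wins; a real engine would stream-merge the already-sorted files rather than re-sort everything:

```python
def merge_segments(segments):
    """segments: list of sorted (key, value) lists, oldest first."""
    merged = {}
    entries = (
        (key, idx, value)
        for idx, seg in enumerate(segments)
        for key, value in seg
    )
    for key, idx, value in sorted(entries):
        merged[key] = value          # higher idx (newer segment) overwrites older
    return sorted(merged.items())

old = [("apple", 1), ("banana", 2)]
new = [("apple", 9), ("cherry", 3)]
print(merge_segments([old, new]))  # [('apple', 9), ('banana', 2), ('cherry', 3)]
```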
11%
the LSM-tree algorithm can be slow when looking up keys that do not exist in the database: you have to check the memtable, then the segments all the way back to the oldest (possibly having to read from disk for each one)
11%
(A Bloom filter is a memory-efficient data structure for approximating the contents of a set. It can tell you if a key does not appear in the database, and thus saves many unnecessary disk reads for nonexistent keys.)
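A toy Bloom filter sketch in Python (the array size and number of hashes are arbitrary here; a real implementation derives them from the expected key count and acceptable false-positive rate):

```python
import hashlib

# k hash functions set k bits per key. A lookup that finds any of those bits
# unset proves the key was never added; if all are set, the key is *probably* present.
class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("handbag")
print(bf.might_contain("handbag"))    # True
print(bf.might_contain("handcuffs"))  # almost certainly False: skip the disk reads
```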
12%
In size-tiered compaction, newer and smaller SSTables are successively merged into older and larger SSTables.
12%
In leveled compaction, the key range is split up into smaller SSTables and older data is moved into separate “levels,” which allows the compaction to proceed more incrementally and use less disk space.
12%
B-trees break the database down into fixed-size blocks or pages, traditionally 4 KB in size (sometimes bigger), and read or write one page at a time.
12%
The number of references to child pages in one page of the B-tree is called the branching factor.
12%
a B-tree with n keys always has a depth of O(log n).
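As a rough sanity check, a back-of-the-envelope calculation with assumed numbers (4 KB pages, branching factor 500, four levels):

```python
# Illustrative arithmetic, not a benchmark: a shallow tree with a high
# branching factor can address an enormous amount of data.
branching_factor = 500
page_size = 4 * 1024          # 4 KB
levels = 4

leaf_pages = branching_factor ** levels
capacity_bytes = leaf_pages * page_size
print(f"{capacity_bytes / 1e12:.0f} TB")  # ~256 TB from a four-level tree
```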
12%
it is common for B-tree implementations to include an additional data structure on disk: a write-ahead log (WAL, also known as a redo log).
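A minimal sketch of the WAL discipline (illustrative only, with a plain dict standing in for on-disk pages): every intended page modification is made durable in the log before the page itself is overwritten in place, so a crash mid-write can be repaired by replaying the log.

```python
import json, os

class WriteAheadLog:
    def __init__(self, path="btree.wal"):
        self.f = open(path, "a")

    def log(self, page_id, new_contents):
        record = json.dumps({"page": page_id, "data": new_contents})
        self.f.write(record + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())   # durable before the page is touched

def write_page(wal, pages, page_id, new_contents):
    wal.log(page_id, new_contents)  # 1. record the intent durably
    pages[page_id] = new_contents   # 2. then overwrite the page in place

pages = {}
wal = WriteAheadLog()
write_page(wal, pages, 7, {"keys": [100, 200, 300]})
```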
12%
each leaf page may have references to its sibling pages to the left and right, which allows scanning keys in order without jumping back to parent pages.
12%
LSM-trees are typically faster for writes, whereas B-trees are thought to be faster for reads
12%
One write to the database resulting in multiple writes to the disk over the course of the database’s lifetime is known as write amplification.
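A worked example with made-up numbers, treating write amplification as disk bytes written divided by database bytes written:

```python
# Purely illustrative figures.
logical_bytes_written = 1 * 1024**3    # 1 GiB written by the application
disk_bytes_written    = 6 * 1024**3    # WAL writes + page/compaction rewrites

write_amplification = disk_bytes_written / logical_bytes_written
print(write_amplification)  # 6.0: each logical byte cost six bytes of disk I/O
```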
12%
LSM-trees are typically able to sustain higher write throughput than B-trees, partly because they sometimes have lower write amplification
12%
Since LSM-trees are not page-oriented and periodically rewrite SSTables to remove fragmentation, they have lower storage overheads,
12%
On many SSDs, the firmware internally uses a log-structured algorithm to turn random writes into sequential writes on the underlying storage chips, so the impact of the storage engine’s write pattern is less pronounced
12%
A downside of log-structured storage is that the compaction process can sometimes interfere with the performance of ongoing reads and writes.
12%
Another issue with compaction arises at high write throughput: the disk’s finite write bandwidth needs to be shared between the initial write (logging and flushing a memtable to disk) and the compaction threads running in the background.
12%
it can happen that compaction cannot keep up with the rate of incoming writes. In this case, the number of unmerged segments on disk keeps growing until you run out of disk space, and reads also slow down because they need to check more segment files. Typically, SSTable-based storage engines do not throttle the rate of incoming writes, even if compaction cannot keep up, so you need explicit monitoring to detect this situation
12%
secondary index can easily be constructed from a key-value index. The main difference is that in a secondary index, the indexed values are not necessarily unique;
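A minimal sketch of that construction, using Python dicts as stand-ins for the primary store and the index:

```python
from collections import defaultdict

# Because the indexed value is not unique, each index entry maps to a
# *list* of matching primary keys (row identifiers).
rows = {
    1: {"name": "Alice", "city": "Berlin"},
    2: {"name": "Bob",   "city": "Paris"},
    3: {"name": "Carol", "city": "Berlin"},
}

city_index = defaultdict(list)
for row_id, row in rows.items():
    city_index[row["city"]].append(row_id)

print(city_index["Berlin"])  # [1, 3] -> two rows share the same indexed value
```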
12%
the place where rows are stored is known as a heap file, and it stores data in no particular order
12%
heap file approach is common because it avoids duplicating data when multiple secondary indexes are present:
12%
In some situations, the extra hop from the index to the heap file is too much of a performance penalty for reads, so it can be desirable to store the indexed row directly within an index. This is known as a clustered index.
13%
compromise between a clustered index (storing all row data within the index) and a nonclustered index (storing only references to the data within the index) is known as a covering index or index with included columns, which stores some of a table’s columns within the index [33].
13%
As with any kind of duplication of data, clustered and covering indexes can speed up reads, but they require additional storage and can add overhead on writes.
13%
Some in-memory key-value stores, such as Memcached, are intended for caching use only, where it’s acceptable for data to be lost if a machine is restarted. But other in-memory databases aim for durability, which can be achieved with special hardware (such as battery-powered RAM), by writing a log of changes to disk, by writing periodic snapshots to disk, or by replicating the in-memory state to other machines.
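A minimal sketch of the log-of-changes-plus-snapshots approach (illustrative only; a real system also handles fsync ordering, log truncation, and recovery):

```python
import json

class DurableDict:
    def __init__(self, log_path="changes.log"):
        self.data = {}
        self.log = open(log_path, "a")

    def set(self, key, value):
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()                 # change is on disk before we acknowledge
        self.data[key] = value

    def snapshot(self, path="snapshot.json"):
        with open(path, "w") as f:
            json.dump(self.data, f)      # recovery = latest snapshot + newer log entries
```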
13%
Redis offers a database-like interface to various data structures such as priority queues and sets. Because it keeps all data in memory, its implementation is comparatively simple.
13%
anti-caching approach works by evicting the least recently used data from memory to disk when there is not enough memory,
21%
Whenever the leader writes new data to its local storage, it also sends the data change to all of its followers as part of a replication log or change stream.
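A minimal sketch of that flow, with in-process objects standing in for real nodes and the network (a real system would ship the log asynchronously and handle failures):

```python
class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value

class Leader:
    def __init__(self, followers):
        self.data = {}
        self.replication_log = []
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value            # apply locally first
        change = (key, value)
        self.replication_log.append(change)
        for follower in self.followers:   # stream the change to every follower
            follower.apply(change)

followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("user:42", {"name": "Ichika"})
```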
21%
When a client wants to read from the database, it can query either the leader or any of the followers.
21%
if you enable synchronous replication on a database, it usually means that one of the followers is synchronous, and the others are asynchronous. If the synchronous follower becomes unavailable or slow, one of the asynchronous followers is made synchronous.
21%
a fully asynchronous configuration has the advantage that the leader can continue processing writes, even if all of its followers have fallen behind.
21%
follower connects to the leader and requests all the data changes that have happened since the snapshot was taken. This requires that the snapshot is associated with an exact position in the leader’s replication log. That position has various names: for example, PostgreSQL calls it the log sequence number, and MySQL calls it the binlog coordinates.
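A minimal sketch of that catch-up step, with made-up positions and changes: the snapshot records the log position it corresponds to, and the new follower requests everything after that position.

```python
replication_log = [
    (1, ("a", "old")),   # (position, change) pairs on the leader
    (2, ("b", "x")),
    (3, ("a", "new")),
]

snapshot = {"a": "old"}       # consistent copy taken at position 1
snapshot_position = 1

def changes_since(log, position):
    return [(pos, change) for pos, change in log if pos > position]

follower_data = dict(snapshot)
for _, (key, value) in changes_since(replication_log, snapshot_position):
    follower_data[key] = value

print(follower_data)  # {'a': 'new', 'b': 'x'}: caught up with the leader
```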
22%
each follower keeps a log of the data changes it has received from the leader.
22%
best candidate for leadership is usually the replica with the most up-to-date data changes from the old leader (to minimize any data loss).
22%
The system needs to ensure that the old leader becomes a follower and recognizes the new leader.
22%
The new leader may have received conflicting writes in the meantime. The most common solution is for the old leader’s unreplicated writes to simply be discarded, which may violate clients’ durability expectations.
22%
could happen that two nodes both believe that they are the leader. This situation is called split brain, and it is dangerous: if both leaders accept writes, and there is no process for resolving conflicts (see “Multi-Leader Replication”), data is likely to be lost or corrupted. As a safety catch, some systems have a mechanism to shut down one node if two leaders are detected.
22%
Any statement that calls a nondeterministic function, such as NOW() to get the current date and time or RAND() to get a random number, is likely to generate a different value on each replica.
22%
the leader can replace any nondeterministic function calls with a fixed return value when the statement is logged so that the followers all get the same value.
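A minimal sketch of that idea (not MySQL’s actual mechanism): evaluate NOW() and RAND() once on the leader and log the statement with literal values substituted, so every follower re-executes exactly the same statement.

```python
import random, re, time

def make_deterministic(statement):
    # Replace nondeterministic calls with the values the leader observed.
    statement = re.sub(r"NOW\(\)",
                       f"'{time.strftime('%Y-%m-%d %H:%M:%S')}'",
                       statement)
    statement = re.sub(r"RAND\(\)", repr(random.random()), statement)
    return statement

original = "UPDATE games SET played_at = NOW(), seed = RAND() WHERE id = 1"
print(make_deterministic(original))
# e.g. UPDATE games SET played_at = '2024-01-01 12:00:00', seed = 0.4375 WHERE id = 1
```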