Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Kindle Notes & Highlights
9%
Instead of writing a triple as (subject, predicate, object), we write it as predicate(subject, object).
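A minimal Python sketch of that rewriting, using a made-up triple purely for illustration:

```python
# Illustrative only: the same hypothetical triple expressed first as a
# (subject, predicate, object) tuple, then in Datalog's predicate(subject, object) form.
triple = ("lucy", "married_to", "alain")

def to_datalog_fact(subject, predicate, obj):
    """Render a triple as a Datalog-style fact string."""
    return f"{predicate}({subject}, {obj})"

print(to_datalog_fact(*triple))  # -> married_to(lucy, alain)
```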
9%
Datalog is a subset of Prolog,
9%
less convenient for simple one-off queries, but it can cope better if your data is complex.
9%
One thing that document and graph databases have in common is that they typically don’t enforce a schema
10%
researchers have written specialized genome database software like GenBank
10%
there is a big difference between storage engines that are optimized for transactional workloads and those that are optimized for analytics.
10%
In this book, log is used in the more general sense: an append-only sequence of records.
11%
An index is an additional structure that is derived from the primary data. Many databases allow you to add and remove indexes, and this doesn’t affect the contents of the database; it only affects the performance of queries.
11%
Any kind of index usually slows down writes, because the index also needs to be updated every time data is written.
11%
Compaction means throwing away duplicate keys in the log, and keeping only the most recent update for each key.
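A minimal sketch of that idea, with an in-memory list of (key, value) records standing in for a log segment:

```python
# Illustrative only, not any engine's actual code: scan an append-only
# segment in write order and keep only the most recent value for each key.
def compact(segment):
    """segment: list of (key, value) records in append order."""
    latest = {}
    for key, value in segment:
        latest[key] = value          # later records overwrite earlier ones
    # Write the surviving records out as a new, smaller segment.
    return list(latest.items())

segment = [("cat", 1), ("dog", 2), ("cat", 3), ("dog", 4), ("cat", 5)]
print(compact(segment))  # [('cat', 5), ('dog', 4)]
```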
11%
it is difficult to make an on-disk hash map perform well. It requires a lot of random access I/O, it is expensive to grow when it becomes full, and hash collisions require fiddly logic
11%
Storage engines that are based on this principle of merging and compacting sorted files are often called LSM storage engines.
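A toy illustration (not any real engine's code) of merging sorted segments, where for duplicate keys the value from the newest segment wins; a real engine would stream-merge the already-sorted files rather than re-sort everything:

```python
def merge_segments(segments):
    """segments: list of sorted (key, value) lists, oldest first."""
    merged = {}
    entries = (
        (key, idx, value)
        for idx, seg in enumerate(segments)
        for key, value in seg
    )
    for key, idx, value in sorted(entries):
        merged[key] = value          # higher idx (newer segment) overwrites older
    return sorted(merged.items())

old = [("apple", 1), ("banana", 2)]
new = [("apple", 9), ("cherry", 3)]
print(merge_segments([old, new]))  # [('apple', 9), ('banana', 2), ('cherry', 3)]
```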
11%
the LSM-tree algorithm can be slow when looking up keys that do not exist in the database: you have to check the memtable, then the segments all the way back to the oldest (possibly having to read from disk for each one)
11%
(A Bloom filter is a memory-efficient data structure for approximating the contents of a set. It can tell you if a key does not appear in the database, and thus saves many unnecessary disk reads for nonexistent keys.)
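A toy Bloom filter sketch in Python (the array size and number of hashes are arbitrary here; a real implementation derives them from the expected key count and acceptable false-positive rate):

```python
import hashlib

# k hash functions set k bits per key. A lookup that finds any of those bits
# unset proves the key was never added; if all are set, the key is *probably* present.
class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("handbag")
print(bf.might_contain("handbag"))    # True
print(bf.might_contain("handcuffs"))  # almost certainly False: skip the disk reads
```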
12%
In size-tiered compaction, newer and smaller SSTables are successively merged into older and larger SSTables.
12%
In leveled compaction, the key range is split up into smaller SSTables and older data is moved into separate “levels,” which allows the compaction to proceed more incrementally and use less disk space.
12%
B-trees break the database down into fixed-size blocks or pages, traditionally 4 KB in size (sometimes bigger), and read or write one page at a time.
12%
The number of references to child pages in one page of the B-tree is called the branching factor.
12%
a B-tree with n keys always has a depth of O(log n).
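As a rough sanity check, a back-of-the-envelope calculation with assumed numbers (4 KB pages, branching factor 500, four levels):

```python
# Illustrative arithmetic, not a benchmark: a shallow tree with a high
# branching factor can address an enormous amount of data.
branching_factor = 500
page_size = 4 * 1024          # 4 KB
levels = 4

leaf_pages = branching_factor ** levels
capacity_bytes = leaf_pages * page_size
print(f"{capacity_bytes / 1e12:.0f} TB")  # ~256 TB from a four-level tree
```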
12%
it is common for B-tree implementations to include an additional data structure on disk: a write-ahead log (WAL, also known as a redo log).
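A minimal sketch of the WAL discipline (illustrative only, with a plain dict standing in for on-disk pages): every intended page modification is made durable in the log before the page itself is overwritten in place, so a crash mid-write can be repaired by replaying the log.

```python
import json, os

class WriteAheadLog:
    def __init__(self, path="btree.wal"):
        self.f = open(path, "a")

    def log(self, page_id, new_contents):
        record = json.dumps({"page": page_id, "data": new_contents})
        self.f.write(record + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())   # durable before the page is touched

def write_page(wal, pages, page_id, new_contents):
    wal.log(page_id, new_contents)  # 1. record the intent durably
    pages[page_id] = new_contents   # 2. then overwrite the page in place

pages = {}
wal = WriteAheadLog()
write_page(wal, pages, 7, {"keys": [100, 200, 300]})
```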
12%
each leaf page may have references to its sibling pages to the left and right, which allows scanning keys in order without jumping back to parent pages.
12%
LSM-trees are typically faster for writes, whereas B-trees are thought to be faster for reads
12%
One write to the database resulting in multiple writes to the disk over the course of the database’s lifetime is known as write amplification.
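A worked example with made-up numbers, treating write amplification as disk bytes written divided by database bytes written:

```python
# Purely illustrative figures.
logical_bytes_written = 1 * 1024**3    # 1 GiB written by the application
disk_bytes_written    = 6 * 1024**3    # WAL writes + page/compaction rewrites

write_amplification = disk_bytes_written / logical_bytes_written
print(write_amplification)  # 6.0: each logical byte cost six bytes of disk I/O
```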
12%
LSM-trees are typically able to sustain higher write throughput than B-trees, partly because they sometimes have lower write amplification
12%
Since LSM-trees are not page-oriented and periodically rewrite SSTables to remove fragmentation, they have lower storage overheads,
12%
On many SSDs, the firmware internally uses a log-structured algorithm to turn random writes into sequential writes on the underlying storage chips, so the impact of the storage engine’s write pattern is less pronounced
12%
A downside of log-structured storage is that the compaction process can sometimes interfere with the performance of ongoing reads and writes.
12%
Another issue with compaction arises at high write throughput: the disk’s finite write bandwidth needs to be shared between the initial write (logging and flushing a memtable to disk) and the compaction threads running in the background.
12%
it can happen that compaction cannot keep up with the rate of incoming writes. In this case, the number of unmerged segments on disk keeps growing until you run out of disk space, and reads also slow down because they need to check more segment files. Typically, SSTable-based storage engines do not throttle the rate of incoming writes, even if compaction cannot keep up, so you need explicit monitoring to detect this situation
12%
secondary index can easily be constructed from a key-value index. The main difference is that in a secondary index, the indexed values are not necessarily unique;
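A minimal sketch of that construction, using Python dicts as stand-ins for the primary store and the index:

```python
from collections import defaultdict

# Because the indexed value is not unique, each index entry maps to a
# *list* of matching primary keys (row identifiers).
rows = {
    1: {"name": "Alice", "city": "Berlin"},
    2: {"name": "Bob",   "city": "Paris"},
    3: {"name": "Carol", "city": "Berlin"},
}

city_index = defaultdict(list)
for row_id, row in rows.items():
    city_index[row["city"]].append(row_id)

print(city_index["Berlin"])  # [1, 3] -> two rows share the same indexed value
```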
12%
the place where rows are stored is known as a heap file, and it stores data in no particular order
12%
heap file approach is common because it avoids duplicating data when multiple secondary indexes are present:
12%
In some situations, the extra hop from the index to the heap file is too much of a performance penalty for reads, so it can be desirable to store the indexed row directly within an index. This is known as a clustered index.
13%
compromise between a clustered index (storing all row data within the index) and a nonclustered index (storing only references to the data within the index) is known as a covering index or index with included columns, which stores some of a table’s columns within the index [33].
13%
As with any kind of duplication of data, clustered and covering indexes can speed up reads, but they require additional storage and can add overhead on writes.
13%
Some in-memory key-value stores, such as Memcached, are intended for caching use only, where it’s acceptable for data to be lost if a machine is restarted. But other in-memory databases aim for durability, which can be achieved with special hardware (such as battery-powered RAM), by writing a log of changes to disk, by writing periodic snapshots to disk, or by replicating the in-memory state to other machines.
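A minimal sketch of the log-of-changes-plus-snapshots approach (illustrative only; a real system also handles fsync ordering, log truncation, and recovery):

```python
import json

class DurableDict:
    def __init__(self, log_path="changes.log"):
        self.data = {}
        self.log = open(log_path, "a")

    def set(self, key, value):
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()                 # change is on disk before we acknowledge
        self.data[key] = value

    def snapshot(self, path="snapshot.json"):
        with open(path, "w") as f:
            json.dump(self.data, f)      # recovery = latest snapshot + newer log entries
```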
13%
Redis offers a database-like interface to various data structures such as priority queues and sets. Because it keeps all data in memory, its implementation is comparatively simple.
13%
anti-caching approach works by evicting the least recently used data from memory to disk when there is not enough memory,
21%
Whenever the leader writes new data to its local storage, it also sends the data change to all of its followers as part of a replication log or change stream.
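A minimal sketch of that flow, with in-process objects standing in for real nodes and the network (a real system would ship the log asynchronously and handle failures):

```python
class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value

class Leader:
    def __init__(self, followers):
        self.data = {}
        self.replication_log = []
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value            # apply locally first
        change = (key, value)
        self.replication_log.append(change)
        for follower in self.followers:   # stream the change to every follower
            follower.apply(change)

followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("user:42", {"name": "Ichika"})
```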
21%
When a client wants to read from the database, it can query either the leader or any of the followers.
21%
if you enable synchronous replication on a database, it usually means that one of the followers is synchronous, and the others are asynchronous. If the synchronous follower becomes unavailable or slow, one of the asynchronous followers is made synchronous.
21%
a fully asynchronous configuration has the advantage that the leader can continue processing writes, even if all of its followers have fallen behind.
21%
follower connects to the leader and requests all the data changes that have happened since the snapshot was taken. This requires that the snapshot is associated with an exact position in the leader’s replication log. That position has various names: for example, PostgreSQL calls it the log sequence number, and MySQL calls it the binlog coordinates.
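A minimal sketch of that catch-up step, with made-up positions and changes: the snapshot records the log position it corresponds to, and the new follower requests everything after that position.

```python
replication_log = [
    (1, ("a", "old")),   # (position, change) pairs on the leader
    (2, ("b", "x")),
    (3, ("a", "new")),
]

snapshot = {"a": "old"}       # consistent copy taken at position 1
snapshot_position = 1

def changes_since(log, position):
    return [(pos, change) for pos, change in log if pos > position]

follower_data = dict(snapshot)
for _, (key, value) in changes_since(replication_log, snapshot_position):
    follower_data[key] = value

print(follower_data)  # {'a': 'new', 'b': 'x'}: caught up with the leader
```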
22%
each follower keeps a log of the data changes it has received from the leader.
22%
best candidate for leadership is usually the replica with the most up-to-date data changes from the old leader (to minimize any data loss).
22%
The system needs to ensure that the old leader becomes a follower and recognizes the new leader.
22%
The new leader may have received conflicting writes in the meantime. The most common solution is for the old leader’s unreplicated writes to simply be discarded, which may violate clients’ durability expectations.
22%
could happen that two nodes both believe that they are the leader. This situation is called split brain, and it is dangerous: if both leaders accept writes, and there is no process for resolving conflicts (see “Multi-Leader Replication”), data is likely to be lost or corrupted. As a safety catch, some systems have a mechanism to shut down one node if two leaders are detected.
22%
Any statement that calls a nondeterministic function, such as NOW() to get the current date and time or RAND() to get a random number, is likely to generate a different value on each replica.
22%
the leader can replace any nondeterministic function calls with a fixed return value when the statement is logged so that the followers all get the same value.
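A minimal sketch of that idea (not MySQL’s actual mechanism): evaluate NOW() and RAND() once on the leader and log the statement with literal values substituted, so every follower re-executes exactly the same statement.

```python
import random, re, time

def make_deterministic(statement):
    # Replace nondeterministic calls with the values the leader observed.
    statement = re.sub(r"NOW\(\)",
                       f"'{time.strftime('%Y-%m-%d %H:%M:%S')}'",
                       statement)
    statement = re.sub(r"RAND\(\)", repr(random.random()), statement)
    return statement

original = "UPDATE games SET played_at = NOW(), seed = RAND() WHERE id = 1"
print(make_deterministic(original))
# e.g. UPDATE games SET played_at = '2024-01-01 12:00:00', seed = 0.4375 WHERE id = 1
```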