Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
0%
We call an application data-intensive if data is its primary challenge — the quantity of data, the complexity of data, or the speed at which it is changing — as opposed to compute-intensive, where CPU cycles are the bottleneck.
1%
Maintainability Over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively. See “Maintainability”
5%
The one-to-many relationships from the user profile to the user’s positions, educational history, and contact information imply a tree structure in the data, and the JSON representation makes this tree structure explicit (see Figure 2-2).
Joe Soltzberg
One-To-Many => NoSQL (often)
6%
Document databases are sometimes called schemaless, but that’s misleading, as the code that reads the data usually assumes some kind of structure — i.e., there is an implicit schema, but it is not enforced by the database [20]. A more accurate term is schema-on-read (the structure of the data is implicit, and only interpreted when the data is read), in contrast with schema-on-write (the traditional approach of relational databases, where the schema is explicit and the database ensures all written data conforms to it)
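A minimal Python sketch of schema-on-read (the field names `positions` and `title` are illustrative, echoing the résumé example in the text): the store accepted whatever JSON it was given, and the structure is assumed only here, by the reading code.

```python
import json

# Schema-on-read: the database stored this JSON without validating it;
# the shape below is an implicit schema assumed at read time.
doc = json.loads('{"name": "Ada", "positions": [{"title": "Engineer"}]}')
titles = [p.get("title") for p in doc.get("positions", [])]
assert titles == ["Engineer"]
```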
6%
A declarative query language is attractive because it is typically more concise and easier to work with than an imperative API. But more importantly, it also hides implementation details of the database engine, which makes it possible for the database system to introduce performance improvements without requiring any changes to queries.
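A hedged illustration of the contrast, using Python's built-in `sqlite3` and the book's sharks example: the imperative loop fixes the execution strategy in code, while the SQL query only states the result wanted, leaving the engine free to change its plan without any change to the query.

```python
import sqlite3

# Imperative: we spell out HOW to find the rows.
animals = [("shark", "Sharks"), ("salmon", "Fishes"), ("lamprey", "Fishes")]
imperative = []
for name, family in animals:
    if family == "Sharks":
        imperative.append(name)

# Declarative (SQL): we state WHAT we want; the engine picks the plan
# (full scan, index lookup, parallel scan, ...) behind the scenes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany("INSERT INTO animals VALUES (?, ?)", animals)
declarative = [row[0] for row in
               conn.execute("SELECT name FROM animals WHERE family = 'Sharks'")]

assert imperative == declarative == ["shark"]
```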
6%
Finally, declarative languages often lend themselves to parallel execution.
7%
The moral of the story is that a NoSQL system may find itself accidentally reinventing SQL, albeit in disguise.
8%
Graphs are good for evolvability: as you add features to your application, a graph can easily be extended to accommodate changes in your application’s data structures.
11%
Range queries are not efficient.
Joe Soltzberg
Why hasn't this whole chapter been figured out mathematically where all you have to do is give a mathematical statement of your constraints and then use a theorem to pick the best db?
12%
You need to test systems with your particular workload in order to make a valid comparison.
16%
In order to restore data in the same object types, the decoding process needs to be able to instantiate arbitrary classes. This is frequently a source of security problems.
17%
Therefore, to maintain backward compatibility, every field you add after the initial deployment of the schema must be optional or have a default value.
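A tiny sketch of the rule, with an invented `nickname` field: new code reading a record encoded before the field existed must fall back to a default rather than require the field.

```python
# Backward compatibility: this record was written by the previous version
# of the code, before the `nickname` field was added to the schema.
# (`nickname` is an invented field for illustration.)
old_record = {"user_id": 7, "name": "Ada"}   # encoded under the old schema
nickname = old_record.get("nickname", "")    # new field: defaulted, not required
assert nickname == ""
```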
22%
Handling a failure of the leader is trickier: one of the followers needs to be promoted to be the new leader, clients need to be reconfigured to send their writes to the new leader, and the other followers need to start consuming data changes from the new leader. This process is called failover.
22%
The most common solution is for the old leader’s unreplicated writes to simply be discarded, which may violate clients’ durability expectations.
22%
The main disadvantage is that the log describes the data on a very low level: a WAL contains details of which bytes were changed in which disk blocks. This makes replication closely coupled to the storage engine. If the database changes its storage format from one version to another, it is typically not possible to run different versions of the database software on the leader and the followers.
22%
The term “eventually” is deliberately vague: in general, there is no limit to how far a replica can fall behind. In normal operation, the delay between a write happening on the leader and being reflected on a follower — the replication lag — may be only a fraction of a second, and not noticeable in practice. However, if the system is operating near capacity or if there is a problem in the network, the lag can easily increase to several seconds or even minutes.
23%
Thus, a simple rule is: always read the user’s own profile from the leader, and any other users’ profiles from a follower.
23%
The client can remember the timestamp of its most recent write — then the system can ensure that the replica serving any reads for that user reflects updates at least until that timestamp.
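One way to sketch this in Python (the `Replica` class and its `applied_ts` field are invented for illustration): route the read to a follower only if it has caught up to the client's last-write timestamp, and otherwise fall back to the leader.

```python
import time

class Replica:
    def __init__(self):
        self.data = {}
        self.applied_ts = 0.0   # timestamp this replica has replicated up to

    def apply(self, key, value, ts):
        self.data[key] = value
        self.applied_ts = max(self.applied_ts, ts)

def read_your_writes(key, last_write_ts, leader, followers):
    # Serve from any follower that has already applied the client's most
    # recent write; otherwise fall back to the leader, which is up to date.
    for replica in followers:
        if replica.applied_ts >= last_write_ts:
            return replica.data.get(key)
    return leader.data.get(key)

leader, stale, fresh = Replica(), Replica(), Replica()
ts = time.time()
leader.apply("profile", "new bio", ts)  # the write goes to the leader
fresh.apply("profile", "new bio", ts)   # this follower has replicated it
# `stale` has not yet seen the write, so it must not serve this read
assert read_your_writes("profile", ts, leader, [stale, fresh]) == "new bio"
```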
23%
is to make sure that each user always makes their reads from the same replica (different users can read from different replicas). For example, the replica can be chosen based on a hash of the user ID, rather than randomly. However, if that replica fails, the user’s queries will need to be rerouted to another replica.
Joe Soltzberg
Easy with DynamoDB, where the hash key is the user ID
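A sketch of hash-based replica pinning in Python; `hashlib.md5` is used only because it is stable across processes. Note the caveat from the text: if the chosen replica fails and is dropped from the list, the mapping changes and the user's reads move elsewhere.

```python
import hashlib

def replica_for(user_id: str, replicas: list) -> str:
    # A stable hash of the user ID pins each user to one replica,
    # so repeated reads by the same user hit the same node.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return replicas[h % len(replicas)]

replicas = ["replica-a", "replica-b", "replica-c"]
# Same user, same replica, on every call and in every process:
assert replica_for("user-42", replicas) == replica_for("user-42", replicas)
```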
23%
One solution is to make sure that any writes that are causally related to each other are written to the same partition — but in some applications that cannot be done efficiently. There are also algorithms that explicitly keep track of causal dependencies, a topic that we will return to in “The “happens-before” relationship and concurrency”.
23%
This is why transactions exist: they are a way for a database to provide stronger guarantees so that the application can be simpler.
23%
Leader-based replication has one major downside: there is only one leader, and all writes must go through it. If you can’t connect to the leader for any reason, for example due to a network interruption between you and the leader, you can’t write to the database.
23%
For this reason, multi-leader replication is often considered dangerous territory that should be avoided if possible [28].
23%
In this case, every device has a local database that acts as a leader (it accepts write requests), and there is an asynchronous multi-leader replication process (sync) between the replicas of your calendar on all of your devices. The replication lag may be hours or even days, depending on when you have internet access available.
Joe Soltzberg
Interesting way to look at the problem...
23%
We don’t usually think of collaborative editing as a database replication problem, but it has a lot in common with the previously mentioned offline editing use case.
24%
If you want synchronous conflict detection, you might as well just use single-leader replication.
28%
Range scans are very useful in this case, because they let you easily fetch, say, all the readings from a particular month.
Joe Soltzberg
Useful for a free built-in prefix search system...
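A minimal sketch of why this key design helps, using Python's `bisect` over sorted `(sensor_id, date)` keys (the key layout is illustrative): all of one sensor's readings for a month are contiguous in sort order, so a single range scan fetches them.

```python
from bisect import bisect_left, bisect_right

# Keys sorted as (sensor_id, "YYYY-MM-DD"): one sensor's readings for a
# given month form a contiguous run, fetchable with one range scan.
keys = sorted([
    ("sensor-1", "2024-01-15"), ("sensor-1", "2024-02-03"),
    ("sensor-1", "2024-02-20"), ("sensor-2", "2024-02-10"),
])
lo = bisect_left(keys, ("sensor-1", "2024-02"))        # start of the prefix
hi = bisect_right(keys, ("sensor-1", "2024-02\xff"))   # just past the prefix
assert keys[lo:hi] == [("sensor-1", "2024-02-03"), ("sensor-1", "2024-02-20")]
```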
28%
Because of this risk of skew and hot spots, many distributed datastores use a hash function to determine the partition for a given key.
28%
for example, in Java’s Object.hashCode() and Ruby’s Object#hash, the same key may have a different hash value in different processes [6].
Joe Soltzberg
Google to see the cool reason why
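This is why partitioning code tends to use a deterministic hash such as MD5 rather than a language's built-in object hash. A sketch in Python, whose own `hash()` is randomized per process (via `PYTHONHASHSEED`) precisely to defend against hash-flooding attacks:

```python
import hashlib

def stable_partition(key: str, num_partitions: int) -> int:
    # hashlib.md5 produces the same digest in every process and on every
    # platform, unlike Python's built-in hash(), which varies per process.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Deterministic, so it is safe to use for routing keys to partitions:
assert stable_partition("user:123", 8) == stable_partition("user:123", 8)
```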
28%
As discussed, hashing a key to determine its partition can help reduce hot spots. However, it can’t avoid them entirely: in the extreme case where all reads and writes are for the same key, you still end up with all requests being routed to the same partition.
28%
additional work, as they have to read the data from all 100 keys and combine it. This technique also requires additional bookkeeping: it only makes sense to append the random number for the small number of hot keys; for the vast majority of keys with low write throughput this would be unnecessary overhead. Thus, you also need some way of keeping track of which keys are being split.
Joe Soltzberg
Asking for trouble 99% of the time
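A sketch of the split-hot-key technique described above (the `#suffix` scheme and the 100-way split are illustrative): writes scatter across sub-keys that can live on different partitions, and reads must gather and merge all of them, which is why it is only worth doing for known hot keys.

```python
import random

NUM_SPLITS = 100
store = {}                    # stands in for a partitioned key-value store
hot_keys = {"celebrity:1"}    # the bookkeeping: which keys are split

def write(key, value):
    if key in hot_keys:
        # Spread writes for a hot key across 100 sub-keys.
        key = f"{key}#{random.randrange(NUM_SPLITS)}"
    store.setdefault(key, []).append(value)

def read(key):
    if key in hot_keys:
        # The extra read-side work: gather and merge all 100 sub-keys.
        result = []
        for i in range(NUM_SPLITS):
            result.extend(store.get(f"{key}#{i}", []))
        return result
    return store.get(key, [])

for v in range(10):
    write("celebrity:1", v)
assert sorted(read("celebrity:1")) == list(range(10))
```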
29%
Send all requests from clients to a routing tier first, which determines the node that should handle each request and forwards it accordingly.
31%
The idea of ACID consistency is that you have certain statements about your data (invariants) that must always be true — for example, in an accounting system, credits and debits across all accounts must always be balanced. If a transaction starts with a database that is valid according to these invariants, and any writes during the transaction preserve the validity, then you can be sure that the invariants are always satisfied.
31%
Atomicity, isolation, and durability are properties of the database, whereas consistency (in the ACID sense) is a property of the application. The application may rely on the database’s atomicity and isolation properties in order to achieve consistency, but it’s not up to the database alone. Thus, the letter C doesn’t really belong in ACID.
31%
In practice, there is no one technique that can provide absolute guarantees. There are only various risk-reduction techniques, including writing to disk, replicating to remote machines, and backups — and they can and should be used together. As always, it’s wise to take any theoretical “guarantees” with a healthy grain of salt.
35%
Serializable isolation is usually regarded as the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency.
39%
Moreover, TCP considers a packet to be lost if it is not acknowledged within some timeout (which is calculated from observed round-trip times), and lost packets are automatically retransmitted. Although the application does not see the packet loss and retransmission, it does see the resulting delay (waiting for the timeout to expire, and then waiting for the retransmitted packet to be acknowledged).
Joe Soltzberg
Data centers probably have a better protocol than TCP? What?
Joe Soltzberg
Going off of TCP can be bad. Datacenters like to use generic tools/protocols, which are easy and normal to work with. Added complexity can cost you a lot in the long run.
39%
Why do datacenter networks and the internet use packet switching? The answer is that they are optimized for bursty traffic. A circuit is good for an audio or video call, which needs to transfer a fairly constant number of bits per second for the duration of the call. On the other hand, requesting a web page, sending an email, or transferring a file doesn’t have any particular bandwidth requirement — we just want it to complete as quickly as possible.
39%
Thus, using circuits for bursty data transfers wastes network capacity and makes transfers unnecessarily slow. By contrast, TCP dynamically adapts the rate of data transfer to the available network capacity.
42%
The moral of these stories is that a node cannot necessarily trust its own judgment of a situation. A distributed system cannot exclusively rely on a single node, because a node may fail at any time, potentially leaving the system stuck and unable to recover. Instead, many distributed algorithms rely on a quorum, that is, voting among the nodes (see “Quorums for reading and writing”): decisions require some minimum number of votes from several nodes in order to reduce the dependence on any one particular node.
46%
Joe Soltzberg
How? There’s a combinatorial number of possibilities?
46%
constraints all require there to be a single up-to-date value (the account balance, the stock level, the seat occupancy) that all nodes agree on.
46%
In real applications, it is sometimes acceptable to treat such constraints loosely (for example, if a flight is overbooked, you can move customers to a different flight and offer them compensation for the inconvenience).
Joe Soltzberg
False.
47%
The CAP theorem
Joe Soltzberg
Misused by MongoDB
47%
linearizability implies causality:
48%
The use of Lamport timestamps is illustrated in Figure 9-8. Each node has a unique identifier, and each node keeps a counter of the number of operations it has processed. The Lamport timestamp is then simply a pair of (counter, node ID). Two nodes may sometimes have the same counter value, but by including the node ID in the timestamp, each timestamp is made unique.
Joe Soltzberg
So they're like semaphores?
48%
The key idea about Lamport timestamps, which makes them consistent with causality, is the following: every node and every client keeps track of the maximum counter value it has seen so far, and includes that maximum on every request. When a node receives a request or response with a maximum counter value greater than its own counter value, it immediately increases its own counter to that maximum.
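The scheme can be sketched in a few lines of Python; comparing timestamps as `(counter, node_id)` tuples gives the total order, and the max-and-jump rule on receipt is what keeps that order consistent with causality:

```python
class LamportClock:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        # Local event: increment and stamp as (counter, node_id).
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, remote_counter: int):
        # On receiving a message, jump forward past the sender's counter,
        # so any event that causally follows gets a larger timestamp.
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)

a, b = LamportClock(1), LamportClock(2)
t1 = a.tick()           # (1, 1): an event on node 1
t2 = b.receive(t1[0])   # (2, 2): node 2 jumps past node 1's counter
assert t1 < t2          # tuple comparison: causally later => larger
```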
48%
Although Lamport timestamps define a total order of operations that is consistent with causality, they are not quite sufficient to solve many common problems in distributed systems. For example, consider a system that needs to ensure that a username uniquely identifies a user account.
48%
Total order broadcast is usually described as a protocol for exchanging messages between nodes. Informally, it requires that two safety properties always be satisfied:
Reliable delivery: No messages are lost: if a message is delivered to one node, it is delivered to all nodes.
Totally ordered delivery: Messages are delivered to every node in the same order.
49%
You can implement such a linearizable compare-and-set operation as follows by using total order broadcast as an append-only log [62, 63]:
1. Append a message to the log, tentatively indicating the username you want to claim.
2. Read the log, and wait for the message you appended to be delivered back to you.
3. Check for any messages claiming the username that you want. If the first message for your desired username is your own message, then you are successful: you can commit the username claim (perhaps by appending another message to the log) and acknowledge it to the client. If the first message for your desired username is from another user, you abort the operation.
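A sketch of the username-claim algorithm, with a plain Python list standing in for the totally ordered log (real total order broadcast needs replication and consensus underneath): the first claim for a name, in log order, wins on every node.

```python
# A single in-memory list stands in for total order broadcast: every node
# is assumed to see the same messages in the same order.
log = []

def claim_username(node, username):
    log.append((node, username))   # 1. append a tentative claim
    for n, u in log:               # 2. read the log back
        if u == username:          # 3. find the FIRST claim for this name
            return n == node       #    ours => success; someone else's => abort
    return False

assert claim_username("node-a", "alice") is True
assert claim_username("node-b", "alice") is False  # already claimed by node-a
assert claim_username("node-b", "bob") is True
```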