Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Rate it:
Open Preview
18%
Flag icon
With Avro, forward compatibility means that you can have a new version of the schema as writer and an old version of the schema as reader.
18%
Flag icon
Conversely, backward compatibility means that you can have a new version of the schema as reader and an old version as writer.
18%
Flag icon
To maintain compatibility, you may only add or remove a field that...
This highlight has been truncated due to consecutive passage length restrictions.
18%
Flag icon
You can only use null as a default value if it is one of the branches of the union.
18%
Flag icon
Changing the datatype of a field is possible, provided that Avro can convert the type.
18%
Flag icon
The simplest solution is to include a version number at the beginning of every encoded record, and to keep a list of schema versions in your database.
18%
Flag icon
The column name in the database maps to the field name in Avro.
18%
Flag icon
Compatibility is a relationship between one process that encodes the data, and another process that decodes it.
18%
Flag icon
In a database, the process that writes to the database encodes the data, and the process that reads from the database decodes it.
19%
Flag icon
The servers expose an API over the network, and the clients can connect to the servers to make requests to that API.
19%
Flag icon
The RPC model tries to make a request to a remote network service look the same as calling a function or method in your programming language, within the same process (this abstraction is called location transparency).
19%
Flag icon
A local function call is predictable and either succeeds or fails, depending only on parameters that are under your control.
19%
Flag icon
If you retry a failed network request, it could happen that the previous request actually got through, and only the response was lost.
19%
Flag icon
Every time you call a local function, it normally takes about the same time to execute.
19%
Flag icon
Custom RPC protocols with a binary encoding format can achieve better performance than something generic like JSON over REST.
19%
Flag icon
The main focus of RPC frameworks is on requests between services owned by the same organization, typically within the same datacenter.
20%
Flag icon
There can be many producers and many consumers on the same topic.
20%
Flag icon
The actor model is a programming model for concurrency in a single process.
20%
Flag icon
Each actor typically represents one client or entity, it may have some local state (which is not shared with any other actor), and it communicates with other actors by sending and receiving asynchronous messages.
20%
Flag icon
Location transparency works better in the actor model than in RPC, because the actor model already assumes that messages may be lost, even within a single process.
20%
Flag icon
Rolling upgrades allow new versions of a service to be released without downtime (thus encouraging frequent small releases over rare big releases) and make deployments less risky (allowing faulty releases to be detected and rolled back before they affect a large number of users).
20%
Flag icon
SOAP is a particular technology, whereas SOA is a general approach to building systems.
21%
Flag icon
For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.
21%
Flag icon
Any coordination between nodes is done at the software level, using a conventional network.
21%
Flag icon
While a distributed shared-nothing architecture has many advantages, it usually also incurs additional complexity for applications and sometimes limits the expressiveness of the data models you can use.
21%
Flag icon
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.
21%
Flag icon
Replication means keeping a copy of the same data on multiple machines that are connected via a network.
21%
Flag icon
In this chapter we will assume that your dataset is so small that each machine can hold a copy of the entire dataset.
21%
Flag icon
Mainstream use of distributed databases is more recent.
21%
Flag icon
Each node that stores a copy of the database is called a replica.
21%
Flag icon
When clients want to write to the database, they must send their requests to the leader, which first writes the new data to its local storage.
21%
Flag icon
Whenever the leader writes new data to its local storage, it also sends the data change to all of its followers as part of a replication log or change stream.
21%
Flag icon
22%
Flag icon
If the synchronous follower becomes unavailable or slow, one of the asynchronous followers is made synchronous.
22%
Flag icon
How do you ensure that the new follower has an accurate copy of the leader’s data?
22%
Flag icon
The follower connects to the leader and requests all the data changes that have happened since the snapshot was taken.
22%
Flag icon
How do you achieve high availability with leader-based replication?
22%
Flag icon
On its local disk, each follower keeps a log of the data changes it has received from the leader.
22%
Flag icon
If asynchronous replication is used, the new leader may not have received all the writes from the old leader before it failed.
22%
Flag icon
What is the right timeout before the leader is declared dead?
22%
Flag icon
A longer timeout means a longer time to recovery in the case where the leader fails.
22%
Flag icon
Any statement that calls a nondeterministic function, such as NOW() to get the current date and time or RAND() to get a random number, is likely to generate a different value on each replica.
23%
Flag icon
If the database changes its storage format from one version to another, it is typically not possible to run different versions of the database software on the leader and the followers.
23%
Flag icon
A trigger lets you register custom application code that is automatically executed when a data change (write transaction) occurs in a database system.
23%
Flag icon
Leader-based replication requires all writes to go through a single node, but read-only queries can go to any replica.
23%
Flag icon
Unfortunately, if an application reads from an asynchronous follower, it may see outdated information if the follower has fallen behind.
23%
Flag icon
Many applications let the user submit some data and then view what they have submitted.
23%
Flag icon
When new data is submitted, it must be sent to the leader, but when the user views the data, it can be read from a follower.
23%
Flag icon
It’s a lesser guarantee than strong consistency, but a stronger guarantee than eventual consistency.
23%
Flag icon
This guarantee says that if a sequence of writes happens in a certain order, then anyone reading those writes will see them appear in the same order.