More on this book
Community
Kindle Notes & Highlights
Read between
October 21 - November 26, 2024
With Avro, forward compatibility means that you can have a new version of the schema as writer and an old version of the schema as reader.
Conversely, backward compatibility means that you can have a new version of the schema as reader and an old version as writer.
To maintain compatibility, you may only add or remove a field that...
This highlight has been truncated due to consecutive passage length restrictions.
You can only use null as a default value if it is one of the branches of the union.
Changing the datatype of a field is possible, provided that Avro can convert the type.
The simplest solution is to include a version number at the beginning of every encoded record, and to keep a list of schema versions in your database.
The column name in the database maps to the field name in Avro.
Compatibility is a relationship between one process that encodes the data, and another process that decodes it.
In a database, the process that writes to the database encodes the data, and the process that reads from the database decodes it.
The servers expose an API over the network, and the clients can connect to the servers to make requests to that API.
The RPC model tries to make a request to a remote network service look the same as calling a function or method in your programming language, within the same process (this abstraction is called location transparency).
A local function call is predictable and either succeeds or fails, depending only on parameters that are under your control.
If you retry a failed network request, it could happen that the previous request actually got through, and only the response was lost.
Every time you call a local function, it normally takes about the same time to execute.
Custom RPC protocols with a binary encoding format can achieve better performance than something generic like JSON over REST.
The main focus of RPC frameworks is on requests between services owned by the same organization, typically within the same datacenter.
There can be many producers and many consumers on the same topic.
The actor model is a programming model for concurrency in a single process.
Each actor typically represents one client or entity, it may have some local state (which is not shared with any other actor), and it communicates with other actors by sending and receiving asynchronous messages.
Location transparency works better in the actor model than in RPC, because the actor model already assumes that messages may be lost, even within a single process.
Rolling upgrades allow new versions of a service to be released without downtime (thus encouraging frequent small releases over rare big releases) and make deployments less risky (allowing faulty releases to be detected and rolled back before they affect a large number of users).
SOAP is a particular technology, whereas SOA is a general approach to building systems.
For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.
Any coordination between nodes is done at the software level, using a conventional network.
While a distributed shared-nothing architecture has many advantages, it usually also incurs additional complexity for applications and sometimes limits the expressiveness of the data models you can use.
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.
Replication means keeping a copy of the same data on multiple machines that are connected via a network.
In this chapter we will assume that your dataset is so small that each machine can hold a copy of the entire dataset.
Mainstream use of distributed databases is more recent.
Each node that stores a copy of the database is called a replica.
When clients want to write to the database, they must send their requests to the leader, which first writes the new data to its local storage.
Whenever the leader writes new data to its local storage, it also sends the data change to all of its followers as part of a replication log or change stream.
If the synchronous follower becomes unavailable or slow, one of the asynchronous followers is made synchronous.
How do you ensure that the new follower has an accurate copy of the leader’s data?
The follower connects to the leader and requests all the data changes that have happened since the snapshot was taken.
How do you achieve high availability with leader-based replication?
On its local disk, each follower keeps a log of the data changes it has received from the leader.
If asynchronous replication is used, the new leader may not have received all the writes from the old leader before it failed.
What is the right timeout before the leader is declared dead?
A longer timeout means a longer time to recovery in the case where the leader fails.
Any statement that calls a nondeterministic function, such as NOW() to get the current date and time or RAND() to get a random number, is likely to generate a different value on each replica.
If the database changes its storage format from one version to another, it is typically not possible to run different versions of the database software on the leader and the followers.
A trigger lets you register custom application code that is automatically executed when a data change (write transaction) occurs in a database system.
Leader-based replication requires all writes to go through a single node, but read-only queries can go to any replica.
Unfortunately, if an application reads from an asynchronous follower, it may see outdated information if the follower has fallen behind.
Many applications let the user submit some data and then view what they have submitted.
When new data is submitted, it must be sent to the leader, but when the user views the data, it can be read from a follower.
It’s a lesser guarantee than strong consistency, but a stronger guarantee than eventual consistency.
This guarantee says that if a sequence of writes happens in a certain order, then anyone reading those writes will see them appear in the same order.