MySQL now switches to row-based replication (discussed shortly) if there is any nondeterminism in a statement.
a WAL contains details of which bytes were changed in which disk blocks. This makes replication closely coupled to the storage engine.
If the replication protocol allows the follower to use a newer software version than the leader, you can perform a zero-downtime upgrade of the database software by first upgrading the followers and then performing a failover to make one of the upgraded nodes the new leader.
Since a logical log is decoupled from the storage engine internals, it can more easily be kept backward compatible, allowing the leader and the follower to run different versions of the database software, or even different storage engines.
Leader-based replication requires all writes to go through a single node, but read-only queries can go to any replica.
In this read-scaling architecture, you can increase the capacity for serving read-only requests simply by adding more followers.
read-after-write consistency, also known as read-your-writes consistency [24]. This is a guarantee that if the user reloads the page, they will always see any updates they submitted themselves. It makes no promises about other users: other users' updates may not be visible until some time later.
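One common way to provide this guarantee, sketched below in Python, is to remember when each user last wrote and to route that user's reads to the leader for a short window afterwards. The leader/follower handles, the put/get interface, and the one-minute window are illustrative assumptions, not any particular database's API.

    import time

    RECENT_WRITE_WINDOW = 60.0  # seconds; should exceed typical replication lag

    last_write_at = {}  # user_id -> monotonic time of that user's last write

    def write(leader, user_id, key, value):
        leader.put(key, value)
        last_write_at[user_id] = time.monotonic()

    def read(leader, followers, user_id, key):
        # If this user wrote recently, their write may not have reached
        # the followers yet, so serve the read from the leader instead.
        wrote_recently = (
            time.monotonic() - last_write_at.get(user_id, float("-inf"))
            < RECENT_WRITE_WINDOW
        )
        node = leader if wrote_recently else followers[0]
        return node.get(key)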
monotonic reads only means that if one user makes several reads in sequence, they will not see time go backward — i.e., they will not read older data after having previously read newer data.
make sure that each user always makes their reads from the same replica
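A minimal sketch of that same-replica routing, assuming a list of replica handles: hash the user ID rather than picking randomly, so repeated reads by one user consistently hit one replica. If that replica fails, reads must be rerouted, and the guarantee can be briefly violated.

    import hashlib

    def replica_for_user(replicas, user_id):
        # A deterministic hash of the user ID, so the same user always
        # lands on the same replica across requests.
        digest = hashlib.sha256(user_id.encode()).digest()
        return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]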
consistent prefix reads [23]. This guarantee says that if a sequence of writes happens in a certain order, then anyone reading those writes will see them appear in the same order.
make sure that any writes that are causally related to each other are written to the same partition
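For example, hashing a conversation ID to a partition keeps all causally related messages on one partition, where a single leader applies them in a consistent order. The partition count and key choice here are assumptions for illustration.

    import hashlib

    def partition_for(key, num_partitions):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % num_partitions

    # All writes for one conversation use the conversation ID as the
    # partition key, so a question and its answer land on the same
    # partition and are replicated in the order they were written.
    print(partition_for("chat-42", 8))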
A natural extension of the leader-based replication model is to allow more than one node to accept writes.
(also known as master–master or active/active replication).
In a multi-leader configuration, you can have a leader in each datacenter.
In a multi-leader configuration, every write can be processed in the local datacenter and is replicated asynchronously to the other datacenters.
A multi-leader configuration with asynchronous replication can usually tolerate network problems
Although multi-leader replication has advantages, it also has a big downside: the same data may be concurrently modified in two different datacenters, and those write conflicts must be resolved
autoincrementing keys, triggers, and integrity constraints can be problematic. For this reason, multi-leader replication is often considered dangerous territory that should be avoided if possible
Multi-leader replication is also appropriate if you have an application that needs to continue to work while it is disconnected from the internet. For example, consider the calendar apps on your mobile phone, your laptop, and other devices. You need to be able to see your meetings (make read requests) and enter new meetings (make write requests) at any time.
In a single-leader database, the second writer will either block and wait for the first write to complete, or abort the second write transaction, forcing the user to retry the write.
in a multi-leader setup, both writes are successful, and the conflict is only detected asynchronously at some later point in time. At that time, it may be too late to ask the user to resolve the conflict.
If you want synchronous conflict detection, you might as well just use single-leader replication.
if the application can ensure that all writes for a particular record go through the same leader, then conflicts cannot occur.
in an application where a user can edit their own data, you can ensure that requests from a particular user are always routed to the same datacenter and use the leader in that datacenter
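A sketch of that routing under assumed names: a fixed mapping from user region to a home datacenter, so all writes for one user go through one leader and, from that user's point of view, the system is effectively single-leader.

    HOME_DATACENTER = {"us": "dc-us-east", "eu": "dc-eu-west"}  # assumed names

    def leader_for_user(leaders, user_region):
        # leaders: dict mapping datacenter name -> leader handle.
        # Pinning each user to one home leader means no two datacenters
        # concurrently modify that user's records, so conflicts cannot
        # occur for that data.
        return leaders[HOME_DATACENTER[user_region]]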
the database must resolve the conflict in a convergent way, which means that all replicas must arrive at the same final value when all changes have been replicated.
last write wins (LWW). Although this approach is popular, it is dangerously prone to data loss
give each replica a unique ID, and let writes that originated at a higher-numbered replica always take precedence over writes that originated at a lower-numbered replica.
Record the conflict in an explicit data structure that preserves all information, and write application code that resolves the conflict at some later time (perhaps by prompting the user).
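The two extremes sketched side by side, with an assumed Write record: last write wins is convergent but keeps only one of the concurrent writes, while recording siblings preserves them all for later application-level resolution.

    from dataclasses import dataclass

    @dataclass
    class Write:
        value: str
        timestamp: float  # wall-clock time attached by the writer
        replica_id: int   # tiebreaker; "higher replica wins" uses this too

    def resolve_lww(conflicting):
        # Convergent: every replica picks the same winner. Dangerous:
        # all the losing concurrent writes are silently discarded.
        return max(conflicting, key=lambda w: (w.timestamp, w.replica_id))

    def record_siblings(conflicting):
        # Preserve all conflicting versions so application code (or the
        # user) can merge them later.
        return sorted(conflicting, key=lambda w: (w.timestamp, w.replica_id))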
conflict resolution usually applies at the level of an individual row or document, not for an entire transaction
MySQL by default supports only a circular topology [34], in which each node receives writes from one node and forwards those writes (plus any writes of its own) to one other node.
To prevent infinite replication loops, each node is given a unique identifier, and in the replication log, each write is tagged with the identifiers of all the nodes it has passed through [43].
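A minimal sketch of that tagging, with stubbed storage and network hooks: a node drops any write already stamped with its own identifier, which breaks the loop.

    from dataclasses import dataclass, field

    @dataclass
    class ReplicatedWrite:
        key: str
        value: str
        seen_by: set = field(default_factory=set)  # node IDs passed through

    def apply_locally(write):
        ...  # write to local storage (stubbed for the sketch)

    def on_receive(node_id, write, peers):
        if node_id in write.seen_by:
            return  # this node already processed the write; drop it
        apply_locally(write)
        write.seen_by.add(node_id)
        for peer in peers:
            peer.send(write)  # forward, now tagged with our own ID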
With multi-leader replication, writes may arrive in the wrong order at some replicas.
Simply attaching a timestamp to every write is not sufficient, because clocks cannot be trusted to be sufficiently in sync
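A common alternative, discussed later in the chapter, is a version vector: one counter per replica. This sketch shows only the comparison rule, which is what lets a replica distinguish causal ordering from genuine concurrency.

    def compare(a, b):
        # a, b: dicts mapping replica ID -> counter; missing entries
        # count as zero.
        keys = a.keys() | b.keys()
        a_leq_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
        b_leq_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
        if a_leq_b and b_leq_a:
            return "equal"
        if a_leq_b:
            return "a happened before b"
        if b_leq_a:
            return "b happened before a"
        return "concurrent"  # a real conflict; timestamps can't tell you this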
Riak, Cassandra, and Voldemort are open source datastores with leaderless replication models inspired by Dynamo, so this kind of database is also known as Dynamo-style.
unlike a leader database, that coordinator does not enforce a particular ordering of writes.
in a leaderless configuration, failover does not exist.
when a client reads from the database, it doesn’t just send its request to one replica: read requests are also sent to several nodes in parallel
Version numbers are used to determine which value is newer
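A sketch of a quorum read with read repair, assuming each replica exposes get(key) -> (version, value) and put(key, version, value): the highest version wins, and stale replicas are fixed on the way out.

    def quorum_read(replicas, key, r):
        # Query r replicas (sequentially here for brevity; real systems
        # send these requests in parallel).
        responses = [(node, node.get(key)) for node in replicas[:r]]
        newest_version, newest_value = max(
            (resp for _, resp in responses), key=lambda resp: resp[0]
        )
        # Read repair: copy the newest value back to any stale replica.
        for node, (version, _) in responses:
            if version < newest_version:
                node.put(key, newest_version, newest_value)
        return newest_value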
some datastores have a background process that constantly looks for differences in the data between replicas and copies any missing data from one replica to another.
without an anti-entropy process, values that are rarely read may be missing from some replicas and thus have reduced durability, because read repair is only performed when a value is read by the application.
if there are n replicas, every write must be confirmed by w nodes to be considered successful, and we must query at least r nodes for each read.
A common choice is to make n an odd number (typically 3 or 5) and to set w = r = (n + 1) / 2 (rounded up).
The quorum condition, w + r > n, allows the system to tolerate unavailable nodes
among the nodes you read there must be at least one node with the latest value
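The arithmetic in code, as a quick check of the overlap argument: with w + r > n, any set of w nodes that acknowledged a write and any set of r nodes queried by a read must intersect in at least one node.

    import math

    def quorum_params(n):
        # The common choice from the text: w = r = (n + 1) / 2, rounded up.
        w = r = math.ceil((n + 1) / 2)
        assert w + r > n  # the quorum condition
        return w, r

    for n in (3, 5):
        w, r = quorum_params(n)
        # n=3 gives w=r=2 (tolerates 1 node down); n=5 gives w=r=3
        # (tolerates 2 nodes down).
        print(f"n={n}: w={w}, r={r}, tolerates {n - w} unavailable node(s)")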
if a winner is picked based on a timestamp (last write wins), writes can be lost due to clock skew
The parameters w and r allow you to adjust the probability of stale values being read, but it’s wise to not take them as absolute guarantees.
Stronger guarantees generally require transactions or consensus.
in systems with leaderless replication, there is no fixed order in which writes are applied, which makes monitoring more difficult.
if a network interruption cuts off a client from a large number of database nodes, fewer than w or r reachable nodes may remain, so the client can no longer reach a quorum.
sloppy quorum [37]: writes and reads still require w and r successful responses, but those may include nodes that are not among the designated n “home” nodes for a value.
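A minimal sketch of a sloppy-quorum write, with assumed reachable()/put() node handles: if some home nodes are unreachable, substitutes from the rest of the cluster are used so the write still collects w acknowledgements. Once the home nodes recover, the substitutes hand the writes back (hinted handoff).

    def sloppy_write(home_nodes, other_nodes, key, value, w):
        acks = 0
        # Prefer the designated "home" nodes; fall back to any other
        # reachable node, which stores the write as a hinted handoff.
        for node in list(home_nodes) + list(other_nodes):
            if acks >= w:
                break
            if node.reachable():
                node.put(key, value)
                acks += 1
        return acks >= w  # the write succeeds iff w nodes acknowledged it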