More on this book
Community
Kindle Notes & Highlights
Read between
August 2 - December 28, 2020
For correct ordering, you would need the clock source to be significantly more accurate than the thing you are measuring (namely network delay).
So-called logical clocks [56, 57], which are based on incrementing counters rather than an oscillating quartz crystal, are a safer alternative for ordering events
it doesn’t make sense to think of a clock reading as a point in time — it is more like a range of times, within a confidence interval:
An interesting exception is Google’s TrueTime API in Spanner
which explicitly reports the confidence interval on the local clock.
The most common implementation of snapshot isolation requires a monotonically increasing transaction ID.
However, when a database is distributed across many machines, potentially in multiple datacenters, a global, monotonically increasing transaction ID (across all partitions) is difficult to generate,
With lots of small, rapid transactions, creating transaction IDs in a distributed system becomes an untenable bottleneck.vi
Only if the intervals overlap are we unsure in which order A and B happened.
Spanner deliberately waits for the length of the confidence interval before committing a read-write transaction.
Google deploys a GPS receiver or atomic clock in each datacenter, allowing clocks to be synchronized to within about 7 ms
Using clock synchronization for distributed transaction semantics is an area of active research
Process Pauses
writes. How does a node know that it is still leader
One option is for the leader to obtain a lease from the other nodes, which is similar to a lock with a timeout
Firstly, it’s relying on synchronized clocks: the expiry time on the lease is set by a different machine
and it’s being compared to the local system clock.
the code assumes
that very little time passes between the point that it checks the time (System.currentTimeMillis()) and the time when the request is processed (process(request)). Normally this code runs very quickly, so the 10 second buffer
what if there is an unexpected pause in the execution of the program? For example, imagine the thread stops for 15 seconds around the line lease.isValid()
thread might be paused for so long? Unfortunately not. There are various reasons why this could
garbage collector (GC) that occasionally needs to stop all running threads.
“concurrent” garbage collectors like the HotSpot JVM’s CMS cannot fully run in parallel with the application code — even they need to stop the world from time to time
a virtual machine can be suspended (pausing the execution of all processes and saving the contents of memory to disk) and resumed
execution may also be suspended and resumed arbitrarily, e.g., when the user closes the lid of their laptop.
the operating system context-switches to another thread,
the hypervisor switches to a different virtual machine
the currently running thread can be paused at any arbitrary...
This highlight has been truncated due to consecutive passage length restrictions.
the application performs synchronous disk access, a thread may be paused waiting for a slow disk I/O operation to complete
the Java classloader lazily loads class files when they are first used, which could happen at any time in the program execution.
the disk is actually a network filesystem or network block device (such as Amazon’s EBS), the I/O latency is further subject to the variability of network delays
swapping to disk (paging), a simple memory access may result in a page fault that requires a page from disk to be loaded into memory.
swapping pages in and out of memory and getting little actual work done (this is known as thrashing).
A Unix process can be paused by sending it the SIGSTOP signal, for example by pressing Ctrl-Z in a shell.
The problem is similar to making multi-threaded code on a single machine thread-safe: you can’t assume anything about timing, because arbitrary context switches and parallelism may occur.
A node in a distributed system must assume that its execution can be paused for a significant length of time at any point, even in the middle of a function.
Some software runs in environments where a failure to respond within a specified time can cause serious damage: computers that control aircraft, rockets, robots, cars, and other physical objects must respond quickly and predictably to their sensor inputs.
deadline
hard real-time systems.
you wouldn’t want the release of the airbag to be delayed due to an inopportune GC pause in the airbag release system.
a real-time operating system (RTOS) that allows processes to be scheduled with a guaranteed allocation of CPU time in specified intervals is needed;
Moreover, “real-time” is not the same as “high-performance” — in fact, real-time systems may have lower throughput, since they have to prioritize timely responses above
An emerging idea is to treat GC pauses like brief planned outages of a node,
A variant of this idea is to use the garbage collector only for short-lived objects (which are fast to collect) and to restart processes periodically,
a node cannot necessarily trust its own judgment of a situation. A distributed system cannot exclusively rely on a single node, because a node may fail at any time, potentially leaving the system stuck and unable to recover.
many distributed algorithms rely on a quorum, that is, voting among the nodes
system requires there to be only one of some thing.
Implementing this in a distributed system requires care:
The problem is an example of what we discussed in “Process Pauses”: if the client holding the lease is paused for too long, its lease expires. Another client can obtain a lease for the same file, and start writing to the file. When the paused client comes back, it believes (incorrectly) that it still has a valid lease and proceeds to also write to the file.
Let’s assume that every time the lock server grants a lock or lease, it also returns a fencing token, which is a number that increases every time a lock is granted