Kindle Notes & Highlights
Consider using a cache when data is read frequently but modified infrequently.
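A minimal read-through sketch of that pattern in Python (the cache dict, TTL value, and db.query call are illustrative assumptions, not from the book):

    import time

    cache = {}        # key -> (value, expires_at)
    CACHE_TTL = 60    # seconds; tune to how often the data actually changes

    def get_user(db, user_id):
        entry = cache.get(user_id)
        if entry and entry[1] > time.time():
            return entry[0]                      # cache hit: skip the database
        # Cache miss: read from the database and populate the cache.
        value = db.query("SELECT * FROM users WHERE id = ?", (user_id,))
        cache[user_id] = (value, time.time() + CACHE_TTL)
        return value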
A CDN is a network of geographically dispersed servers used to deliver static content. CDN servers cache static content like images, videos, CSS, JavaScript files, etc.
Each web server in the cluster can access state data from databases. This is called a stateless web tier.
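One way to picture this: session state lives in a store every server can reach, so any request can go to any server. The in-memory dict below merely stands in for that shared store (a real system would use a database or distributed cache):

    # session_store stands in for a shared database/cache reachable by all servers.
    session_store = {}

    def handle_request(session_id, action):
        # No server-local state: everything needed is loaded per request.
        session = session_store.get(session_id, {})
        if action == "increment":
            session["count"] = session.get("count", 0) + 1
        session_store[session_id] = session   # write state back to the shared store
        return session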
To further scale our system, we need to decouple different components of the system so they can be scaled independently. A message queue is a key strategy employed by many real-world distributed systems to solve this problem.
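A sketch of the producer/consumer decoupling a message queue provides, using Python's in-process queue as a stand-in for a real broker (the job names are made up):

    import queue
    import threading

    jobs = queue.Queue()

    def producer():
        for i in range(5):
            jobs.put(f"photo-{i}")    # web tier enqueues work and returns quickly

    def worker():
        while True:
            job = jobs.get()
            print("processing", job)  # workers scale independently of producers
            jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()
    producer()
    jobs.join()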
When choosing a sharding key, one of the most important criteria is to choose a key that can evenly distribute data.
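For illustration, a hash-based sharding sketch; note that plain modulo hashing like this remaps most keys when the shard count changes, which is exactly the problem consistent hashing (sketched further below) addresses:

    import hashlib

    NUM_SHARDS = 4

    def shard_for(key: str) -> int:
        # A stable hash spreads high-cardinality keys (e.g. user IDs) evenly;
        # a low-cardinality key such as country would create hotspots.
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    print(shard_for("user:12345"))   # the same key always maps to the same shard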
The system design interview simulates real-life problem solving where two co-workers collaborate on an ambiguous problem and come up with a solution that meets their goals.
Many think that the system design interview is all about a person's technical design skills. It is much more than that. An effective system design interview gives strong signals about a person's ability to collaborate, to work under pressure, and to resolve ambiguity constructively. The ability to ask good questions is also an essential skill, and many interviewers specifically look for this skill.
Step 1 - Understand the problem and establish design scope
In a system design interview, giving out an answer quickly without thinking gives you no bonus points. Answering without a thorough understanding of the requirements is a huge red flag.
Step 2 - Propose high-level design and get buy-in
It is a great idea to collaborate with the interviewer during the process.
Step 3 - Design deep dive
Dos
•Always ask for clarification. Do not assume your assumption is correct.
•Understand the requirements of the problem.
•There is neither the right answer nor the best answer. A solution designed to solve the problems of a young startup is different from that of an established company with millions of users. Make sure you understand the requirements.
•Let the interviewer know what you are thinking. Communicate with your interviewer.
•Suggest multiple approaches if possible.
•Once you agree with your interviewer on the blueprint, go into details on each component. Design the most critical components first.
"Consistent hashing is a special kind of hashing such that when a hash table is re-sized and consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped
CAP theorem states it is impossible for a distributed system to simultaneously provide more than two of these three guarantees: consistency, availability, and partition tolerance.
Since network failure is unavoidable, a distributed system must tolerate network partition. Thus, a CA system cannot exist in real-world applications.
There are two challenges while partitioning the data:
•Distribute data across multiple servers evenly.
•Minimize data movement when nodes are added or removed.
Consistent hashing discussed in Chapter 5 is a great technique to solve these problems.
To achieve high availability and reliability, data must be replicated asynchronously over N servers, where N is a configurable parameter.
Since data is replicated at multiple nodes, it must be synchronized across replicas. Quorum consensus can guarantee consistency for both read and write operations.
If W + R > N, strong consistency is guaranteed because the read set and the write set must overlap in at least one node that holds the latest data.
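The arithmetic behind that rule, as a tiny check (N, W, R as defined above):

    def is_strongly_consistent(n: int, w: int, r: int) -> bool:
        # W write acks + R read responses > N replicas forces an overlap,
        # so at least one replica in every read set holds the latest write.
        return w + r > n

    print(is_strongly_consistent(3, 2, 2))  # True: read and write sets must overlap
    print(is_strongly_consistent(3, 1, 1))  # False: fast, but reads can be stale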
Replication gives high availability but causes inconsistencies among replicas. Versioning and vector clocks are used to solve inconsistency problems.
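A vector-clock sketch under simple assumptions (a dict of per-server counters; the server names are arbitrary):

    def increment(clock: dict, server: str) -> dict:
        clock = dict(clock)
        clock[server] = clock.get(server, 0) + 1
        return clock

    def descends(a: dict, b: dict) -> bool:
        # True if `a` is equal to or newer than `b` on every server's counter.
        return all(a.get(s, 0) >= v for s, v in b.items())

    def conflict(a: dict, b: dict) -> bool:
        # Neither version dominates: they were written concurrently.
        return not descends(a, b) and not descends(b, a)

    v1 = increment({}, "s0")    # {"s0": 1}
    v2 = increment(v1, "s1")    # descends from v1
    v3 = increment(v1, "s2")    # concurrent with v2
    print(conflict(v2, v3))     # True: the client must reconcile these versions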
In a distributed system, it is insufficient to believe that a server is down because another server says so. Usually, it requires at least two independent sources of information to mark a server down.
A better solution is to use decentralized failure detection methods such as the gossip protocol.
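A simplified gossip-style sketch: each node bumps its own heartbeat, pushes its membership table to random peers, and suspects a member only when its heartbeat stops advancing (the timeout and fanout values here are arbitrary):

    import random
    import time

    class Node:
        def __init__(self, name):
            self.name = name
            self.peers = []                          # other Node objects
            self.table = {name: (0, time.time())}    # member -> (heartbeat, last_seen)

        def tick(self):
            hb, _ = self.table[self.name]
            self.table[self.name] = (hb + 1, time.time())
            # Gossip the table to a couple of randomly chosen peers.
            for peer in random.sample(self.peers, k=min(2, len(self.peers))):
                peer.merge(self.table)

        def merge(self, remote):
            for member, (hb, _) in remote.items():
                if hb > self.table.get(member, (-1, 0))[0]:
                    self.table[member] = (hb, time.time())   # fresher heartbeat seen

        def suspected_down(self, member, timeout=5.0):
            return time.time() - self.table[member][1] > timeout

    nodes = [Node(f"n{i}") for i in range(3)]
    for n in nodes:
        n.peers = [p for p in nodes if p is not n]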
If a server is unavailable due to network or server failures, another server will process requests temporarily. When the down server is up, changes will be pushed back to achieve data consistency. This process is called hinted handoff.
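A toy version of hinted handoff under those assumptions (server names and the store layout are illustrative):

    hints = {}   # down_server -> list of (key, value) writes to replay later

    def write(servers_up, target, key, value, store):
        if target in servers_up:
            store[target][key] = value
        else:
            # A healthy server keeps the write, with a hint of its real owner.
            hints.setdefault(target, []).append((key, value))

    def on_recover(target, store):
        # The recovered server gets the buffered changes pushed back.
        for key, value in hints.pop(target, []):
            store[target][key] = value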
What if a replica is permanently unavailable? To handle such a situation, we implement an anti-entropy protocol to keep replicas in sync. Anti-entropy involves comparing each piece of data on replicas and updating each replica to the newest version. A Merkle tree is used for inconsistency detection and minimizing the amount of data transferred.
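A two-level sketch of that idea (a root hash over fixed key buckets; a real Merkle tree recurses further so mismatches narrow down in logarithmically many comparisons):

    import hashlib

    def bucket_hash(bucket: dict) -> str:
        payload = "".join(f"{k}={bucket[k]}" for k in sorted(bucket))
        return hashlib.sha256(payload.encode()).hexdigest()

    def root_hash(buckets: list) -> str:
        combined = "".join(bucket_hash(b) for b in buckets)
        return hashlib.sha256(combined.encode()).hexdigest()

    def diff_buckets(a: list, b: list) -> list:
        if root_hash(a) == root_hash(b):
            return []                    # replicas are already in sync
        return [i for i in range(len(a))
                if bucket_hash(a[i]) != bucket_hash(b[i])]

    r1 = [{"k1": "v1"}, {"k2": "v2"}]
    r2 = [{"k1": "v1"}, {"k2": "stale"}]
    print(diff_buckets(r1, r2))          # [1] -> only bucket 1 needs transfer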
To build a system capable of handling a data center outage, it is important to replicate data across multiple data centers.