Review:
A high-level overview of how to design highly scalable systems. Each component is described in detail and broken down into potential use cases, so it's easy to understand how it fits into the architecture: benefits, caveats, challenges, trade-offs. It feels like a great combination of breadth and depth, and it also includes a vast reference list of books, white papers, talks, and links to expand on each topic.
Notes:
#1: Core Concepts
- Avoid full rewrites: they always cost more and take longer than expected, and you end up with similar issues.
- A single-server configuration means having everything running on one machine; it's a good option for small applications and can be scaled vertically.
- Vertical scaling gets more expensive over time and has a hard ceiling.
- Isolation of services means running different components (DNS, cache, data storage, static assets, etc.) on separate servers, each of which can then be vertically scaled independently.
- CDNs bring things like geoDNS for your static assets, so requests are routed to the geographically closest servers.
- SOA is an architecture centred around loosely coupled and autonomous services that solve specific business needs.
- Layered architectures divide functionality into layers, where the lower layers provide an API to the upper ones without knowing anything about them.
- Hexagonal architecture assumes the business logic is at the center and all interactions with other components, like data stores or users, are equal; everything outside the business logic requires a strict contract.
- Event-driven architecture is a different way of thinking about actions: it's about reacting to events that have already happened, while traditional architectures are about responding to requests.
#2: Principles of Good Software Design
- Simplicity.
- Hide complexity behind abstractions.
- Avoid overengineering.
- Try TDD, use SOLID.
- Promote loose coupling, drawing diagrams helps spot coupling.
- DRY and don't waste time by following inefficient processes or reinventing the wheel.
- Functional Partitioning: dividing the system based on functionality, where each subsystem has everything it needs to operate (data store, cache, message queue, queue workers), e.g. a profile web service and a scheduling web service.
- Design for self-healing.
#3: Building the FE Layer
- You can have your frontend be a traditional multipage web application, an SPA, or a hybrid.
- Web servers can be stateless (any server handles requests from any user) or stateful (a given server handles every request of certain users).
- Prefer stateless servers, where session state is not kept on the servers but pushed to another layer, like shared session storage; this makes for better horizontal scalability, since you can add/remove clones without affecting users' sessions.
- HTTP is stateless but cookies make it stateful.
- On the first HTTP request the server can send a Set-Cookie:SID=XYZ... response header; after that, every consecutive request needs to contain the Cookie:SID=XYZ... header.
- Session data can be stored in cookies, which means every request (CSS, images, etc.) carries the full session data, and whenever state changes the server needs to re-send Set-Cookie:SDATA=ABCDE...; this increases request payload, so only use it if session state is small. As an advantage, you don't have to store any session data on the server.
- Or keep session state in a shared data store like Memcached, Redis, or Cassandra: you only append the session id to every request, while the full session data is kept in the shared store, which can be distributed and partitioned by session id (see the sketch after this list).
- Or use a load balancer that supports sticky sessions: this is not flexible, as it makes the load balancer aware of every user, and scaling is hard since you can't restart or decommission web servers without breaking user sessions.
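A minimal sketch of the shared-session-store approach, assuming Redis and the redis-py client (my choice for illustration, not the book's); the web server stays stateless because it only reads the session id from the cookie:

```python
import json
import uuid

import redis  # assumed client library; Memcached or Cassandra would work too

store = redis.Redis(host="localhost", port=6379)
SESSION_TTL = 30 * 60  # expire idle sessions after 30 minutes

def create_session(initial_data: dict) -> str:
    """Persist session data in the shared store and return the session id."""
    sid = uuid.uuid4().hex
    store.setex(f"session:{sid}", SESSION_TTL, json.dumps(initial_data))
    return sid  # sent back to the client via Set-Cookie: SID=<sid>

def load_session(sid: str):
    """Any web server clone can load the session; no server keeps local state."""
    raw = store.get(f"session:{sid}")
    return json.loads(raw) if raw else None
```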
- Components of scalable frontends: DNS, CDN, Load Balancer, FE web server cluster.
- DNS is the first component hit by users; use a third-party hosted service in almost all cases. There are geoDNS services and latency-based DNS services like Amazon Route 53; the latter can be better because they take real-time network congestion and outages into consideration when routing traffic.
- Load Balancers help with horizontal scaling since users don't hit web servers directly.
* can provide SSL offloading (or SSL termination), where the LB encrypts and decrypts HTTPS connections and your web servers talk plain HTTP among themselves internally.
* there are hosted providers like Amazon ELB, software-based LBs like Nginx and HAProxy, and hardware-based ones that can be more expensive but easier to scale vertically.
* apart from an external LB routing traffic to FE web servers, you can have an internal LB to distribute traffic from the FE to web service instances and gain all the benefits of an LB internally.
* some LBs route the TCP connections themselves, allowing the use of protocols other than HTTP.
#4: Web Services
- The monolithic approach: a single MVC web app containing all the web application controllers, mobile services controllers, third-party integration service controllers, and shared business logic.
* Here, every time you need to integrate a new partner, you'd need to build that into the same MVC app as a set of controllers and views, since there's no concept of a separate web services layer.
- The API-first approach: all clients talk to the web application using the same API, which is a set of shared web services containing all the business logic.
- Function-centric web services: the concept of calling functions or objects on remote machines without needing to know how they're implemented, like SOAP, CORBA, XML-RPC, DCOM.
* SOAP dominates this space; it uses WSDL files to describe the available methods and endpoints (providing service discovery) and XSD files to describe data structures.
* Your client code doesn't need to know that it is calling a remote web service; it only needs to know the objects generated from the web service contract (WSDL + XSD files).
* Dozens of additional specifications were created for SOAP (referred to as ws-*, like ws-security) for features like transactions, multiphase commits, authentication, encryption, etc., but this reduced interoperability: it became difficult to integrate across development stacks, as different providers had different levels of support for the ws-* specifications, especially in dynamic languages like PHP, Ruby, Perl, and Python. This led to the easier-to-implement alternative, JSON + REST.
* You couldn't cache HTTP calls based on the URL alone either, since it was the same every time: request parameters and method names were part of the XML document itself.
* Some of the ws-* specifications introduced state, preventing you from treating web service machines as stateless clones.
* Having a strict contract and the ability to discover functions and data types provides a lot of value.
- Resource-centric web services treat every resource as a type of object. You can model resources as you like, but the operations you can perform on them are standard (POST, GET, PUT, DELETE), unlike SOAP, where you have arbitrary functions that take and produce arbitrary values.
* REST is predictable: you know the operations will always be the same, while with SOAP each service had its own conventions, standards, and ws-* specifications.
* Uses JSON rather than XML; JSON is lighter and easier to read.
* A REST framework or container is pretty much just a simple HTTP server that maps URLs to your code.
* It doesn't provide the discoverability and auto-generation of client code that SOAP has with WSDL and XSD, but by being more flexible it allows nonbreaking changes to be released on the server without redeploying client code.
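As a rough illustration (Flask is my choice here, not the book's), a resource-centric service models the resource freely while the operations stay standard:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
products = {}  # hypothetical in-memory store standing in for a real database

@app.get("/products/<int:pid>")
def get_product(pid):
    # GET reads a resource; identical URLs make responses cacheable.
    product = products.get(pid)
    return (jsonify(product), 200) if product else ("", 404)

@app.put("/products/<int:pid>")
def put_product(pid):
    # PUT creates or replaces the resource at this URL.
    products[pid] = request.get_json()
    return jsonify(products[pid]), 200

@app.delete("/products/<int:pid>")
def delete_product(pid):
    # DELETE removes it; the verbs are fixed, the resource model is yours.
    products.pop(pid, None)
    return "", 204
```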
#5: Data Layer
- Replication scales reads, not writes.
* On MySQL you can have a master and slave topology; there's a limit on the slave count, but you can use multilayer slaves (slaves replicating from slaves) to raise it.
* All writes go to the master while replicas are read-only (see the sketch after this list).
* Replication happens asynchronously.
* The master records statements in its binlog file; each slave copies them into its relay log and then executes them.
* Promoting a slave to master is a manual process.
* You can have a master-master topology for a faster/easier failover process, though it's still not automatic in MySQL. In this topology you can write to either one, and each replicates from the other's binlog.
* The more masters you have, the longer the replication lag, hence worse write times.
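A minimal sketch of how application code can exploit this topology, assuming hypothetical connection objects (the book doesn't prescribe an implementation): writes go to the master, reads spread across replicas:

```python
import random

class ReplicatedDb:
    """Routes writes to the master and spreads reads across read-only slaves.

    Replication is asynchronous, so a read right after a write may hit
    a replica that hasn't caught up yet.
    """

    def __init__(self, master, slaves):
        self.master = master  # hypothetical DB connection objects
        self.slaves = slaves

    def write(self, sql, params=()):
        return self.master.execute(sql, params)

    def read(self, sql, params=()):
        return random.choice(self.slaves).execute(sql, params)
```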
- Active Data Set: the amount of data accessed within a time window (an hour, a day, a week). It's the live data that's most frequently read/written.
* A large active data set size is a common scalability issue.
* Replication can help increase concurrent reads, but not if you want to grow the active data set, since an entire copy of the data set needs to be kept on each slave.
- When sharding, you want to keep together sets of data that will be accessed together and spread the load evenly among servers (see the sketch after this list).
- You can apply sharding to object caches, message queues, file systems, etc.
* Cross-shard queries are a big challenge and should probably be avoided.
* Cross-shard transactions are not ACID.
- Try to avoid distributed transactions.
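A minimal sketch of key-based shard routing (an assumed scheme for illustration): partitioning by user id keeps each user's rows together while spreading users evenly across servers:

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]  # hypothetical names

def shard_for(user_id: str) -> str:
    """Hash the sharding key so load spreads evenly across shards.

    All of a user's data lands on one shard, so their queries stay local;
    queries spanning many users are the cross-shard case to avoid.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# e.g. shard_for("user-42") always maps to the same shard
```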
- NoSQL databases make compromises in order to support their priority features.
- The famous "pick two" phrase related to the CAP theorem is not entirely true.
* If your system is still operational after a number of failures, you still have partition tolerance.
* Quorum consistency provides consistency while still having availability at the price of latency.
- Quorum consistency means the majority of the nodes agree on the result (for reads and/or writes). You can implement this for some of your queries, trading latency for consistency on eventually consistent systems; that way you can still enforce read-after-write semantics (see the sketch below).
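A worked version of the rule, as a sketch: with N replicas, a write acknowledged by W nodes and a read asking R nodes overlap in at least one node whenever R + W > N, which is what gives read-after-write semantics:

```python
def read_after_write(n: int, r: int, w: int) -> bool:
    """Every read quorum overlaps every write quorum iff R + W > N."""
    return r + w > n

# N=3 with majority quorums: consistent, at the price of slower queries.
assert read_after_write(n=3, r=2, w=2)
# N=3 but reading from a single node: a stale replica may answer.
assert not read_after_write(n=3, r=1, w=2)
```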
- Eventual Consistency: a property of a system where nodes may have different versions of the data until it eventually propagates.
* These systems favor availability, they give no guarantees that the data you're getting is the freshest.
* Amazon Dynamo (designed to support the checkout process) works this way: it saves all conflicting versions and sends them to the client, where reconciliation happens, resulting in both shopping cart versions being merged. They prefer showing a previously removed item in the shopping cart to losing data.
- Cassandra topology: all nodes are equal and all accept reads and writes. When designing your cluster you decide how many nodes you want the data replicated to, which happens automatically; all nodes know which node has what data.
* It's a mix of Google's BigTable and Amazon's Dynamo: it has huge tables that most of the time are not related; rows can be partitioned by row key among nodes, you can add columns on the fly, and not all rows need to have all the columns.
* When a client connects to a node, that node acts as the session coordinator: it finds which nodes have the requested data and delegates to them.
* It implements a self-healing technique where 10% of all transactions trigger an async background check against all replicas, sending updates to the ones with stale data.
#6: Caching
- Cache hit ratio is the most important caching metric; it's affected by three factors:
* data set size: the more unique your cache keys, the lower the chance of reusing them. If you cached based on the user's IP address you'd have up to ~4 billion keys, versus only a couple hundred when caching based on country.
* space: how big the data objects are.
* TTL (time to live).
- HTTP caches are read-through caches.
* For these you can use request/response headers or HTML metatags, try to avoid the latter to prevent confusion.
* Favor "Expires: A-DATE" over "Cache-Control: max-age=TTL"; the latter is inconsistent and less backwards compatible.
* Browser cache: exists in most browsers.
* Caching Proxies: usually a local server in your office or at your ISP to reduce internet traffic; less used nowadays because internet access has become cheaper.
* HTTPS prevents caching on intermediary proxies.
* Reverse Proxy: reduces the load on your web servers and can be installed on the same machine as the load balancer, e.g. Nginx or Varnish.
* CDNs can act as caches.
* Most caching systems use the Least Recently Used (LRU) algorithm to reclaim memory space.
- Object Caches are cache-aside caches.
* Client-side, like the Web Storage specification: limits range from 5MB to 25MB per domain, depending on the browser.
* Distributed Object Caches: ease the load on data stores; your app checks them before querying the db, and you can decide whether to update the cache on read or on write (see the sketch after this list).
* Cache invalidation is hard: if you were caching each search result on an e-commerce site and then updated one product, there's no easy way to invalidate the cache short of rerunning each search query to see if the product is included.
* The best approach often is to set a short TTL.
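A minimal cache-aside sketch, assuming redis-py and a hypothetical fetch_from_db function: the app checks the cache first, falls back to the data store, and writes the result back with a short TTL, since invalidating search results precisely is impractical:

```python
import json

import redis  # assumed client library

cache = redis.Redis()
SHORT_TTL = 60  # a short TTL bounds staleness when invalidation is hard

def get_search_results(query: str):
    key = f"search:{query}"
    cached = cache.get(key)
    if cached is not None:          # cache hit: the db is never touched
        return json.loads(cached)
    results = fetch_from_db(query)  # hypothetical expensive database query
    cache.setex(key, SHORT_TTL, json.dumps(results))
    return results
```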
#7: Asynchronous Processing
- Messages are fire-and-forget requests.
- Message queue: can be as simple as a database table.
- Message Broker | Enterprise Service Bus (ESB) | Message-Oriented Middleware (MOM): an app that handles message queuing, routing, and delivery, often optimised for concurrency and throughput.
- Message producers are clients issuing requests (messages).
- Message Consumers are your servers which process messages.
* Can be cron-like (pull model): the consumer connects periodically to the queue to check its status; common among scripting languages without a persistent running application container (PHP, Ruby, Perl). See the sketch after this list.
* Or daemon-like (push model): consumers run in an infinite loop, usually with a permanent connection to the broker; common in languages with persistent application containers (C#, Java, Node.js).
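A minimal pull-model sketch against a database-table queue (combining the two ideas above; the table layout and handle function are assumptions for illustration):

```python
import sqlite3
import time

def consume_batch(conn: sqlite3.Connection) -> None:
    """Pull pending messages, process them, and mark them done."""
    rows = conn.execute(
        "SELECT id, body FROM messages WHERE status = 'pending' LIMIT 10"
    ).fetchall()
    for msg_id, body in rows:
        handle(body)  # hypothetical processing function
        conn.execute("UPDATE messages SET status = 'done' WHERE id = ?", (msg_id,))
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("queue.db")
    while True:               # cron would normally trigger single runs instead
        consume_batch(conn)
        time.sleep(60)        # the pull model trades latency for simplicity
```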
- Messaging protocols.
* AMQP is recommended: standardised (implemented consistently among providers), with a well-defined contract for publishing, consuming, and transferring messages, plus delivery guarantees, transactions, etc. (see the sketch after this list).
* STOMP: simple and text-based like HTTP; it adds little overhead, but advanced features require extensions via custom headers, which are non-standard (like prefetch-count).
* JMS is only for JVM languages.
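A minimal AMQP publishing sketch using the pika client against RabbitMQ (my choice of broker for illustration; the queue name and payload are hypothetical):

```python
import pika  # assumed AMQP client library

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# A durable queue survives broker restarts.
channel.queue_declare(queue="email_tasks", durable=True)

channel.basic_publish(
    exchange="",                  # default exchange routes by queue name
    routing_key="email_tasks",
    body='{"to": "user@example.com"}',
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()  # fire-and-forget: the producer never waits for processing
```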
- Benefits of message queues:
* Async processing.
* Evening out Traffic Spikes: if you get more requests, the queue just gets longer and messages take longer to get picked up, but consumers keep processing at the same rate and eventually even out the queue.
* Isolate failures, self-healing, and decoupling.
- Challenges.
* No message ordering: group ids can help, since some providers guarantee ordered delivery of messages within the same group; watch out for consumer idle time.
* Race conditions.
* Message re-queuing: strive for idempotent consumers (see the sketch after this list).
* Don't couple producers with consumers.
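One way to get idempotent consumers, sketched with an in-memory set (a real system would use a shared store; the message shape is assumed): a re-queued message becomes a no-op:

```python
processed_ids = set()  # in production this would live in a shared store

def handle_message(message: dict) -> None:
    """Deduplicate by message id so redelivered messages cause no harm."""
    msg_id = message["id"]
    if msg_id in processed_ids:
        return                   # duplicate delivery: already handled
    apply_side_effects(message)  # hypothetical business logic
    processed_ids.add(msg_id)
```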
- Event sourcing: a technique where every change to the application state is persisted in the form of an event, usually tracked in a log file, so at any point in time you can replay the events and get to a specific state; MySQL replication with binary log files is an example.
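A minimal event-sourcing sketch (the event shapes are illustrative, not from the book): state is never stored directly, only derived by replaying the log up to any point:

```python
events = []  # the append-only event log is the source of truth

def record(event: dict) -> None:
    events.append(event)

def replay(upto=None) -> int:
    """Rebuild an account balance at any point in time from the log."""
    balance = 0
    for event in events[:upto]:
        if event["type"] == "deposit":
            balance += event["amount"]
        elif event["type"] == "withdrawal":
            balance -= event["amount"]
    return balance

record({"type": "deposit", "amount": 100})
record({"type": "withdrawal", "amount": 30})
assert replay() == 70          # current state
assert replay(upto=1) == 100   # state as of the first event
```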
#8: Searching for Data
- High-cardinality fields (many unique values) are good index candidates because they narrow down searches.
- The distribution of values also affects how effective an index is.
- You can use compound indexes to combine a high-cardinality field with a poorly distributed one.
- Inverted index: allows search of phrases or words in full-text search.
* These break text down into word tokens and store, next to each token, the ids of the documents that contain it.
* When searching, you fetch the posting lists of each word and intersect them (see the sketch below).
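A minimal inverted-index sketch: tokenize the documents, map each token to the set of document ids containing it, and answer multi-word queries by intersecting posting lists:

```python
from collections import defaultdict

docs = {
    1: "scalable systems need caching",
    2: "caching makes systems fast",
    3: "message queues decouple systems",
}

# Build: each token maps to the posting list of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        index[token].add(doc_id)

def search(query: str) -> set:
    """Fetch each word's posting list and intersect them."""
    posting_lists = [index[token] for token in query.lower().split()]
    return set.intersection(*posting_lists) if posting_lists else set()

assert search("systems caching") == {1, 2}
```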
#9: Other Dimensions of Scalability
- Automate testing, builds, releases, monitoring.
- Scale yourself.
* Working overtime is not a scaling strategy.
* Recovering from burnout can take months.
- Project management levers: Scope, Time, Cost
* When modifying one, the others should recalibrate.
* Influencing scope can be the easiest way to balance workload.
- "Without data you're just another person with an opinion".
- Stay pragmatic; use the 80/20 rule as a heuristic.