Kindle Notes & Highlights
Building Microservices by Sam Newman
Read between December 23, 2016 and January 2, 2017
The German phrase Datensparsamkeit represents this concept. Originating from German privacy legislation, it encapsulates the concept of only storing as much information as is absolutely required to fulfill business operations or satisfy local laws.
But let’s again consider what microservices are: services modeled after a business domain, not a technical one. And if our team that owns any given service is similarly aligned along the business domain, it is much more likely that the team will be able to retain a customer focus, and see more of the feature development through, because it has a holistic understanding and ownership of all the technology associated with a service. Cross-cutting changes can occur, of course, but their likelihood is significantly reduced by our avoiding technology-oriented teams.
Perhaps the people who worked on the service originally are no longer on a team together; perhaps they are now scattered across the organization. Well, if they still have commit rights, you can find them and ask for their help, perhaps pairing up with them, or if you have the right tooling you can send them a pull request.
Internal open source model (for things that are shared, maybe that don't change super often and don't need dedicated owners?)
One example of this is integration methods. Within an LOB, all services are free to talk to each other in any way they see fit, as decided by the squads who act as their custodians. But between LOBs, all communication is mandated to be asynchronous batch, one of the few cast-iron rules of the very small architecture team.
From the realestate.com.au example: within a line of business, services can communicate however they'd like. But between services in different lines of business, async batch communication only. Interesting.
We ended up implementing three fixes to avoid this happening again: getting our timeouts right, implementing bulkheads to separate out different connection pools, and implementing a circuit breaker to avoid sending calls to an unhealthy system in the first place.
With a circuit breaker, after a certain number of requests to the downstream resource have failed, the circuit breaker is blown. All further requests fail fast while the circuit breaker is in its blown state. After a certain period of time, the client sends a few requests through to see if the downstream service has recovered, and if it gets enough healthy responses it resets the circuit breaker.
when I’ve implemented them for HTTP connections I’ve taken failure to mean either a timeout or a 5XX HTTP return code.
If this call is being made as part of a synchronous call chain, however, it is probably better to fail fast. This could mean propagating an error up the call chain, or a more subtle degrading of functionality.
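A minimal sketch of that state machine in Python, just to make the mechanics concrete. The thresholds, the recommendations.internal URL, and the fetch_recommendations helper are all made up for illustration; in practice you'd more likely reach for an existing resilience library than roll your own.

```python
import time
import requests

class CircuitOpenError(Exception):
    """Raised when the breaker is blown and we fail fast."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold  # consecutive failures before blowing
        self.reset_timeout = reset_timeout          # seconds to wait before probing again
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; downstream looked unhealthy")
            # Enough time has passed: let this one request through as a probe.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # blow the breaker
            raise
        self.failures = 0                           # a healthy response resets the breaker
        self.opened_at = None
        return result

breaker = CircuitBreaker()

def fetch_recommendations():
    # A timeout or a 5XX both count as failures, per the definition above.
    resp = requests.get("http://recommendations.internal/api/picks", timeout=2)
    if resp.status_code >= 500:
        raise RuntimeError(f"downstream returned {resp.status_code}")
    return resp.json()

try:
    recommendations = breaker.call(fetch_recommendations)
except Exception:
    recommendations = []   # degrade: show the page without recommendations
```

The final `except` covers both the fail-fast case (breaker already open) and a fresh failure, so the caller degrades the same way in either case.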
Look at all the aspects of your system that can go wrong, both inside your microservices and between them. Do you have bulkheads in place? I’d suggest starting with separate connection pools for each downstream connection at the very least.
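A sketch of that starting point, assuming two hypothetical downstreams (orders and recommendations): each gets its own bounded worker pool and its own HTTP session, so a slow or wedged dependency can only exhaust its own pool, not everyone else's connections.

```python
from concurrent.futures import ThreadPoolExecutor
import requests

# One bounded pool and one HTTP session (with its own connection pool) per
# downstream dependency. If "recommendations" goes slow and fills its pool,
# calls to "orders" are unaffected.
DOWNSTREAMS = {
    "orders": {"workers": ThreadPoolExecutor(max_workers=10), "session": requests.Session()},
    "recommendations": {"workers": ThreadPoolExecutor(max_workers=5), "session": requests.Session()},
}

def call_downstream(name, path):
    bulkhead = DOWNSTREAMS[name]
    future = bulkhead["workers"].submit(
        bulkhead["session"].get, f"http://{name}.internal{path}", timeout=2
    )
    return future.result(timeout=5)   # bound how long the caller will wait overall
```

If one downstream's pool is saturated, the `future.result` timeout stops callers from queueing behind it indefinitely.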
We can think of our circuit breakers as an automatic mechanism to seal a bulkhead, to not only protect the consumer from the downstream problem, but also to potentially protect the downstream service from more calls that may be having an adverse impact.
I’d recommend mandating circuit breakers for all your synchronous downstream calls.
This form of separation allows for different types of scaling. The command and query parts of our system could live in different services, or on different hardware, and could make use of radically different types of data store. This can unlock a large number of ways to handle scale. You could even support different types of read format by having multiple implementations of the query piece, perhaps supporting a graph-based representation of your data, or a key/value-based form of your data.
Feels pretty radical, but might make some sense, especially if there are a few very distinct read needs that lend themselves to different read architectures (I'm thinking My Books ;) )
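A rough sketch of what that split could look like, leaning on the "My Books" idea from the note above. Everything here is hypothetical (the write_store, event_bus, and class names): the command side owns writes and emits events, and each read model is a projection of those events shaped for one kind of query, free to live in its own service with its own store.

```python
# Command side: owns writes and emits events describing what changed.
class ShelfCommandService:
    def __init__(self, write_store, event_bus):
        self.write_store = write_store    # e.g. a relational store optimized for writes
        self.event_bus = event_bus

    def add_book_to_shelf(self, user_id, book_id):
        self.write_store.insert(user_id=user_id, book_id=book_id)
        self.event_bus.publish("book_shelved", {"user": user_id, "book": book_id})

# Query side: one read model per distinct read need, each rebuilt from events.
class MyBooksQueryService:
    """Key/value projection: user_id -> list of shelved book_ids."""
    def __init__(self):
        self.shelves = {}

    def on_book_shelved(self, event):
        self.shelves.setdefault(event["user"], []).append(event["book"])

    def books_for(self, user_id):
        return self.shelves.get(user_id, [])
```

A second projection (say, a graph of users to books for recommendations) would subscribe to the same events and build its own representation in whatever store suits it.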
One way to protect the origin in such a situation is never to allow requests to go to the origin in the first place. Instead, the origin itself populates the cache asynchronously when needed, as shown in Figure 11-7. If a cache miss is caused, this triggers an event that the origin can pick up on, alerting it that it needs to repopulate the cache. So if an entire shard has vanished, we can rebuild the cache in the background.
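A sketch of that pattern with hypothetical cache, event_bus, and db interfaces: the consumer path never touches the origin directly, a miss just raises an event, and the origin repopulates the entry in the background.

```python
# Consumer side: reads only ever hit the cache, never the origin.
def get_product(cache, event_bus, product_id):
    entry = cache.get(product_id)
    if entry is None:
        # Signal the origin asynchronously that this entry needs repopulating.
        event_bus.publish("cache_miss", {"key": product_id})
        return None   # or serve stale/default data until the cache is rebuilt
    return entry

# Origin side: subscribes to miss events and rebuilds entries in the background,
# so a flood of misses (e.g. from a lost cache shard) never turns into a flood
# of synchronous requests hammering the origin.
def on_cache_miss(event, db, cache):
    product = db.load_product(event["key"])
    cache.set(event["key"], product)
```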
Chapter 12. Bringing It All Together
This chapter is a really good summary. It almost makes sense to read it first to see where you might be farthest from ideal (which of these sound foreign to you?), then dive into the relevant chapters. The book seems readable in random chapter order, though I'd recommend reading each chapter all the way through.

