Kindle Notes & Highlights
Read between February 17 - February 27, 2023
Some teams trigger the final production deployment automatically. Others have a “pause” stage, where some human must provide positive affirmation that “yes, this build is good.” (Worded another way, it says, “Yes, you may fire me if this fails.”)
It’s better for everyone if we do some extra work on our end to maintain compatibility rather than pushing migration costs out onto other teams.
it’s easy to see what makes a breaking change: any unilateral break from a prior agreement. We should be able to make a list of changes that would break agreements:
- Rejecting a network protocol that previously worked
- Rejecting request framing or content encoding that previously worked
- Rejecting request syntax that previously worked
- Rejecting request routing (whether URL or queue) that previously worked
- Adding required fields to the request
- Forbidding optional information in the request that was allowed before
- Removing information from the response that was previously guaranteed
- Requiring an
...more
apply generated tests that will push the boundaries of the specification. That can help you find those two key classes of gaps: between what your spec says and what you think it says, and between what the spec says and what your implementation does. This is “inbound” testing.
I also recommend running randomized, generative tests against services you consume. Use their specifications but your own tests to see if your understanding of the spec is correct. This is “outbound” testing, in which you exercise your dependencies to make them act the way you think they do.
Some people call these “contract tests” because they exercise those parts of the provider’s contract that the consumer cares about.
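The randomized, generative testing described in these highlights can be sketched with nothing but the standard library. Everything here is hypothetical: a toy "spec" predicate for a quantity parameter and an implementation under test that disagrees with it in subtle ways.

```python
import random
import string

# What the (hypothetical) spec says: a quantity is a string of digits, 1..100.
def spec_allows(raw: str) -> bool:
    return raw.isdigit() and 1 <= int(raw) <= 100

# What a (hypothetical) implementation actually does. It quietly accepts
# inputs like " 5" and "+5", because int() is more lenient than the spec.
def implementation_accepts(raw: str) -> bool:
    try:
        value = int(raw)
    except ValueError:
        return False
    return 1 <= value <= 100

# Throw randomized inputs at both and collect every disagreement: each one is
# a gap between what the spec says and what the implementation does.
def find_spec_gaps(trials: int = 2000, seed: int = 42) -> list:
    rng = random.Random(seed)
    alphabet = string.digits + "+- _"
    gaps = []
    for _ in range(trials):
        raw = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 4)))
        if spec_allows(raw) != implementation_accepts(raw):
            gaps.append(raw)
    return gaps
```

Pointed at your own service, this is "inbound" testing; pointed at a dependency's spec, it is "outbound" testing of whether your understanding of that spec is correct.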
If you do put a version in the URLs, be sure to bump all the routes at the same time. Even if just one route has changed, don’t force your consumers to keep track of which version numbers go with which parts of your API.
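A tiny sketch of what bumping all routes together looks like; the paths and handlers are hypothetical. Even though only the orders route changed in this release, every route moves to /v2, so consumers track one version number for the whole API.

```python
# Hypothetical handlers for two routes.
def handle_orders(request: dict) -> dict:
    return {"status": "ok", "api": "v2"}

def handle_customers(request: dict) -> dict:
    return {"status": "ok", "api": "v2"}

# Every route is bumped to /v2 together, changed or not.
ROUTES_V2 = {
    "/v2/orders": handle_orders,        # changed in this release
    "/v2/customers": handle_customers,  # unchanged, but bumped with the rest
}
```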
Methods that handle the new API go directly to the most current version of the business logic. Methods that handle the old API get updated so they convert old objects to the current ones on requests and convert new objects to old ones on responses.
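A minimal sketch of that pattern, with hypothetical request shapes: suppose v1 of a customer API sent first_name/last_name while the current v2 uses full_name. Only one copy of the business logic exists; the old handler translates at the edges.

```python
from dataclasses import dataclass

# Hypothetical domain object for the current (v2) API.
@dataclass
class CustomerV2:
    full_name: str
    email: str

# The single copy of the business logic; both API versions funnel into it.
def create_customer(customer: CustomerV2) -> dict:
    return {"id": 42, "full_name": customer.full_name, "email": customer.email}

# New-API handler: goes directly to the current business logic.
def handle_v2(request: dict) -> dict:
    return create_customer(CustomerV2(request["full_name"], request["email"]))

# Old-API handler: converts the old shape to the current one on the request,
# and the current shape back to the old one on the response.
def handle_v1(request: dict) -> dict:
    v2_request = CustomerV2(
        full_name=f'{request["first_name"]} {request["last_name"]}',
        email=request["email"],
    )
    response = create_customer(v2_request)
    first, _, last = response["full_name"].partition(" ")
    return {"id": response["id"], "first_name": first,
            "last_name": last, "email": response["email"]}
```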
there’s no such thing as a website project. Every one is really an enterprise integration project with an HTML interface. Most are an API layer over the top of back-end services.
If you tell testers the “happy path” through the application, that’s the last thing they’ll do. It should be the same with load testing. Add noise, create chaos. Noise and chaos might only bleed away some amount of your capacity, but it might also bring your system down.
Thrashing happens when your organization changes direction without taking the time to receive, process, and incorporate feedback. You may recognize it as constantly shifting development priorities or an unending series of crises.
Whether you’re in the cloud or in your own data center, you need a platform team that views application development as its customer. That team should provide API and command-line provisioning for the common capabilities that applications need, as well as the things we looked at in Chapter 10, Control Plane:
- Compute capacity, including high-RAM, high-IO, and high-GPU configurations for specialized purposes (the needs of machine learning and the needs of media servers are very different)
- Workload management, autoscaling, virtual machine placement, and overlay networking
- Storage, including
...more
It’s common these days, typically in larger enterprises, to find a group called the DevOps team. This team sits between development and operations with the goal of moving faster and automating releases into production. This is an antipattern.
Systems should exhibit loose clustering. In a loose cluster, the loss of an individual instance is no more significant than the fall of a single tree in a forest. However, this implies that individual servers don’t have differentiated roles. At the very least, any differentiated roles are present in more than one instance. Ideally, the service wouldn’t have any unique instances. But if it does need a unique role, then it should use some form of leader election. That way the service as a whole can survive the loss of the leader without manual intervention to reconfigure the cluster.
Suppose instead the initial fragment of JSON looked like this: {"itemID": "https://example.com/policies/029292934"} This URL still works if we just want to use it as an opaque token to pass forward. From one perspective, it’s still just a Unicode string. This URL also still works if we need to resolve it to get more information. But now our service doesn’t have to bake in knowledge of the solitary authority. We can support more than one of them.
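Both uses of that URL-shaped identifier can be shown in a few lines; the cache and the example URL are taken from the highlight, the rest is illustrative.

```python
from urllib.parse import urlparse

item = {"itemID": "https://example.com/policies/029292934"}

# Used as an opaque token: from this perspective it is just a Unicode string,
# fine for passing along or keying a (hypothetical) cache.
cache = {item["itemID"]: {"status": "active"}}

# Resolved when we need more: the URL itself names the authority to ask, so
# the service no longer bakes in knowledge of one solitary authority.
parsed = urlparse(item["itemID"])
authority = parsed.netloc
path = parsed.path
```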
In a sense, a message sender is communicating with a future (possibly not-yet-written) interface. A message reader is receiving a call from the distant past. So data versioning is definitely a concern.
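One common way to handle that "call from the distant past" is tolerant reading with defaults; the message shape here is hypothetical: an order message whose v1 lacked the "currency" field that v2 added.

```python
import json

# A reader that accepts messages from senders written before "currency"
# existed: filling in a default lets v1 messages flow through current code.
def read_order(raw: bytes) -> dict:
    message = json.loads(raw)
    message.setdefault("currency", "USD")  # assumed default for v1 senders
    return message
```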
In practice, you need to encrypt URLs that you send out to users. That way you can verify that whatever you receive back is something you generated.
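One common way to make a returned URL verifiable, arguably more usual than fully encrypting it, is to attach an HMAC signature when you generate it and check the signature when it comes back. A sketch, with a hypothetical secret and URL shape:

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical; never hard-coded in practice

def sign_url(url: str) -> str:
    # Append an HMAC tag computed over the URL we are handing out.
    tag = hmac.new(SECRET, url.encode(), hashlib.sha256).hexdigest()
    return f"{url}?sig={tag}"

def verify_url(signed: str) -> bool:
    # Whatever comes back must carry a tag we could have generated.
    url, sep, tag = signed.rpartition("?sig=")
    if not sep:
        return False
    expected = hmac.new(SECRET, url.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)
```

Any tampering with the path, or a URL the server never issued, fails verification.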
We don’t capture reality, we only model some aspects of it. There’s no such thing as a “natural” data model, only choices that we make. Every paradigm for modeling data makes some statements easy, others difficult, and others impossible. It’s important to make deliberate choices about when to use relational, document, graph, key-value, or temporal databases.
We always need to think about whether we should record the new state or the change that caused the new state.
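The two options can be sketched side by side for a hypothetical account: recording only the new state overwrites the past, while recording each change lets us reconstruct the state, and answer other questions, later.

```python
# State-oriented: only the latest value survives.
account_state = {"balance": 120}

# Change-oriented: state is derived by replaying the recorded changes.
events = [("deposit", 100), ("deposit", 50), ("withdraw", 30)]

def replay(events: list) -> int:
    balance = 0
    for kind, amount in events:
        balance += amount if kind == "deposit" else -amount
    return balance
```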
Many problems only reveal themselves in the whole system (for example, excessive retries leading to timeouts, cascading failures, dogpiles, slow responses, and single points of failure, to name a few).
“If you have a wall full of green dashboards, that means your monitoring tools aren’t good enough.” There’s always something weird going on.

