Kindle Notes & Highlights
by Sam Newman
Read between December 23, 2016 and January 2, 2017
The reason we want to test a single service by itself is to improve the isolation of the test to make finding and fixing problems faster. To achieve this isolation, we need to stub out all external collaborators so only the service itself is in scope, as Figure 7-5 shows.
Set of tests that are higher level than unit (single method usually) but completely isolated from other services (what about separation from data stores?)
Our service test suite needs to launch stub services for any downstream collaborators (or ensure they are running), and configure the service under test to connect to the stub services.
When I talk about stubbing downstream collaborators, I mean that we create a stub service that responds with canned responses to known requests from the service under test.
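To make this concrete, a minimal sketch of such a stub, assuming a hypothetical downstream "points bank" service and invented routes (the service under test would be configured to call this stub's address instead of the real thing):

```python
# Minimal sketch of a stub for a hypothetical downstream "points bank" service.
# It returns canned responses for known requests; anything else gets a 404.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_RESPONSES = {
    "/points/customer/123": {"customerId": "123", "balance": 100},
    "/points/customer/456": {"customerId": "456", "balance": 0},
}

class StubPointsBank(BaseHTTPRequestHandler):
    def do_GET(self):
        body = CANNED_RESPONSES.get(self.path)
        if body is None:
            self.send_response(404)
            self.end_headers()
            return
        payload = json.dumps(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # The service under test is pointed at http://localhost:9999 while the
    # service tests run, instead of at the real points bank.
    HTTPServer(("localhost", 9999), StubPointsBank).serve_forever()
```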
Sometimes, though, mocks can be very useful to ensure that the expected side effects happen. For example, I might want to check that when I create a customer, a new points balance is set up for that customer. The balance between stubbing and mocking calls is a delicate one, and is just as fraught in service tests as in unit tests. In general, though, I use stubs far more than mocks for service tests.
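A sketch of the mock side of that balance, using Python's unittest.mock and a hypothetical CustomerService that takes its points-bank client as an injected collaborator (all names invented for illustration):

```python
# Sketch: use a mock (rather than a stub) to assert that creating a customer
# has the expected side effect of setting up a points balance.
from unittest import mock

class CustomerService:
    """Hypothetical service under test; the points bank client is injected."""
    def __init__(self, points_bank_client):
        self.points_bank = points_bank_client

    def create_customer(self, customer_id):
        # ... persist the customer, then set up their points balance ...
        self.points_bank.create_balance(customer_id, initial_points=0)

def test_creating_customer_sets_up_points_balance():
    points_bank = mock.Mock()
    service = CustomerService(points_bank)

    service.create_customer("cust-42")

    # The mock lets us verify the side effect happened, not just the response.
    points_bank.create_balance.assert_called_once_with("cust-42", initial_points=0)
```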
We can deal with both of these problems elegantly by having multiple pipelines fan in to a single, end-to-end test stage. Here, whenever a new build of one of our services is triggered, we run our end-to-end tests, an example of which we can see in Figure 7-8.
As test scope increases, so too does the number of moving parts. These moving parts can introduce test failures that do not show that the functionality under test is broken, but that some other problem has occurred.
When we detect flaky tests, it is essential that we do our best to remove them. Otherwise, we start to lose faith in a test suite that “always fails like that.” A test suite with flaky tests can become a victim of what Diane Vaughan calls the normalization of deviance — the idea that over time we can become so accustomed to things being wrong that we start to accept them as being normal and not a problem.
In “Eradicating Non-Determinism in Tests”, Martin Fowler advocates the approach that if you have flaky tests, you should track them down and if you can’t immediately fix them, remove them from the suite so you can treat them.
Agree with removing these from the test suite. Fowler's post suggests keeping these tests (in "quarantine") but limiting the size of the quarantine to a small number and forcing yourself to fix one if you add above that limit. Eventually get to zero, one hopes, but if one pops up, immediately fix or quarantine it. Keep tests such that a failure means a regression (not just bad luck).
The best balance I have found is to treat the end-to-end test suite as a shared codebase, but with joint ownership. Teams are free to check in to this suite, but the ownership of the health of the suite has to be shared between the teams developing the services themselves. If you want to make extensive use of end-to-end tests with multiple teams I think this approach is essential, and yet I have seen it done very rarely, and never without issue.
How to deal with end-to-end test ownership: it's hard, but make it a first-class shared codebase among the related services covered by the tests.
But what happens with 3, 4, 10, or 20 services? Very quickly these test suites become hugely bloated, and in the worst case can result in a Cartesian-like explosion in the scenarios under test. This situation worsens if we fall into the trap of adding a new end-to-end test for every piece of functionality we add.
The best way to counter this is to focus on a small number of core journeys to test for the whole system.
We are trying to ensure that when we deploy a new service to production, our changes won’t break consumers. One way we can do this without requiring testing against the real consumer is by using a consumer-driven contract (CDC).
With CDCs, we are defining the expectations of a consumer on a service (or producer). The expectations of the consumers are captured in code form as tests, which are then run against the producer. If done right, these CDCs should be run as part of the CI build of the producer, ensuring that it never gets deployed if it breaks one of these contracts.
Pact is a consumer-driven testing tool that was originally developed in-house at RealEstate.com.au, but is now open source,
Pact works in a very interesting way, as summarized in Figure 7-11. The consumer starts by defining the expectations of the producer using a Ruby DSL. Then, you launch a local mock server, and run this expectation against it to create the Pact specification file. The Pact file is just a formal JSON specification; you could obviously handcode these, but using the language API is much easier. This also gives you a running mock server that can be used for further isolated tests of the consumer.
Check it out; see if still actively maintained.
Also: does this require fragile coupling of data in the test env with the test expectations? Or maybe you define a stateful scenario that starts with a blank slate?
Not quite clear on the role of the mock server when defining expectations. Is this the same mock that the client dev team is using to isolate dev/testing from the actual downstream service? That's a smart pattern, really. It at least ensures that the mock and the real thing agree with one another (even if they both deviate from published APIs).
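To pin down the workflow, a simplified sketch of the idea (this is not the real Pact library or its exact file format, and all names and URLs are invented): the consumer's expectations end up as plain data, and the producer's CI replays each interaction against a running instance of the producer, failing the build on any mismatch.

```python
# Sketch of producer-side contract verification (not the real Pact tooling).
# The consumer's expectations are captured as data; the producer's CI build
# replays each interaction against a locally running instance of the service.
import requests

CONTRACT = {
    "consumer": "shop-frontend",
    "provider": "customer-service",
    "interactions": [
        {
            "description": "a request for customer 42",
            "request": {"method": "GET", "path": "/customers/42"},
            "response": {"status": 200, "body": {"id": "42", "name": "Jane"}},
        }
    ],
}

def verify(base_url, contract):
    failures = []
    for interaction in contract["interactions"]:
        req, expected = interaction["request"], interaction["response"]
        actual = requests.request(req["method"], base_url + req["path"])
        if actual.status_code != expected["status"]:
            failures.append(f'{interaction["description"]}: status {actual.status_code}')
        elif expected.get("body") and actual.json() != expected["body"]:
            failures.append(f'{interaction["description"]}: body mismatch')
    return failures

if __name__ == "__main__":
    # Run against the producer instance started by the CI build.
    problems = verify("http://localhost:8080", CONTRACT)
    assert not problems, problems
```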
Pacto, which is also a Ruby tool used for consumer-driven testing. It has the ability to record interactions between client and server to generate the expectations. This makes writing consumer-driven contracts for existing services fairly easy. With Pacto, once generated these expectations are more or less static, whereas with Pact you regenerate the expectations in the consumer with every build.
The fact that you can define expectations for capabilities the producer may not even have yet also better fits into a workflow where the producing service is still being (or has yet to be) developed.
Do they mean with Pact you can but not with Pacto? Can't you do it with either? Or do they mean that if you have to rebuild the spec every time it won't work, but with Pacto you build it only once?
I like that Pact is also testing the mocks that the consumer is using.
A common example of this is the smoke test suite, a collection of tests designed to be run against newly deployed software to confirm that the deployment worked. These tests help you pick up any local environmental issues. If you’re using a single command-line command to deploy any given microservice (and you should), this command should run the smoke tests automatically.
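A sketch of what a smoke test invoked by the deploy command could look like, assuming the freshly deployed service exposes a hypothetical /health endpoint:

```python
# Sketch: a smoke test the deploy command can run right after deployment.
# Assumes the newly deployed instance exposes a /health endpoint (hypothetical).
import sys
import requests

def smoke_test(base_url, timeout_seconds=5):
    try:
        response = requests.get(f"{base_url}/health", timeout=timeout_seconds)
    except requests.RequestException as exc:
        return f"service unreachable: {exc}"
    if response.status_code != 200:
        return f"unexpected status {response.status_code} from /health"
    return None  # healthy

if __name__ == "__main__":
    error = smoke_test(sys.argv[1])  # e.g. http://catalog-service-staging:8080
    if error:
        print(f"SMOKE TEST FAILED: {error}")
        sys.exit(1)
    print("smoke test passed")
```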
Another example of this is what is called blue/green deployment. With blue/green, we have two copies of our software deployed at a time, but only one version of it is receiving real requests.
It is common to keep the old version around for a short period of time, allowing for a fast fallback if you detect any errors.
Great idea (if you can afford the hosts): keep old deployment around till new one proves healthy in prod. Rollback /might/ be as simple as reverting back to old fleet (no redeploy required).
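A sketch of the traffic switch this describes, assuming a router whose "active" pointer can be flipped between two fleets (the Router abstraction here is a stand-in for whatever load balancer, DNS entry, or service-discovery record you actually use):

```python
# Sketch of blue/green switching: two fleets stay deployed; only the "active"
# one receives production traffic, so rollback is just flipping the pointer back.
class Router:
    """Hypothetical stand-in for a load balancer / DNS / service-discovery entry."""
    def __init__(self, fleets, active):
        self.fleets = fleets
        self.active = active

    def switch_to(self, colour):
        previous = self.active
        self.active = colour
        return previous

router = Router(fleets={"blue": ["blue-host-1", "blue-host-2"],
                        "green": ["green-host-1", "green-host-2"]},
                active="blue")

# Deploy the new version to the idle (green) fleet, run smoke tests against it,
# then direct real traffic at it. Keep blue around until green proves healthy.
previous = router.switch_to("green")

# If monitoring shows errors, rollback is immediate: no redeploy, just switch back.
# router.switch_to(previous)
```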
A rolling release is similar but takes longer to deploy (and to revert).
When considering canary releasing, you need to decide if you are going to divert a portion of production requests to the canary or just copy production load.
Sometimes expending the same effort into getting better at remediation of a release can be significantly more beneficial than adding more automated functional tests. In the web operations world, this is often referred to as the trade-off between optimizing for mean time between failures (MTBF) and mean time to repair (MTTR).
These are the metrics you may ultimately want to optimize; importantly, find the right tradeoff/balance between them. Automated testing will help up to a point, but beyond that you might reduce the impact of a failure more by investing in monitoring and faster rollback.
Due to the time it takes to run performance tests, it isn’t always feasible to run them on every check-in. It is a common practice to run a subset every day, and a larger set every week. Whatever approach you pick, make sure you run them as regularly as you can. The longer you go without running performance tests, the harder it can be to track down the culprit.
One approach that can be useful here is to use correlation IDs. When the first call is made, you generate a GUID for the call. This is then passed along to all subsequent calls, as seen in Figure 8-5, and can be put into your logs in a structured way, much as you’ll already do with components like the log level or date. With the right log aggregation tooling, you’ll then be able to trace that event all the way through your system:
Given that you’ll already want log aggregation for other purposes, it feels much simpler to instead make use of data you’re already collecting than have to plumb in additional sources of data.
This is especially problematic, as retrofitting correlation IDs in is very difficult; you need to handle them in a standardized way to be able to easily reconstitute call chains. Although it might seem like additional work up front, I would strongly suggest you consider putting them in as soon as you can, especially if your system will make use of event-driven architecture patterns, which can lead to some odd emergent behavior.
For example, if you are using HTTP as the underlying protocol for communication, just wrap a standard HTTP client library, adding in code to make sure you propagate the correlation IDs in the headers.
Can consider a really thin shared library for correlation ID tracking. Should definitely have tests for it; maybe use a standardized lib for those tests across the board instead? Just make sure they propagate.
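A sketch of that thin wrapper, using Python's requests; the header name X-Correlation-ID is an illustrative choice, not a mandated standard:

```python
# Sketch of a thin client wrapper that propagates a correlation ID on every call.
import uuid
import requests

class CorrelatedSession(requests.Session):
    def __init__(self, correlation_id=None):
        super().__init__()
        # Generate a GUID at the edge of the system if we weren't handed one.
        self.correlation_id = correlation_id or str(uuid.uuid4())

    def request(self, method, url, **kwargs):
        headers = kwargs.pop("headers", {}) or {}
        headers.setdefault("X-Correlation-ID", self.correlation_id)
        return super().request(method, url, headers=headers, **kwargs)

# Downstream services read the header, log it as a structured field, and pass
# it along on their own outbound calls using the same wrapper.
session = CorrelatedSession(correlation_id=None)
# session.get("http://points-bank/points/customer/123")
```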
Therefore, monitoring the integration points between systems is key. Each service instance should track and expose the health of its downstream dependencies, from the database to other collaborating services. You should also allow this information to be aggregated to give you a rolled-up picture. You’ll want to see the response time of the downstream calls, and also detect if it is erroring.
Track inbound response time at a bare minimum. Once you’ve done that, follow with error rates and then start working on application-level metrics.
Track the health of all downstream responses, at a bare minimum including the response time of downstream calls, and at best tracking error rates. Libraries like Hystrix can help here.
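A sketch of how a single service instance might track and expose the health of its downstream dependencies; the dependency names and the shape of the rolled-up payload are invented for illustration:

```python
# Sketch: each instance times its downstream calls and exposes a rolled-up view.
import time
import requests

class DependencyHealth:
    def __init__(self, name, url):
        self.name, self.url = name, url
        self.last_response_ms = None
        self.error_count = 0

    def check(self):
        started = time.monotonic()
        try:
            requests.get(self.url, timeout=2).raise_for_status()
        except requests.RequestException:
            self.error_count += 1
        finally:
            self.last_response_ms = (time.monotonic() - started) * 1000

dependencies = [
    DependencyHealth("customer-db", "http://customer-db:8081/ping"),
    DependencyHealth("points-bank", "http://points-bank:8082/health"),
]

def health_report():
    """Served from this instance's own health endpoint and scraped by the
    monitoring system so it can be aggregated across all instances."""
    for dep in dependencies:
        dep.check()
    return {dep.name: {"response_ms": dep.last_response_ms,
                       "errors": dep.error_count} for dep in dependencies}
```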
Log into a standard location, in a standard format if possible. Aggregation is a pain if every ser...
Ensure your metric storage tool allows for aggregation at a system or service level, and drill down to individual hosts.
Have a single, queryable tool for aggregating and storing logs.
Strongly consider standardizing on the use of correlation IDs.
Investigate the possibility of unifying how you aggregate all of your various metrics by seeing if a tool like Suro or Riemann makes sense for you.
If you go the gateway route, make sure your developers can launch their services behind one without too much work.
If you use a gateway service for authentication in front of a number of services, make it easy to insert one in the dev environment, to catch problems that would otherwise only come up in prod.
Also: be sure to practice defense in depth at each service; don't do mullet-style security (business up front, party in the back) by putting all 'security' into one place.
These decisions need to be local to the microservice in question. I have seen people use the various attributes supplied by identity providers in horrible ways, using really fine-grained roles like CALL_CENTER_50_DOLLAR_REFUND, where they end up putting information specific to one part of one of our system’s behavior into their directory services. This is a nightmare to maintain and gives very little scope for our services to have their own independent lifecycle, as suddenly a chunk of information about how a service behaves lives elsewhere, perhaps in a system managed by a different part of ...
Self-signed certificates are not easily revokable, and thus require a lot more thought around disaster scenarios. See if you can dodge all this work by avoiding self-signing altogether.
There is a type of vulnerability called the confused deputy problem, which in the context of service-to-service communication refers to a situation where a malicious party can trick a deputy service into making calls to a downstream service on his behalf that he shouldn’t be able to. For example, as a customer, when I log in to the online shopping system, I can see my account details. What if I could trick the online shopping UI into making a request for someone else’s details, maybe by making a call with my logged-in credentials?
Depending on the sensitivity of the operation in question, you might have to choose between implicit trust, verifying the identity of the caller, or asking the caller to provide the credentials of the original principal.
Can we choose implicit trust for authentication but verify authorization? No credential passing needed, just the identity of the original caller. Some risk of breaching the perimeter and then calling willy-nilly to downstream services. Could be devastating depending on the identity that gets compromised: if customers only have access to their own data, we should be fine, but if there are admin accounts, those will be the targets.
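A sketch of that middle ground: the downstream service trusts that the caller is authenticated, but still checks that the requested resource belongs to the original principal. The token/claims structure and names are invented for illustration:

```python
# Sketch: downstream service guards against a confused deputy by checking that
# the resource being requested actually belongs to the original principal.

class Forbidden(Exception):
    pass

def get_account_details(account_id, caller_claims):
    """caller_claims would come from a verified token passed along by the
    upstream service, identifying the original logged-in principal."""
    if caller_claims.get("role") != "admin" and caller_claims.get("customer_id") != account_id:
        # Even if the shopping UI is tricked into asking for someone else's
        # account, this check stops the deputy from handing the data over.
        raise Forbidden(f"principal {caller_claims.get('customer_id')} cannot read account {account_id}")
    return load_account(account_id)  # hypothetical data-access call

def load_account(account_id):
    return {"id": account_id}  # placeholder
```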
we need to make sure that our backups are also encrypted. This also means that we need to know which keys are needed to handle which version of data, especially if the keys change. Having clear key management becomes fairly important.
Good point re: key management so you know which key to use to decrypt backups. Probably don't want to decrypt/reencrypt all past backups when you rotate keys (though I guess technically that might be more secure assuming the reencrypt step is very safe?)
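A sketch of that key-management point, using the cryptography library's Fernet: store a key ID alongside each encrypted backup so old backups stay decryptable after a key rotation. Key IDs and the storage layout are invented, and real keys would come from a key store rather than being generated in place:

```python
# Sketch: tag every backup with the ID of the key that encrypted it, so rotating
# keys doesn't require re-encrypting old backups.
from cryptography.fernet import Fernet

KEYS = {
    "2016-q4": Fernet(Fernet.generate_key()),  # in reality, loaded from a key store
    "2017-q1": Fernet(Fernet.generate_key()),
}
CURRENT_KEY_ID = "2017-q1"

def encrypt_backup(plaintext: bytes):
    return {"key_id": CURRENT_KEY_ID,
            "ciphertext": KEYS[CURRENT_KEY_ID].encrypt(plaintext)}

def decrypt_backup(backup):
    # Old backups name the (retired but retained) key that encrypted them.
    return KEYS[backup["key_id"]].decrypt(backup["ciphertext"])

backup = encrypt_backup(b"customer table dump")
assert decrypt_backup(backup) == b"customer table dump"
```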
Sensitive information needs to be culled to ensure we aren’t leaking important data into our logs, which could end up being a great target for attackers.
Whitelisting what gets logged seems the safest approach, but that may not be easy with free-form logging. Wonder what best practices are here. Might be some opportunity for a shared library, especially for request/response logging, but perhaps also for converting objects to strings... force each class to specify a safe-to-string method, or whitelist attributes to include in a generic to-string?
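A sketch of that whitelist idea: each class declares which attributes are safe to log, and a shared helper only ever emits those. Class and field names are invented:

```python
# Sketch: classes whitelist the attributes that are safe to log; the shared
# logging helper refuses to serialize anything else.
class Customer:
    LOG_SAFE_FIELDS = ("customer_id", "postcode")  # never name, card number, etc.

    def __init__(self, customer_id, name, card_number, postcode):
        self.customer_id = customer_id
        self.name = name
        self.card_number = card_number
        self.postcode = postcode

def to_log_safe_dict(obj):
    """Generic to-string/to-dict for structured logs: whitelisted fields only."""
    safe_fields = getattr(obj, "LOG_SAFE_FIELDS", ())
    return {field: getattr(obj, field) for field in safe_fields}

customer = Customer("cust-42", "Jane Doe", "4111111111111111", "SW1A 1AA")
print(to_log_safe_dict(customer))  # {'customer_id': 'cust-42', 'postcode': 'SW1A 1AA'}
```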
When logging a request from a user, do we need to store the entire IP address forever, or could we replace the last few digits with x? Do we need to store someone’s name, age, gender, and date of birth in order to provide her with product offers, or is her age range and postcode enough information?
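A sketch of the IP-address example: keep enough of the address to be useful for coarse analysis, but replace the last octet before it ever hits the logs (IPv4 only, for illustration):

```python
# Sketch: mask the last octet of an IPv4 address before logging it.
def mask_ip(ip_address: str) -> str:
    octets = ip_address.split(".")
    if len(octets) != 4:
        return "x.x.x.x"  # unexpected format (e.g. IPv6): mask entirely in this sketch
    return ".".join(octets[:3] + ["x"])

assert mask_ip("203.0.113.42") == "203.0.113.x"
```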

