Kindle Notes & Highlights
by Sam Newman
Read between September 27 - October 1, 2019
In many ways, this is another form of what is called eventual consistency.
The most common algorithm for handling distributed transactions — especially short-lived transactions, as in the case of handling our customer order — is to use a two-phase commit.
The various algorithms are hard to get right, so I’d suggest you avoid trying to create your own. Instead, do lots of research on this topic if this seems like the route you want to take, and see if you can use an existing implementation.
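To make the shape of the algorithm concrete, here is a minimal sketch of the two-phase commit idea in Python; the class and method names are illustrative, not taken from any particular implementation (which, as noted above, you should prefer over rolling your own).

```python
# Minimal two-phase commit sketch: a coordinator asks every participant to
# vote ("prepare"); only if all vote yes does it tell them to commit,
# otherwise it tells them all to roll back. All names are illustrative.

class Participant:
    """A resource (e.g. one service's database) taking part in the transaction."""

    def __init__(self, name):
        self.name = name
        self.committed = False

    def prepare(self) -> bool:
        # Phase 1: do the work short of making it visible, then vote.
        # A real implementation would write a durable "prepared" record here.
        return True

    def commit(self):
        # Phase 2a: make the prepared work permanent.
        self.committed = True

    def rollback(self):
        # Phase 2b: undo the prepared work.
        self.committed = False


def two_phase_commit(participants) -> bool:
    # Phase 1: gather votes from everyone.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: everyone voted yes, so tell them all to commit.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote aborts the whole transaction.
    for p in participants:
        p.rollback()
    return False


if __name__ == "__main__":
    order_db = Participant("order-service")
    warehouse_db = Participant("warehouse-service")
    print(two_phase_commit([order_db, warehouse_db]))  # True
```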
Netflix has decided to standardize on Cassandra as the backing store for its services, of which there are many. Netflix has invested significant time in building tools to make Cassandra easy to work with, much of which the company has shared with the rest of the world via numerous open source projects.
I tend to do much of my thinking in the place where the cost of change and the cost of mistakes is as low as it can be: the whiteboard.
Do you check in to mainline once per day?
Do you have a suite of tests to validate your changes?
When the build is broken, is it the #1 priority of the team to fix it?
If we start with the simplest option, we could lump everything in together: a single, giant repository storing all our code, and one single build.
Furthermore, if my one-line change to the user service breaks the build, no other changes can be made to the other services until that break is fixed. And think about a scenario where you have multiple teams all sharing this giant build. Who is in charge?
By storing all our configuration in source control, we are trying to ensure that we can automatically reproduce services and hopefully entire environments at will.
Terraform is a very new tool from Hashicorp, which works in this space. I’d generally shy away from mentioning such a new tool in a book that is more about ideas than technology, but it is attempting to create an open source tool along these lines.
With microservices, we have added another level of complexity.
Brian Marick’s testing quadrant. Crispin, Lisa; Gregory, Janet, Agile Testing: A Practical Guide for Testers and Agile Teams, 1st Edition, © 2009.
Mike Cohn’s Test Pyramid. Cohn, Mike, Succeeding with Agile: Software Development Using Scrum, 1st Edition, © 2010.
The prime goal of these tests is to give us very fast feedback about whether our functionality is good.
When I talk about stubbing downstream collaborators, I mean that we create a stub service that responds with canned responses to known requests from the service under test. For example, I might tell my stub points bank that when asked for the balance of customer 123, it should return 15,000. The test doesn’t care if the stub is called 0, 1, or 100 times.
When using a mock, I actually go further and make sure the call was made. If the expected call is not made, the test fails. Implementing this approach requires more smarts in the fake collaborators that we create, and if overused can cause tests to become brittle. As noted, however, a stub doesn’t care if it is called 0, 1, or many times.
While I feel that stubs and mocks are actually fairly well differentiated, I know the distinction can be confusing to some, especially when some people throw in other terms like fakes, spies, and dummies. Martin Fowler calls all of these things, including stubs and mocks, test doubles.
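As an illustration of the distinction, here is a small sketch using Python's unittest.mock; the points-bank collaborator and CustomerService names are made up for the example, not taken from the book's code.

```python
# Stub versus mock, sketched with Python's unittest.mock.
from unittest import mock


class CustomerService:
    """Service under test; collaborates with a downstream points bank."""

    def __init__(self, points_bank):
        self.points_bank = points_bank

    def loyalty_summary(self, customer_id):
        balance = self.points_bank.balance_for(customer_id)
        return f"Customer {customer_id} has {balance} points"


def test_with_a_stub():
    # Stub: a canned answer for a known request; we don't check how often
    # (or whether) it actually gets called.
    stub_bank = mock.Mock()
    stub_bank.balance_for.return_value = 15000

    service = CustomerService(stub_bank)
    assert service.loyalty_summary(123) == "Customer 123 has 15000 points"


def test_with_a_mock():
    # Mock: we additionally verify that the expected call was made;
    # the test fails if it wasn't.
    mock_bank = mock.Mock()
    mock_bank.balance_for.return_value = 15000

    service = CustomerService(mock_bank)
    service.loyalty_summary(123)

    mock_bank.balance_for.assert_called_once_with(123)
```

Run with pytest, the stub test cares only about the canned answer, while the mock test also fails if balance_for is never called.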
Flaky and Brittle Tests
The more moving parts, the more brittle our tests may be, and the less deterministic they are. If you have tests that sometimes fail, but everyone just re-runs them because they may pass again later, then you have flaky tests.
When we detect flaky tests, it is essential that we do our best to remove them. Otherwise, we start to lose faith in a test suite that “always fails like that.”
This is the normalization of deviance: the idea that over time we can become so accustomed to things being wrong that we start to accept them as being normal and not a problem.
With the end-to-end test step, it is easy to start thinking, "So, I know all these services at these versions work together, so why not deploy them all together?" This very quickly becomes a conversation along the lines of, "So why not use a version number for the whole system?"
Pact is a consumer-driven testing tool that was originally developed in-house at RealEstate.com.au, but is now open source, with Beth Skurrie driving most of the development. Originally just for Ruby, Pact now includes JVM and .NET ports.
Confusingly, there is a ThoughtWorks open source project called Pacto, which is also a Ruby tool used for consumer-driven testing.
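As a rough illustration of the consumer-driven contract idea (Pact's own DSL looks different), a consumer can record the interactions it depends on and the provider's build can replay them against a running instance; everything below is a hand-rolled sketch, not Pact's API.

```python
# Hand-rolled consumer-driven contract sketch: the consumer publishes the
# interactions it relies on, and the provider's pipeline verifies them.
import json
import urllib.request

# Produced by the *consumer's* test suite and shared with the provider
# (checked in, or published via a broker).
CONTRACT = {
    "consumer": "loyalty-web",
    "provider": "points-bank",
    "interactions": [
        {
            "request": {"method": "GET", "path": "/balance/123"},
            "response": {"status": 200, "body": {"balance": 15000}},
        }
    ],
}


def verify_provider(base_url, contract=CONTRACT):
    """Run by the provider's CI against a locally started instance of the service."""
    for interaction in contract["interactions"]:
        url = base_url + interaction["request"]["path"]
        with urllib.request.urlopen(url) as resp:
            assert resp.status == interaction["response"]["status"]
            body = json.loads(resp.read())
            expected = interaction["response"]["body"]
            assert body == expected, f"contract broken: {body} != {expected}"
```

The point is less the mechanics than the ownership: the consumer states what it needs, and the provider's build fails as soon as a change would break that expectation.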
From speaking to people who have been implementing microservices at scale for a while now, I have learned that most of them over time remove the need entirely for end-to-end tests in favor of tools like CDCs and improved monitoring.
One way in which we can catch more problems before they occur is to extend where we run our tests beyond the traditional predeployment steps.
A common example of this is the smoke test suite, a collection of tests designed to be run against newly deployed software to confirm that the deployment worked.
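A smoke test suite can be as simple as hitting a health endpoint on each newly deployed service and failing the deployment step if any of them don't respond; the endpoint paths and hostnames below are illustrative assumptions.

```python
# Minimal post-deployment smoke test sketch: check each service's health
# endpoint and exit non-zero if anything is unreachable or unhealthy.
import sys
import urllib.request

SERVICES = {
    "customer-service": "http://customer.internal:8080/health",
    "points-bank": "http://points.internal:8080/health",
}


def smoke_test():
    failures = []
    for name, url in SERVICES.items():
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status != 200:
                    failures.append(f"{name}: HTTP {resp.status}")
        except OSError as exc:  # connection refused, DNS failure, timeout...
            failures.append(f"{name}: {exc}")
    return failures


if __name__ == "__main__":
    problems = smoke_test()
    for line in problems:
        print("SMOKE FAILURE:", line)
    sys.exit(1 if problems else 0)  # a non-zero exit fails the deployment step
```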
There is another technique worth discussing briefly here too, which is sometimes confused with blue/green deployments, as it can use some of the same technical implementations. It is known as canary releasing.
With canary releasing, we are verifying our newly deployed software by directing a portion of production traffic against it to see if it performs as expected. "Performing as expected" can cover a number of things, both functional and nonfunctional.
Netflix uses this approach extensively. Prior to release, new service versions are deployed alongside a baseline cluster that represents the same version as production. Netflix then runs a subset of the production load over a number of hours against both the new version and the baseline, scoring both.
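The routing decision at the heart of a canary release can be sketched as a weighted choice between the baseline and the candidate version; the weight, version names, and promotion criteria below are illustrative, not Netflix's actual mechanism.

```python
# Canary routing sketch: send a small, configurable percentage of production
# requests to the new version, the rest to the current one.
import random

CANARY_WEIGHT = 0.05  # 5% of traffic goes to the candidate release


def choose_backend(rng=random.random):
    return "orders-v124-canary" if rng() < CANARY_WEIGHT else "orders-v123"


# Comparing the two pools over time (error rates, latency, business metrics)
# tells us whether to promote the canary or roll it back.
if __name__ == "__main__":
    from collections import Counter
    print(Counter(choose_backend() for _ in range(10000)))
```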
Optimize for fast feedback, and separate types of tests accordingly.
Avoid the need for end-to-end tests wherever possible by using consumer-driven contracts.
Use consumer-driven contracts to provide focus points for conversations between teams.
This chapter focused mostly on making sure our code works before it hits production, but we also need to know how to make sure our code works once it’s deployed. In the next chapter, we’ll take a look at how to monitor our microservice-based systems.
Now the number of hosts we are running on is becoming a challenge. SSH-multiplexing to retrieve logs probably isn’t going to cut it now, and there isn’t a screen big enough for you to have terminals open on every host. Instead, we’re looking to use specialized subsystems to grab our logs and make them available centrally. One example of this is logstash, which can parse multiple logfile formats and can send them to downstream systems for further investigation.
Kibana is an ElasticSearch-backed system for viewing logs, illustrated in Figure 8-4. You can use a query syntax to search through logs, allowing you to do things like restrict time and date ranges or use regular expressions to find matching strings.
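Emitting logs as structured JSON makes life much easier for a collector like logstash, since it no longer has to parse free-form text; here is a standard-library-only sketch, with field names chosen for the example rather than taken from any particular tool.

```python
# Structured JSON logging sketch: one JSON object per line, easy for a
# log shipper to parse and for Kibana to query.
import json
import logging


class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": "customer-service",
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("customer-service")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("order accepted")  # emitted as a single JSON log line
```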
One approach that can be useful here is to use correlation IDs. When the first call is made, you generate a GUID for the call. This is then passed along to all subsequent calls, as seen in Figure 8-5, and can be put into your logs in a structured way, much as you’ll already do with components like the log level or date.
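In practice this might look like the following sketch: generate a GUID at the edge, attach it to each downstream call as a header, and include it in every log line. The X-Correlation-ID header name is a common convention, not something mandated here.

```python
# Correlation ID sketch: generated once at the edge, logged in a structured
# way, and propagated to every downstream call.
import logging
import urllib.request
import uuid

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s [%(correlation_id)s] %(message)s",
)
log = logging.getLogger("gateway")


def handle_inbound_request():
    correlation_id = str(uuid.uuid4())  # generated on the first call
    extra = {"correlation_id": correlation_id}
    log.info("placing order", extra=extra)
    call_downstream("http://points.internal:8080/award", correlation_id)


def call_downstream(url, correlation_id):
    # Downstream services read the header, reuse the same ID in their own
    # logs, and pass it on to anything they call in turn.
    req = urllib.request.Request(url, headers={"X-Correlation-ID": correlation_id})
    urllib.request.urlopen(req, timeout=5)
```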
Track inbound response time at a bare minimum. Once you’ve done that, follow with error rates and then start working on application-level metrics.
Track the health of all downstream responses, at a bare minimum including the response time of downstream calls, and at best tracking error rates. Libraries like Hystrix can help here.
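A library like Hystrix records this for you on the JVM; as a language-neutral illustration, the sketch below wraps downstream calls to accumulate call counts, error counts, and response times (all names here are assumptions for the example).

```python
# Downstream call metrics sketch: record latency and error rate per collaborator.
import time
from collections import defaultdict

metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def timed_downstream_call(name, call):
    """Wrap a downstream call, recording how long it took and whether it failed."""
    record = metrics[name]
    record["calls"] += 1
    start = time.monotonic()
    try:
        return call()
    except Exception:
        record["errors"] += 1
        raise
    finally:
        record["total_seconds"] += time.monotonic() - start


def report(name):
    r = metrics[name]
    calls = r["calls"] or 1  # avoid division by zero before any calls
    return {
        "avg_response_seconds": r["total_seconds"] / calls,
        "error_rate": r["errors"] / calls,
    }
```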
Standardize on how and where metrics ...
Monitor the underlying operating system so you can track down rogue processes and do capacity planning.
Have a single, queryable tool for aggregating and storing logs.
Strongly consider standardizing on the use of correlation IDs.
Authentication is the process by which we confirm that a party is who she says she is.
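As a minimal illustration of that confirmation step, the sketch below checks a presented password against a stored salted hash; the user store and hashing scheme are assumptions made for the example.

```python
# Authentication sketch: confirm the claimed identity by comparing the
# presented credentials against what we have on record.
import hashlib
import hmac
import os


def hash_password(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)


# Normally persisted in a user store; created inline for the example.
_salt = os.urandom(16)
USERS = {"sam": hash_password("correct horse battery staple", _salt)}


def authenticate(username, password):
    """Return True only if the party is who they say they are."""
    stored = USERS.get(username)
    if stored is None:
        return False
    candidate = hash_password(password, _salt)
    return hmac.compare_digest(stored, candidate)
```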