Kindle Notes & Highlights
by Gene Kim
Read between February 24 - May 23, 2021
To ensure that we can restore production service repeatedly and predictably (and, ideally, quickly) even when catastrophic events occur, we must check in the following assets to our shared version control repository:
•all application code and dependencies (e.g., libraries, static content, etc.)
•any script used to create database schemas, application reference data, etc.
•all the environment creation tools and artifacts described in the previous step (e.g., VMware or AMI images, Puppet, Chef, or Ansible scripts)
•any file used to create containers (e.g., Docker, Rocket, or Kubernetes
...more
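A quick way to keep this honest is to have the pipeline verify that each of the asset categories listed above actually lives in the repository. Below is a minimal sketch (not from the book) of such a check; the paths are hypothetical and would need to match your own repo layout.

# Minimal sketch: a CI check that fails the build if any of the asset
# categories above has no matching file in the repository.
# All paths are hypothetical placeholders.
from pathlib import Path
import sys

REQUIRED_ASSETS = {
    "application code": ["src/"],
    "dependency manifest": ["requirements.txt"],
    "database scripts": ["db/migrations/"],
    "environment creation": ["ansible/", "packer/"],
    "container definitions": ["Dockerfile"],
}

def check_repo(root: Path) -> list[str]:
    """Return the asset categories with no matching file in the repo."""
    missing = []
    for category, candidates in REQUIRED_ASSETS.items():
        if not any((root / c).exists() for c in candidates):
            missing.append(category)
    return missing

if __name__ == "__main__":
    missing = check_repo(Path("."))
    if missing:
        print("Not under version control in this repo:", ", ".join(missing))
        sys.exit(1)
    print("All required asset categories are present.")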
Version control also provides a means of communication for everyone working in the value stream—having Development, QA, Infosec, and Operations able to see each other’s changes helps reduce surprises, creates visibility into each other’s work, and helps build and reinforce trust. (See Appendix 7.) Of course, this means that all teams must use the same version control system.
By having repeatable environment creation systems, we are able to easily increase capacity by adding more servers into rotation (i.e., horizontal scaling). We also avoid the disaster that inevitably results when we must restore service after a catastrophic failure of irreproducible infrastructure, created through years of undocumented and manual production changes.
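As a sketch of what horizontal scaling from a repeatable image can look like, the snippet below launches additional identical instances with boto3. It assumes the image was produced by the automated environment-creation process, and the AMI ID, instance type, and tag values are placeholders.

# Minimal sketch: adding capacity by launching more servers from the same
# machine image that the automated environment-creation process produced.
# AMI ID, instance type, and tags are placeholders; assumes boto3 is installed
# and AWS credentials are configured.
import boto3

def scale_out(ami_id: str, count: int, instance_type: str = "t3.micro") -> list[str]:
    """Launch `count` identical instances from a known-good image."""
    ec2 = boto3.client("ec2")
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "built-from", "Value": ami_id}],
        }],
    )
    return [i["InstanceId"] for i in response["Instances"]]

# scale_out("ami-0123456789abcdef0", count=3)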
Depending on the life cycle of the configuration in question, we can rely on our automated configuration systems to ensure consistency (e.g., Puppet, Chef, Ansible, Salt, Bosh, etc.), use a service mesh or configuration management service to propagate runtime configuration (Istio, AWS Systems Manager Parameter Store, etc.), or we can create new virtual machines or containers from our automated build mechanism and deploy them into production, destroying the old ones or taking them out of rotation.§§ The latter pattern is what has become known as immutable infrastructure, where manual changes to
...more
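The immutable-infrastructure flow can be sketched in a few lines: build replacements from a new image, check health, shift traffic, and destroy the old fleet. The in-memory "rotation" below is only a stand-in for a real cloud API and load balancer, just to make the replace-rather-than-modify flow concrete; names and counts are illustrative.

# Self-contained sketch of the immutable-infrastructure pattern: create
# replacements from a new image, put them into rotation, and retire the old
# fleet instead of modifying it in place.
from dataclasses import dataclass

@dataclass
class Instance:
    image: str
    healthy: bool = True

def deploy_immutably(rotation: list, new_image: str, count: int = 3) -> list:
    """Return the new rotation after a replace-don't-modify deploy."""
    new_fleet = [Instance(new_image) for _ in range(count)]   # built from version control
    assert all(i.healthy for i in new_fleet)                  # health check before traffic
    old_fleet = list(rotation)
    rotation = rotation + new_fleet                           # new instances start taking traffic
    rotation = [i for i in rotation if i not in old_fleet]    # old instances drained and destroyed
    return rotation

rotation = [Instance("app-v1")]
rotation = deploy_immutably(rotation, "app-v2")
print([i.image for i in rotation])   # ['app-v2', 'app-v2', 'app-v2']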
Containers satisfy three key things: they abstract infrastructure (the dial-tone principle—you pick up the phone and it works without needing to know how it works), specialization (Operations could create containers that developers could use over and over and over again), and automation (containers can be built over and over again and everything will just work).17
For Dwayne and the hotel company, containers are the way. They’re cloud portable. They’re scalable. Health checks are built in. They could test for latency versus CPU, and certs are no longer in the application or managed by developers. Additionally, they are now able to focus on circuit breaking, they have APM built-in, operate zero trust, and images are very small due to good container hygiene and sidecars being used to enhance everything.21
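One concrete piece of that list is the built-in health check: the platform only keeps a container in rotation while a probe endpoint answers. Below is a minimal sketch of such an endpoint; the /healthz path and port 8080 are conventional choices, not anything prescribed by the book.

# Minimal sketch: a tiny HTTP health endpoint that a container platform
# (e.g., a Kubernetes liveness/readiness probe or a load balancer) can poll.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()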
In this context, environment is defined as everything in the application stack except for the application, including the databases, operating systems, networking, virtualization, and all associated configurations. † Most developers want to test their code, and they have often gone to extreme lengths to obtain test environments to do so. Developers have been known to reuse old test environments from previous projects (often years old) or ask someone who has a reputation of being able to find one—they just won’t ask where it came from because, invariably, someone somewhere is now missing a
...more
Figure 10.1: The Deployment Pipeline
Source: Humble and Farley, Continuous Delivery, 3.
Martin Fowler observes that, in general, a ten-minute build [and test process] is perfectly within reason. . . . [We first] do the compilation and run tests that are more localized unit tests with the database completely stubbed out. Such tests can run very fast, keeping within the ten minute guideline. However any bugs that involve larger scale interactions, particularly those involving the real database, won’t be found. The second stage build runs a different suite of tests [acceptance tests] that do hit the real database and involve more end-to-end behavior. This suite may take a couple of
...more
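A small sketch of Fowler's two-stage split: a fast unit test with the database completely stubbed out, and a slower acceptance-style test that exercises a real database (sqlite3 stands in for it here). The function and table names are illustrative.

# Stage 1: localized unit test with the database stubbed out (runs fast).
# Stage 2: acceptance-style test that hits a real database (sqlite3 here).
import sqlite3
import unittest
from unittest.mock import Mock

def count_active_users(db) -> int:
    """Code under test: counts rows via whatever connection it is given."""
    cur = db.execute("SELECT COUNT(*) FROM users WHERE active = 1")
    return cur.fetchone()[0]

class FastUnitTest(unittest.TestCase):
    def test_counts_with_stubbed_database(self):
        db = Mock()
        db.execute.return_value.fetchone.return_value = (3,)
        self.assertEqual(count_active_users(db), 3)   # no real I/O, stays well inside the ten-minute budget

class SlowAcceptanceTest(unittest.TestCase):
    def test_counts_against_real_database(self):
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE users (name TEXT, active INTEGER)")
        db.executemany("INSERT INTO users VALUES (?, ?)",
                       [("a", 1), ("b", 0), ("c", 1)])
        self.assertEqual(count_active_users(db), 2)   # exercises real SQL behavior

if __name__ == "__main__":
    unittest.main()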
Because we want our tests to run quickly, we need to design our tests to run in parallel, potentially across many different servers. We may also want to run different categories of tests in parallel. For example, when a build passes our acceptance tests, we may run our performance testing in parallel with our security testing, as shown in Figure 10.3. We may or may not allow manual exploratory testing until the build has passed all our automated tests—which enables faster feedback but may also allow manual testing on builds that will eventually fail.
Figure 10.3: Running Automated and Manual
...more
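A minimal sketch of fanning test categories out in parallel once acceptance tests pass: each suite runs as its own process, and the pipeline fails if any of them do. The pytest commands and directory names are placeholders.

# Sketch: run performance and security suites in parallel after acceptance
# tests pass. Commands and paths are placeholders for your own entry points.
import subprocess
from concurrent.futures import ThreadPoolExecutor

TEST_SUITES = {
    "performance": ["pytest", "tests/performance", "-q"],
    "security":    ["pytest", "tests/security", "-q"],
}

def run_suite(name: str, cmd: list[str]) -> tuple[str, int]:
    result = subprocess.run(cmd)          # each suite runs as its own process
    return name, result.returncode

with ThreadPoolExecutor(max_workers=len(TEST_SUITES)) as pool:
    futures = [pool.submit(run_suite, name, cmd) for name, cmd in TEST_SUITES.items()]
    results = dict(f.result() for f in futures)

if any(code != 0 for code in results.values()):
    raise SystemExit(f"failing suites: {[n for n, c in results.items() if c]}")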
Our goal is to write and run automated performance tests that validate our performance across the entire application stack (code, database, storage, network, virtualization, etc.) as part of the deployment pipeline so we detect problems early, when the fixes are cheapest and fastest.
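As one hedged example of such a check, the snippet below measures median latency against an endpoint and fails the pipeline if it exceeds a budget. The URL and the 200 ms threshold are placeholders, and a real suite would also exercise the database, storage, and network layers.

# Sketch: an automated performance check for the deployment pipeline.
# URL and latency budget are placeholders.
import statistics
import time
import urllib.request

def measure_latency(url: str, samples: int = 20) -> float:
    """Return the median response time in seconds over `samples` requests."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url).read()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

if __name__ == "__main__":
    p50 = measure_latency("http://localhost:8080/healthz")
    assert p50 < 0.200, f"median latency {p50 * 1000:.0f} ms exceeds 200 ms budget"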

