Site Reliability Engineering: How Google Runs Production Systems
Rate it:
Open Preview
33%
Flag icon
If you haven’t tried it, assume it’s broken. Unknown
46%
Flag icon
If at first you don’t succeed, back off exponentially. Dan Sandler, Google Software Engineer Why do people always forget that you need to add a little jitter? Ade Oshineye, Google Developer Advocate
48%
Flag icon
Remember that the code path you never use is the code path that (often) doesn’t work.
59%
Flag icon
Unfortunately, business demands usually occur at the least convenient time to refactor the pipeline system into an online continuous processing system. Newer and larger customers who are faced with forcing scaling issues typically also want to include new features, and expect that these requirements adhere to immovable deadlines.
Niharika
So much snark
66%
Flag icon
The last thing we wanted was to make children cry because they couldn’t watch Santa deliver presents. In fact, we dubbed the various kill switches built into the experience to protect our services “Make-children-cry switches.”
69%
Flag icon
The solution to this scenario is to enact some type of churn reduction policy that prohibits infrastructure engineers from releasing backward-incompatible features until they also automate the migration of their clients to the new feature.
71%
Flag icon
As an alternative to the overhead of asking a senior SRE to carefully plan a specific type of breakage that the new SRE(s) must repair, you can also work in the opposite direction with an exercise that may also increase participation from the entire team: work from a known good configuration and slowly impair the stack at selected bottlenecks, observing upstream and downstream efforts through your monitoring. This exercise is valued by the Google Search SRE team, whose version of this exercise is called “Let’s burn a search cluster to the ground!”
77%
Flag icon
In this particular case, the road to hell was indeed paved with JavaScript.
81%
Flag icon
The industries that we surveyed were mixed in terms of if, how, and why they embraced automation. Certain industries trusted humans more than machines.