Brian

54%
Flag icon
To accelerate this kind of improvement, we introduce malfunctions artificially. Rather than trying to avoid malfunctions, we instigate them. If a failover process is broken, we want to learn this fact in a controlled way, preferably during normal business hours when the largest number of people are awake and available to respond. If we do not trigger failures in a controlled way, we will learn about such problems only during a real emergency: usually after hours, usually when everyone is asleep.
Practice of Cloud System Administration, The: DevOps and SRE Practices for Web Services, Volume 2
Rate this book
Clear rating
Open Preview