Site Reliability Engineering: How Google Runs Production Systems
Rate it:
Open Preview
1%
Flag icon
The tribal nature of IT culture often entrenches practitioners in dogmatic positions that hold the industry back.
3%
Flag icon
SRE is what happens when you ask a software engineer to design an operations team.
3%
Flag icon
In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).
4%
Flag icon
In general, for any software service or system, 100% is not the right reliability target because no user can tell the difference between a system being 100% available and 99.999% available.
4%
Flag icon
Monitoring should never require a human to interpret any part of the alerting domain. Instead, software should do the interpreting, and humans should be notified only when they need to take action.