Mindaugas Mozūras

31%
Flag icon
Teams have some internal flexibility, but common postmortem triggers include: User-visible downtime or degradation beyond a certain threshold Data loss of any kind On-call engineer intervention (release rollback, rerouting of traffic, etc.) A resolution time above some threshold A monitoring failure (which usually implies manual incident discovery)
Site Reliability Engineering: How Google Runs Production Systems
Rate this book
Clear rating