For example, we may choose to measure our incidents by the following metrics:
▹ Event severity: How severe was this issue? This directly relates to the impact on the service and our customers.
▹ Total downtime: How long were customers unable to use the service to any degree?
▹ Time to detect: How long did it take for us or our systems to know there was a problem?
▹ Time to resolve: How long after we knew there was a problem did it take for us to restore service?
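
As a rough illustration, the three time-based metrics above can be derived from three timestamps recorded for each incident. The sketch below is one possible way to do that, not a prescribed implementation: the `Incident` class, its field names, the numeric severity scale, and the sample timestamps are all hypothetical. It also assumes the service was unusable for the whole window between the start of impact and resolution, which will overstate downtime for partial outages.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; all field names are illustrative."""
    severity: int          # assumed scale where 1 is the most severe
    started_at: datetime   # when customer impact began
    detected_at: datetime  # when we or our systems noticed the problem
    resolved_at: datetime  # when service was restored

    @property
    def time_to_detect(self) -> timedelta:
        # Gap between the start of impact and our awareness of it.
        return self.detected_at - self.started_at

    @property
    def time_to_resolve(self) -> timedelta:
        # Gap between awareness and restoration of service.
        return self.resolved_at - self.detected_at

    @property
    def total_downtime(self) -> timedelta:
        # Simplifying assumption: customers were affected for the
        # entire window from start of impact to resolution.
        return self.resolved_at - self.started_at


# Made-up sample values, for illustration only.
incident = Incident(
    severity=2,
    started_at=datetime(2024, 1, 15, 9, 0),
    detected_at=datetime(2024, 1, 15, 9, 12),
    resolved_at=datetime(2024, 1, 15, 10, 5),
)

print(incident.time_to_detect)   # 0:12:00
print(incident.time_to_resolve)  # 0:53:00
print(incident.total_downtime)   # 1:05:00
```

Under these assumptions, total downtime is simply time to detect plus time to resolve, which makes it easy to see whether slow detection or slow recovery contributes more to an outage.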

