A software bug that causes every instance of an application server to crash when given a particular bad input. For example, consider the leap second on June 30, 2012, that caused many applications to hang simultaneously due to a bug in the Linux kernel [9]. A runaway process that uses up some shared resource—CPU time, memory, disk space, or network bandwidth. A service that the system depends on that slows down, becomes unresponsive, or starts returning corrupted responses. Cascading failures, where a small fault in one component triggers a fault in another component, which in turn triggers
...more