Fault domains are one of the most useful concepts I've found in architecting reliable systems, and don't get enough attention. If you want to make your software predictably reliable, including measuring your reliability risk, then it's an extremely useful concept to spend some time with.
A fault domain is a collection of functionality, services or components with a shared single point of failure. A fault level is a set of one or more fault domains serving the same purpose, for example if you...
Published on August 17, 2019 07:00