Brian’s Kindle Notes & Highlights

Practice of Cloud System Administration, The: DevOps and SRE Practices for Web Services, Volume 2, by Thomas A. Limoncelli

Because of these differences, distributed computing services are best managed by a separate team, with separate management, with bespoke operational and management practices.

I would like to see an active debate between this Google view and the Amazon view on operations (and operational excellence). Where is the pressure to invest in system/product/app/service changes that improve uptime and reduce the cost of maintenance, availability, downtime, and recovery? It may make a lot of sense to specialize functions into separate roles, but that may require a certain scale, and there must be strong mechanisms for the teams to collaborate and continuously strive for operational excellence. I guess AWS is a little closer to Google than much of Amazon, with a tier of support engineers who handle 'routine' maintenance and ops (but is that a good thing, or a sign of missed opportunities for excellence?). It may well allow them to move fast and give them a little more buffer on when infrastructure investments must take place. I also wonder what compensation difference (if any) exists between Google developers and reliability engineers. There probably is one, right?
