Start with visibility. Use logging, tracing, and metrics to create transparency. Collect and index logs in one place so you can look for general patterns across the system. Doing so also gets the logs off the machines, so they are still available for postmortem analysis when a machine or instance fails. Use configuration, provisioning, and deployment services to gain leverage over larger or more dynamic systems.
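As one illustration of logs that are easy to collect and index, the sketch below emits each log event as a single JSON object per line using Python's standard `logging` module. It is a minimal example under stated assumptions, not a prescribed setup: the `fields` key passed through `extra`, the `checkout` logger name, and the sample order data are all hypothetical, and an in-memory buffer stands in for whatever stream a log shipper would actually read.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "ts": record.created,  # seconds since the epoch
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"fields": {...}}
        # to attach structured, indexable key/value pairs.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def build_logger(name, stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


def demo():
    """Log two sample events and parse them back, as an indexer might."""
    buffer = io.StringIO()  # stand-in for stdout or a shipped log file
    log = build_logger("checkout", buffer)
    log.info(
        "payment accepted",
        extra={"fields": {"order_id": "A-1001", "latency_ms": 42}},
    )
    log.warning(
        "retrying gateway call",
        extra={"fields": {"order_id": "A-1002", "attempt": 2}},
    )
    return [json.loads(line) for line in buffer.getvalue().splitlines()]


if __name__ == "__main__":
    for event in demo():
        print(event)
```

Because every line parses on its own, a collector can ship the stream off the machine as it is written and an indexer can search on fields such as `order_id` or `level` without any custom parsing rules.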

