Robert Gustavo’s Kindle Notes & Highlights

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, by Martin Kleppmann

Beware that averaging percentiles, e.g., to reduce the time resolution or to combine data from several machines, is mathematically meaningless — the right way of aggregating response time data is to add the histograms

Good luck with that. Grass spews out a metric shitload of requests, with the number based on the user's social graph and the number of books in a work. We measure the maximum wait time for our threads per request, and take the tp99 of that across requests across many servers. It's mathematically meaningless as a measurement, but when it goes up we know we need to adjust our thread pools or add servers.
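
To make the highlighted point concrete, here is a minimal sketch of why averaging per-server percentiles can mislead while adding histograms does not. The data, the bucket values, and the helper name `percentile_from_histogram` are all made up for illustration; a real system would likely use finer buckets or a dedicated histogram library.

```python
from collections import Counter


def percentile_from_histogram(hist, percent):
    """Return the smallest bucket value whose cumulative count
    covers at least `percent` percent of all observations.

    `hist` maps a response-time bucket (ms) to a request count.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    total = sum(hist.values())
    running = 0
    for value in sorted(hist):
        running += hist[value]
        if running * 100 >= percent * total:
            return value
    raise ValueError("empty histogram")


# Hypothetical per-server histograms: {response time in ms: request count}.
server_a = Counter({10: 990, 500: 10})  # 1000 requests, almost all fast
server_b = Counter({10: 50, 500: 50})   # 100 requests, half of them slow

p99_a = percentile_from_histogram(server_a, 99)
p99_b = percentile_from_histogram(server_b, 99)

# Averaging the two percentiles yields a number that matches no real request.
averaged = (p99_a + p99_b) / 2

# Adding the histograms bucket by bucket, then taking the percentile,
# gives the p99 of the combined traffic.
merged = server_a + server_b
p99_merged = percentile_from_histogram(merged, 99)

print(p99_a, p99_b, averaged, p99_merged)
```

With this made-up data the averaged figure lands well below the p99 of the merged histogram, because the slow requests on the small server are diluted by the average. That does not contradict the note above: a number that is not a true percentile can still be a useful alarm if it moves when the system degrades.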
