
Systems Performance: Enterprise and the Cloud

Name: Systems Performance: Enterprise and the Cloud
Rating: 4.48 (46 reviews)
ISBN: 9780133390094

Brendan Gregg


The Complete Guide to Optimizing Systems Performance

Written by the winner of the 2013 LISA Award for Outstanding Achievement in System Administration
Large-scale enterprise, cloud, and virtualized computing systems have introduced serious performance challenges. Now, internationally renowned performance expert Brendan Gregg has brought together proven methodologies, tools, and metrics for analyzing and tuning even the most complex environments. Systems Performance: Enterprise and the Cloud focuses on Linux® and Unix® performance, while illuminating performance issues that are relevant to all operating systems. You'll gain deep insight into how systems work and perform, and learn methodologies for analyzing and improving system and application performance. Gregg presents examples from bare-metal systems and virtualized cloud tenants running Linux-based Ubuntu®, Fedora®, CentOS, and the illumos-based Joyent® SmartOS™ and OmniTI OmniOS®. He systematically covers modern systems performance, including the "traditional" analysis of CPUs, memory, disks, and networks, and new areas including cloud computing and dynamic tracing. This book also helps you identify and fix the "unknown unknowns" of complex performance: bottlenecks that emerge from elements and interactions you were not aware of. The text concludes with a detailed case study, showing how a real cloud customer issue was analyzed from start to finish.
Coverage includes
• Modern performance analysis and tuning: terminology, concepts, models, methods, and techniques
• Dynamic tracing techniques and tools, including examples of DTrace, SystemTap, and perf
• Kernel internals: uncovering what the OS is doing
• Using system observability tools, interfaces, and frameworks
• Understanding and monitoring application performance
• Optimizing CPUs: processors, cores, hardware threads, caches, interconnects, and kernel scheduling
• Memory optimization: virtual memory, paging, swapping, memory architectures, busses, address spaces, and allocators
• File system I/O, including caching
• Storage devices/controllers, disk I/O workloads, RAID, and kernel I/O
• Network-related performance issues: protocols, sockets, interfaces, and physical connections
• Performance implications of OS and hardware-based virtualization, and new issues encountered with cloud computing
• Benchmarking: getting accurate results and avoiding common mistakes

This guide is indispensable for anyone who operates enterprise or cloud environments: system, network, database, and web admins; developers; and other professionals. For students and others new to optimization, it also provides exercises reflecting Gregg's extensive instructional experience.

Genres: Technology, Programming, Technical, Computer Science, Software, Engineering, Nonfiction

772 pages, Paperback

First published September 27, 2013

451 people are currently reading

2109 people want to read

About the author

Brendan Gregg

11 books · 48 followers


Community Reviews

5 stars: 292 (61%)
4 stars: 133 (27%)
3 stars: 45 (9%)
2 stars: 4 (<1%)
1 star: 3 (<1%)


Yevgeniy Brikman

Author · 4 books · 735 followers

January 11, 2017

This isn't a book, so much as it is a reference manual or an appendix. It's nearly 800 pages of dense, low-level discussions of performance issues related to the CPU, memory, hard drive, OS, and so on. The writing is very structured, repetitive, and dry and resembles a list of facts more than prose. If you have a specific performance issue and need to know how to, say, use DTrace to diagnose an issue with a memory leak, this book is perfect. If you're looking for something you can read cover to cover to generally improve your understanding of system performance, this book probably isn't it.

If you are going to read this, I recommend reading the first few sections of each chapter, which typically have a nice introduction to the architecture of the CPU, memory, etc. They are also full of handy tables, such as typical, real-world latencies and typical performance trade-offs to consider (e.g. cpu vs memory, small vs large record sizes). The remainder of each chapter is a deep-dive into specific performance tools you can use, which is handy as a reference, but does not make for interesting reading otherwise, as there is no way you can retain so much detailed info. I'd also mention that since the author is a Solaris expert and creator of DTrace, you will see a lot of information about both in every single chapter.

The final chapter of the book is great: it walks through a real-world case study and shows how to use various techniques to analyze it and the thought process that goes into tracking down performance bottlenecks. Seeing such a case study gives you a much better sense for the context in which the various performance tools should be used and some awareness of whether the data returned by those tools is normal or not. This would have been a much better book if every chapter had been primarily focused on such case studies, with all the other nitty gritty details tacked on solely as supporting information (perhaps in an appendix!).

Emre Sevinç

178 reviews · 441 followers

November 25, 2020

Whenever I watch Netflix movies from different locations, on different networks, and via different devices without any serious performance problems, I think about this book. I just can't help it. Guess why?

Brendan Gregg, also known as The Server Whisperer, among other things, wrote one of the most pragmatic and comprehensive books on Unix and Linux performance engineering. If you've been in any way seriously involved with benchmarking, tuning, and analyzing GNU/Linux or Unix-based systems in the last 10 years, you've either come across some of the tools developed by the author, such as DTraceToolkit, or used concepts and techniques developed by him, such as the USE Method and Flame Graphs.

I'll cut it short, and try to answer a simple question: Should you buy and study this book?

Yes, and no.

Yes, in the sense that it has solid, very well laid out and well presented material on the fundamental aspects of performance engineering for most of the underlying components and layers, such as the kernel, file systems, disks, memory, and networking. On top of that, the principles Gregg clearly explains and exemplifies are timeless: queuing theory, benchmarking pitfalls, workload characterization, checklists, visualization techniques, and so on. (The book also provides nice historical context, which is a plus in my book, but that's just me.) Moreover, the questions at the end of each chapter are very nice additions: some of the straightforward technical questions will help you fill the gaps in your technical knowledge, whereas some of the more open-ended ones are very good practice material for stretching your analytical thinking.

No, in the sense that a period of 6 years is like a lifetime in this industry, and technology is a moving target, GNU/Linux doubly so! From 2021 onward, I don't think the majority of the readers of this review will need any of the Solaris-related practical and historical material in the book (it is great, but not for everyone, I guess). Moreover, it would be good to have more up-to-date resources on the finer details of practical performance engineering tools for GNU/Linux systems, as well as a heavier focus on cloud computing tips, tricks, and pitfalls when it comes to analyzing and tuning performance.

I bought this book some time ago, but I'll definitely buy the 2nd Edition from the same author (see the official book web site for more information).

Long story short, most of the body of knowledge in this book (and most probably the one in the 2nd Edition), can be considered "required" for most of the professional performance engineers working with Linux based systems.

Michael Koltsov

116 reviews · 70 followers

September 19, 2024

This is possibly the most hardcore technical book I've ever read. In times of abstractions, when all things are distributed and it's almost impossible to find where your app is running, Brendan Gregg has reinvented his own wheel. He's doing what most engineers dream of doing but have neither the time nor the stakeholder interest for. He's getting to the bottom of what makes up performance.

Screw your Dockers and Prometheuses: with the exception of a few self-written scripts and eBPF one-liners, the author uses common tools to measure and troubleshoot performance problems. This book doesn't go so far as trying to solve all issues with GNU utils only, but almost every tool described in it is battle-tested and earned its fame long before this book. Gee, it has full chapters on sar & perf.

Whoa, this book is a refresher in the world of abstractions that we live in now. I'm not sure how practical it is for anyone outside of Netflix, but it's nevertheless 100% worth your time if you're interested in how things work under the hood.

My score 4/5

Jose Luis Prieto Priego

5 reviews

April 21, 2019

I bought this book and started reading it to prepare for a job interview. Highly recommended.
P.S.: I finally got the job :)

Matthias

44 reviews · 29 followers

July 8, 2021

This is an impressive book, packed with information about methodology and tooling to diagnose performance problems in modern computing environments. The problem is that the presentation of the material is so incredibly dull, in the end I wanted to go read a telephone book instead for pleasure.

The structure is clear and I see what the author was trying to do: provide a repeating pattern for how information is laid out across chapters. But it makes the robotic writing feel even less appealing. I was also annoyed by Gregg tooting his own horn a little too much for my taste. He wastes no opportunity to point out which programs presented here exist thanks to him, down to the exact date and circumstances in which he built them. Who cares?! That is useless information to the reader.

I wish every chapter had included something like the last chapter's case study, presenting real-world examples of how to apply all the theory Gregg teaches. In that sense the book feels like a mixture of a university course and a reference: lots of theory and frameworks, but very little time is spent on actually guiding the reader through how to transfer this knowledge to actual problems.

Nevertheless, I can see myself coming back to this book a lot. Again, the university analogy comes to mind: by the time you're taught the subject you have no idea how to apply any of it or when, but several years later you might encounter a problem and you have that light bulb moment in which you remember and dust off this tome and find the advice you need.

If you're looking for a gentle and most of all practical introduction to performance analysis, however, this isn't it.


Simon Eskildsen

215 reviews · 1,146 followers

February 10, 2020

Great book on debugging production systems. It offers a comprehensive but simple mental model of how systems work, and solid methodologies for examining each component. The USE method stands out: check every system resource (network, disk, CPU, memory, mutexes, ...) for utilization, saturation, and errors. Most of the time people use the 'streetlight' method instead, going through whatever tools they happen to know. Its absurdity is best illustrated by the parable of the drunk man who searched for his lost keys under the streetlight, because that is where the light was.
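
The USE method described above can be sketched as a small checklist loop. This is only an illustration, not code from the book: the `ResourceSample` type, the `use_findings` helper, the sample numbers, and the 70% utilization threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One observation of a resource; every value below is made-up input."""
    name: str
    utilization: float   # fraction of time the resource was busy, 0.0 to 1.0
    queue_length: int    # work waiting for the resource (a saturation signal)
    errors: int          # error events counted during the interval


def use_findings(samples, util_threshold=0.7):
    """Check every resource for errors, saturation, and high utilization.

    util_threshold is an assumed cutoff for illustration, not a value
    taken from the book.
    """
    findings = []
    for s in samples:
        if s.errors > 0:
            findings.append((s.name, "errors", s.errors))
        if s.queue_length > 0:
            findings.append((s.name, "saturation", s.queue_length))
        if s.utilization >= util_threshold:
            findings.append((s.name, "utilization", s.utilization))
    return findings


samples = [
    ResourceSample("cpu", utilization=0.95, queue_length=4, errors=0),
    ResourceSample("memory", utilization=0.40, queue_length=0, errors=0),
    ResourceSample("disk", utilization=0.20, queue_length=0, errors=2),
    ResourceSample("network", utilization=0.10, queue_length=0, errors=0),
]

for name, kind, value in use_findings(samples):
    print(f"{name}: {kind} = {value}")
```

The point of the loop is that no resource is skipped: even the ones that look healthy are visited, which is the difference from the 'streetlight' approach of only checking what your favorite tool happens to show.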

The reason I can't give it a fifth star is that it focuses on currently observable problems. Many of the gnarliest systems performance problems I've encountered happen for a short period of time under some hard-to-reproduce condition (where the focus is on recovery, not understanding). You then have to dig through metrics _after_ the fact to find out what might have happened. This is often easy for errors, but not for saturation and utilization. Why is there nothing in the book about this?
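
As a rough illustration of the after-the-fact digging described above, the sketch below scans already-recorded utilization samples for sustained busy periods. It assumes the history was stored as `(timestamp, utilization)` pairs; the `busy_windows` helper, the 0.9 threshold, and the sample data are hypothetical.

```python
def busy_windows(samples, threshold=0.9, min_len=2):
    """Return (start_ts, end_ts) spans where utilization stayed at or above
    threshold for at least min_len consecutive samples.

    samples is an iterable of (timestamp, utilization) pairs in time order.
    """
    windows = []
    run = []  # timestamps of the current consecutive busy samples
    for ts, util in samples:
        if util >= threshold:
            run.append(ts)
        else:
            if len(run) >= min_len:
                windows.append((run[0], run[-1]))
            run = []
    # Close a run that extends to the end of the recorded history.
    if len(run) >= min_len:
        windows.append((run[0], run[-1]))
    return windows


# Made-up history: one sustained busy period and one isolated spike.
history = [(0, 0.30), (10, 0.95), (20, 0.97), (30, 0.35), (40, 0.92), (50, 0.40)]
print(busy_windows(history))
```

This only works if the metrics were being recorded at a fine enough interval before the incident, which is exactly why short-lived saturation is harder to reconstruct than logged errors.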

Mosab

17 reviews · 32 followers

November 3, 2021

Highly recommended!

2 reviews

January 19, 2020

Great book for practical system performance troubleshooting.

Athanasios

9 reviews

January 21, 2016

Do not let the size daunt you, however. Chapters are self-contained, as the author understands that the book might be read under pressure, and they contain useful exercises at the end.

What really makes this book stand out is not the top-notch technical writing or the abundance of useful one-liners, but the fact that the author moves forward and suggests a methodology for troubleshooting and performance analysis, as opposed to the ad-hoc methods of the past (or, in the best-case scenario, a checklist; $DEITY forbid the use of the “blame someone else methodology”). In particular, the author suggests the USE methodology (USE standing for Utilization, Saturation, Errors) to methodically and accurately analyze and diagnose problems. This methodology (which can be adapted or expanded at will; last time I checked, the book was not written in stone) is worth the price of the book alone.

The author correctly maintains that you must have an X-ray (so to speak) of the system at all times. By using tools such as DTrace (available for Solaris and BSD) or the Linux equivalent, SystemTap, much insight can be gained into the internals of a system.

Chapters 5-10 are self-explanatory: the author presents what the chapter is about, common errors, and common one-liners used to diagnose possible problems. As said before, chapters aim to be self-contained and can be read while actually troubleshooting a live system, so there are no lengthy explanations. At the end of each chapter, the bibliography section provides useful pointers to resources for further study, something that is greatly appreciated. Finally, the exercises can easily be turned into interview questions, which is another bonus.

Cloud computing, and the special considerations it presents, gets its own chapter, and the author tries to keep it platform-agnostic (even though he is employed by a “Cloud Computing” company), which is a nice touch. This is followed by a chapter of useful advice on how to actually benchmark systems, and the book ends with a sadly too short case study.

The appendices that follow should be read, as they contain a lot of useful one-liners (as if the ones in the book were not enough), concrete examples of the USE method, a guide to porting DTrace to SystemTap, and a who's who of the world of systems performance.

So how to sum up the book? “Incredible value” is one thought that comes to mind; “timeless classic” is another. If you are a systems {operator|engineer|administrator|architect}, this book is a must-have and should be kept within reach at all times. Even if your $DAYJOB does not have "systems" in the title, the book is going to be useful if you have to interact with Unix-like systems on a frequent basis.

Brandon Antone

16 reviews

January 7, 2017

Great book for any Linux Operations guys out there to test and determine metrics for your infrastructure.

Ferhat Elmas

875 reviews · 17 followers

July 7, 2019

Good to skim through to learn about what is possible and out there, but it's more like a reference book to check when needed with a specific performance problem.


Khang Nguyen

51 reviews · 73 followers

February 6, 2022

Comes with an excellent level of technical detail. At the same time, the content can be very dry, and it is not the type of book you want to pick up casually.

Saran Sivashanmugam

34 reviews · 5 followers

August 13, 2020

Brendan is probably the de facto authority in the performance world. He walks through the Linux kernel internals and covers the performance of each area: memory, CPU, file systems, disks, and networks. His methodologies for analyzing performance problems are a must-read for SREs and performance engineers. There is a plethora of tools for Linux performance troubleshooting that Brendan contributed to creating. I love the easy-to-follow, structured approach of Brendan's writing. In particular, the USE methodology, the drill-down methodology, and the block diagram with tools should be at the desk of every SRE, production engineer, and performance engineer.

My only concern is that I was too late in picking up this first edition. BPF tools are not covered and some of the content is outdated at this point in time.

If you want to read this, please wait until November 2020, when the new edition hits the stands.

Colin

270 reviews · 9 followers


September 6, 2025

Man, I don't know what to rate this. This book is an extremely comprehensive overview of the different areas of a computer system that might affect its performance, and how those different areas can be analysed. Gregg writes well, but the subject matter is intrinsically very dry. I hope I have retained enough to at least be able to refer to this when the relevant areas crop up. Some of this talks about tooling that I was already aware of, but hopefully I will be more ready to reach for it as appropriate.

Gregg is kind of the guy for performance analysis and engineering, and he is not at all worried about attributing himself where appropriate. As a Brit, I find this slightly annoying, but I appreciate that isn't how you get promoted.

Adelbert

62 reviews · 3 followers

February 23, 2020

Very, very well written book. I didn't actually read it front to back: I read the first 4 chapters, which cover the foundations; chapter 5, which covers application-level performance; and the last 3 chapters, on cloud and multi-tenant performance, benchmarking, and a case study. The middle chapters dive into other specific topics, like CPU, memory, and file systems, that I will reference on an as-needed basis.

Overall very well written, communicates concepts clearly, and reifies a lot of things that I often see go unnoticed or underappreciated. A book worth keeping on the bookshelf even after being read.

Paul

87 reviews · 3 followers

December 27, 2019

Absolutely amazing book on performance measurement. It contains a lot of theory on how to measure performance, from "what performance really is" (which is not so obvious) to examples of how to drill down. It also contains a lot of practical examples of investigating performance issues. It looks slightly outdated (SystemTap, Solaris, DTrace), but it is really worth reading for admins and anyone who cares about performance.

David

17 reviews · 3 followers

November 16, 2016

Though at risk of being a tad ranty about how Solaris is better than Linux, Brendan Gregg's detailed understanding of kernel development and performance is comprehensive; he both introduces each topic and then guides the reader through how to measure it. It's a must-read for Linux developers.

Hadi

9 reviews · 11 followers

October 29, 2017

The book addresses two different questions:
1. How to analyze systems performance? (Methodology)
2. How to use tools to achieve these goals? (Tools)

The first question is thoroughly answered in the book. However, the answer to the second question is pretty outdated for Linux.

Ernestas Poskus

189 reviews · 8 followers

August 10, 2017

Grail of performance issues/improvements.

Ankit

9 reviews

December 17, 2017

Brings performance analysis down to actual commands and methodologies you can use directly. Quite a nice read.

Ivan

223 reviews · 11 followers

December 31, 2018

An impressive reference on performance; reading it once is clearly not enough.

Dmytro Sirenko

5 reviews · 4 followers

January 2, 2020

An in-depth look at modern Linux introspection, functioning, tools and metrics.

The USE method is very helpful for an SRE.

Zach

206 reviews

June 28, 2020

When investigating a CPU issue, I found that I had to dive much deeper than the details provided by this book. It's a good starting point though. Brendan Gregg is a legend.