
Systems Performance: Enterprise and the Cloud

The Complete Guide to Optimizing Systems Performance

Written by the winner of the 2013 LISA Award for Outstanding Achievement in System Administration
Large-scale enterprise, cloud, and virtualized computing systems have introduced serious performance challenges. Now, internationally renowned performance expert Brendan Gregg has brought together proven methodologies, tools, and metrics for analyzing and tuning even the most complex environments. Systems Performance: Enterprise and the Cloud focuses on Linux® and Unix® performance, while illuminating performance issues that are relevant to all operating systems. You'll gain deep insight into how systems work and perform, and learn methodologies for analyzing and improving system and application performance. Gregg presents examples from bare-metal systems and virtualized cloud tenants running Linux-based Ubuntu®, Fedora®, CentOS, and the illumos-based Joyent® SmartOS™ and OmniTI OmniOS®. He systematically covers modern systems performance, including the "traditional" analysis of CPUs, memory, disks, and networks, and new areas including cloud computing and dynamic tracing. This book also helps you identify and fix the "unknown unknowns" of complex performance: bottlenecks that emerge from elements and interactions you were not aware of. The text concludes with a detailed case study, showing how a real cloud customer issue was analyzed from start to finish.
Coverage includes:
- Modern performance analysis and tuning: terminology, concepts, models, methods, and techniques
- Dynamic tracing techniques and tools, including examples of DTrace, SystemTap, and perf
- Kernel internals: uncovering what the OS is doing
- Using system observability tools, interfaces, and frameworks
- Understanding and monitoring application performance
- Optimizing CPUs: processors, cores, hardware threads, caches, interconnects, and kernel scheduling
- Memory optimization: virtual memory, paging, swapping, memory architectures, busses, address spaces, and allocators
- File system I/O, including caching
- Storage devices/controllers, disk I/O workloads, RAID, and kernel I/O
- Network-related performance issues: protocols, sockets, interfaces, and physical connections
- Performance implications of OS and hardware-based virtualization, and new issues encountered with cloud computing
- Benchmarking: getting accurate results and avoiding common mistakes

This guide is indispensable for anyone who operates enterprise or cloud environments: system, network, database, and web admins; developers; and other professionals. For students and others new to optimization, it also provides exercises reflecting Gregg's extensive instructional experience.

735 pages, Paperback

First published September 27, 2013


About the author

Brendan Gregg

4 books, 34 followers

Ratings & Reviews


Community Reviews

5 stars: 248 (62%)
4 stars: 105 (26%)
3 stars: 35 (8%)
2 stars: 3 (<1%)
1 star: 3 (<1%)
Displaying 1 - 30 of 41 reviews
Yevgeniy Brikman
Author of 3 books, 582 followers
January 11, 2017
This isn't a book so much as it is a reference manual or an appendix. It's nearly 800 pages of dense, low-level discussion of performance issues related to the CPU, memory, hard drive, OS, and so on. The writing is very structured, repetitive, and dry, and it resembles a list of facts more than prose. If you have a specific performance issue and need to know how to, say, use DTrace to diagnose a memory leak, this book is perfect. If you're looking for something you can read cover to cover to improve your general understanding of system performance, this book probably isn't it.

If you are going to read this, I recommend reading the first few sections of each chapter, which typically have a nice introduction to the architecture of the CPU, memory, etc. They are also full of handy tables, such as typical real-world latencies and typical performance trade-offs to consider (e.g., CPU vs. memory, small vs. large record sizes). The remainder of each chapter is a deep dive into specific performance tools you can use, which is handy as a reference but does not make for interesting reading otherwise, as there is no way you can retain so much detailed information. I'd also mention that since the author is a Solaris and DTrace expert, you will see a lot of information about both in every single chapter.

The final chapter of the book is great: it walks through a real-world case study and shows how to use various techniques to analyze it, along with the thought process that goes into tracking down performance bottlenecks. Seeing such a case study gives you a much better sense of the context in which the various performance tools should be used, and some awareness of whether the data those tools return is normal or not. This would have been a much better book if every chapter had focused primarily on such case studies, with all the other nitty-gritty details tacked on solely as supporting information (perhaps in an appendix!).
Emre Sevinç
143 reviews, 269 followers
November 25, 2020
Whenever I watch Netflix movies from different locations, on different networks, and via different devices without any serious performance problems, I think about this book. I just can't help it. Guess why?

Brendan Gregg, also known as The Server Whisperer, among other things, has written one of the most pragmatic and comprehensive books on Unix and Linux performance engineering. If you have been seriously involved in benchmarking, tuning, and analyzing GNU/Linux or Unix-based systems in the last 10 years, you have either come across some of the tools the author developed, such as the DTraceToolkit, or used concepts and techniques he developed, such as the USE Method and Flame Graphs.

I'll cut it short, and try to answer a simple question: Should you buy and study this book?

Yes, and no.

Yes, in the sense that it has solid, very well laid-out and well-presented material on the fundamental aspects of performance engineering for most of the underlying components and layers, such as the kernel, file systems, disks, memory, networking, etc. On top of that, the principles Gregg clearly explains and exemplifies are timeless, e.g., queuing theory, benchmarking pitfalls, workload characterization, checklists, visualization techniques, etc. (The book also provides nice historical context, which is a plus in my book, but that's just me.) Moreover, the questions at the end of each chapter are very nice additions: some of the straightforward technical questions will help you fill gaps in your knowledge, whereas some of the more open-ended ones are very good practice material for stretching your analytical thinking.

No, in the sense that a period of 6 years is like a lifetime in this industry, and technology is a moving target, GNU/Linux doubly so! From 2021 onward, I don't think the majority of readers of this review will need the Solaris-related practical and historical material in the book (it is great, but not for everyone, I guess). Moreover, it would be good to have more up-to-date resources covering the finer details of practical performance engineering tools for GNU/Linux systems, as well as a heavier focus on cloud-computing-related tips, tricks, and pitfalls when it comes to analyzing and tuning performance.

I bought this book some time ago, but I'll definitely buy the 2nd Edition from the same author as well (see the official book website for more information).

Long story short, most of the body of knowledge in this book (and most probably in the 2nd Edition) can be considered "required" for most professional performance engineers working with Linux-based systems.
Simon Eskildsen
215 reviews, 938 followers
February 10, 2020
Great book on debugging production systems. It provides a comprehensive but simple mental model of how systems work, and solid methodologies for examining each component. Especially the USE method: checking each system component (network, disk, CPU, memory, mutexes, ...) for utilization, saturation, and errors. Most of the time, people use the 'streetlight' method, running whatever random tools they happen to know. Its absurdity is best illustrated by the parable of the drunk man who searches for his keys under the streetlight simply because that is where the light is.
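To make the USE method concrete, here is a minimal Python sketch (not code from the book) that walks a fixed list of resources and checks every one of them for all three metrics. The `ResourceSample` type, the field units, and the 70% utilization threshold are hypothetical choices for illustration; any nonzero saturation or error count is flagged.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only; real limits depend on the resource.
UTILIZATION_WARN = 0.70


@dataclass
class ResourceSample:
    name: str           # e.g. "cpu", "disk sda", "eth0"
    utilization: float  # fraction of the interval the resource was busy (0.0-1.0)
    saturation: float   # extra work queued, e.g. average run-queue length
    errors: int         # error events counted during the interval


def use_check(samples):
    """Check every resource for Utilization, Saturation, and Errors.

    Returns (resource, finding) pairs in the order the checks fire.
    """
    findings = []
    for s in samples:
        if s.errors > 0:
            findings.append((s.name, f"errors={s.errors}"))
        if s.saturation > 0:  # any queued work is worth a look
            findings.append((s.name, f"saturation={s.saturation}"))
        if s.utilization >= UTILIZATION_WARN:
            findings.append((s.name, f"utilization={s.utilization:.0%}"))
    return findings


samples = [
    ResourceSample("cpu", utilization=0.95, saturation=3.0, errors=0),
    ResourceSample("disk sda", utilization=0.20, saturation=0.0, errors=0),
    ResourceSample("eth0", utilization=0.10, saturation=0.0, errors=12),
]
print(use_check(samples))
```

Iterating over a fixed resource list, rather than over whichever tools one happens to know, is the contrast with the streetlight approach.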

The reason I can't give it the fifth star is that it focuses on currently observable problems. Many of the gnarliest systems performance problems I've encountered occur for short periods under some hard-to-reproduce condition (when the focus is on recovery, not understanding). You then have to dig through metrics _after_ the fact to find out what might have happened. This is often easy for errors, but not for saturation and utilization. Why is there nothing in the book about this?
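A short, synthetic illustration of why utilization is hard to reconstruct after the fact: once per-second samples are rolled up into coarser averages, a brief burst at 100% largely disappears. The sample values and the `window_averages` helper are hypothetical, not taken from the book.

```python
def window_averages(per_second_pct, window):
    """Average per-second utilization samples over consecutive fixed windows."""
    if len(per_second_pct) % window:
        raise ValueError("sample count must be a multiple of the window size")
    return [
        sum(per_second_pct[i:i + window]) / window
        for i in range(0, len(per_second_pct), window)
    ]


# Synthetic minute: 20% busy, except a 5-second burst at 100% (seconds 30-34).
samples = [20] * 60
samples[30:35] = [100] * 5

print(max(samples))                       # 100: obvious at 1-second resolution
print(max(window_averages(samples, 10)))  # 60.0: already diluted at 10 seconds
print(window_averages(samples, 60))       # about 26.7: nearly hidden at 1 minute
```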
40 reviews, 30 followers
July 8, 2021
This is an impressive book, packed with information about methodology and tooling to diagnose performance problems in modern computing environments. The problem is that the presentation of the material is so incredibly dull that by the end I wanted to read a telephone book for pleasure instead.

The structure is clear, and I see what the author was trying to do: provide a repeating pattern for how information is laid out across chapters. But it makes the robotic writing feel even less appealing. I was also annoyed by Gregg tooting his own horn a little too much for my taste. He never misses an opportunity to point out which programs presented here exist thanks to him, down to the exact date and circumstances under which he built them. Who cares?! That is useless information to the reader.

I wish the approach of the last chapter, the case study, had appeared in every chapter, presenting real-world examples of how to apply all the theory Gregg teaches. In that sense, the book feels like a mixture of a university course and a reference: lots of theory and frameworks, but very little time spent actually guiding the reader through how to transfer this knowledge to real problems.

Nevertheless, I can see myself coming back to this book a lot. Again, the university analogy comes to mind: when you're taught the subject, you have no idea how or when to apply any of it, but several years later you might encounter a problem and have that light-bulb moment in which you remember this tome, dust it off, and find the advice you need.

If you're looking for a gentle and most of all practical introduction to performance analysis, however, this isn't it.
ST
2 reviews
January 19, 2020
Great book for practical system performance troubleshooting.
Athanasios
9 reviews
January 21, 2016
Do not let the size daunt you, however. Chapters are self-contained, as the author understands that the book might be read under pressure, and they contain useful exercises at the end.

What really makes this book stand out is not the top-notch technical writing or the abundance of useful one-liners; it is the fact that the author goes further and proposes a methodology for troubleshooting and performance analysis, as opposed to the ad-hoc methods of the past (at best a checklist and, $DEITY forbid, the “blame someone else methodology”). In particular, the author proposes the USE methodology, USE standing for Utilization – Saturation – Errors, to methodically and accurately analyze and diagnose problems. This methodology (which can be adapted or expanded at will; last time I checked, the book was not written in stone) is worth the price of the book alone.

The author correctly maintains that you must have an X-ray (so to speak) of the system at all times. By using tools such as DTrace (available for Solaris and BSD) or its Linux equivalent, SystemTap, you can gain much insight into the internals of a system.

Chapters 5-10 are self-explanatory: the author presents what the chapter is about, common errors, and common one-liners used to diagnose possible problems. As said before, the chapters aim to be self-contained and can be read while actually troubleshooting a live system, so there are no lengthy explanations. At the end of each chapter, the bibliography section provides useful pointers to resources for further study, something that is greatly appreciated. Finally, the exercises can easily be turned into interview questions, which is another bonus.

Cloud computing and the special considerations it presents get their own chapter, and the author tries to keep it platform-agnostic (even though he is employed by a “Cloud Computing” company), which is a nice touch. This is followed by a chapter with useful advice on how to actually benchmark systems, and the book ends with a sadly too short case study.

The appendices that follow should be read, as they contain a lot of useful one-liners (as if the ones in the book were not enough), concrete examples of the USE method, a guide to porting DTrace to SystemTap, and a who's who of the world of systems performance.

So how to sum up the book? “Incredible value” is one thought that comes to mind; “timeless classic” is another. If you are a systems {operator|engineer|administrator|architect}, this book is a must-have and should be kept within reach at all times. Even if your $DAYJOB does not have “systems” in the title, the book is going to be useful if you have to interact with Unix-like systems frequently.
Brandon Antone
14 reviews
January 7, 2017
Great book for any Linux operations guys out there who need to test and determine metrics for their infrastructure.
April 21, 2019
I bought and started reading this book to prepare for a job interview. Highly recommended.
P.S. I finally got the job :)
Khang Nguyen
47 reviews, 70 followers
February 6, 2022
Comes with an excellent level of technical detail. At the same time, the content can be very dry; it's not the type of book you want to pick up casually.
Saran Sivashanmugam
34 reviews, 4 followers
August 13, 2020
Brendan is probably the de facto authority in the performance world. He walks through Linux kernel internals and covers the performance of each area, such as memory, CPU, file systems, disks, and networks. His methodologies for analyzing performance problems are must-reads for SREs and performance engineers. There is a plethora of tools that Brendan helped create for Linux performance troubleshooting. I love the easy-to-follow, structured approach of Brendan's writing. In particular, the USE methodology, the drill-down methodology, and the block diagrams with tools should be on the desk of every SRE, production engineer, and performance engineer.

My only concern is that I'm too late in picking up this first edition. BPF tools are not covered, and some of the content is outdated at this point.

If you want to read this, please wait until November 2020, when the second edition hits the stands.
Adelbert
43 reviews, 3 followers
February 23, 2020
Very, very well-written book. I didn't actually read it front to back: I read the first 4 chapters, which cover the foundations; chapter 5, which covers application-level performance; and the last 3 chapters, on cloud and multi-tenant performance, benchmarking, and a case study. The middle chapters dive into specific topics like CPU, memory, file systems, etc., which I will reference on an as-needed basis.

Overall very well written, communicates concepts clearly, and reifies a lot of things that I often see go unnoticed or underappreciated. A book worth keeping on the bookshelf even after being read.
Paul
50 reviews, 2 followers
December 27, 2019
Absolutely amazing book on performance measurement. It contains a lot of theory on how to measure performance, from "what performance really is" (which is not so obvious) to examples of how to drill down. The book also contains a lot of practical examples of investigating performance issues. It looks slightly outdated (SystemTap, Solaris, DTrace), but it is really worth reading for admins and everyone who cares about performance.
David
17 reviews, 2 followers
November 16, 2016
Though the book risks being a tad ranty about how Solaris is better than Linux, Brendan Gregg's detail and understanding of kernel development and performance are comprehensive, and the book both introduces the topic and guides the reader through how to measure it. It's a must-read for Linux developers.
Hadi
9 reviews, 11 followers
October 29, 2017
The book addresses two different questions:
1. How to analyze systems performance? (Methodology)
2. How to use tools to achieve these goals? (Tools)

The first question is thoroughly answered in the book. However, the answer to the second question is pretty outdated for Linux.
9 reviews
December 17, 2017
Brings performance analysis down to actual commands and methodologies you can use directly. Quite a nice read.
Ivan
221 reviews, 9 followers
December 31, 2018
An impressive reference on performance; reading it once is clearly not enough.
Ferhat Elmas
596 reviews, 6 followers
July 7, 2019
Good to skim through to learn about what is possible and out there, but it's more like a reference book to check when needed with a specific performance problem.
Dmytro Sirenko
5 reviews, 2 followers
January 2, 2020
An in-depth look at modern Linux introspection, functioning, tools and metrics.

The USE method is very helpful for an SRE.
Zach
189 reviews
June 28, 2020
When investigating a CPU issue, I found that I had to dive much deeper than the details provided by this book. It's a good starting point though. Brendan Gregg is a legend.
Kostua
2 reviews
February 3, 2021
A little bit outdated; I strongly recommend the 2nd edition, issued in 2021. Overall an excellent book.
huydx
33 reviews, 12 followers
February 22, 2017
One of my favourite all-round references when I need to do system benchmarking or troubleshooting. This book covers almost all aspects of the low-level stack, from the kernel to network protocols, file systems, and disk systems. Highly recommended.
Terry
104 reviews, 3 followers
February 10, 2016
This book is nightmarishly good. I figured it would take months to slog through it, but I managed to finish it in 4 months. I was a bit nutty, since I took handwritten notes.

This is the book I wish I'd had when I first started my career. All the stuff from OS classes and things gleaned from experience wrapped up in one, with exercises at each chapter's end to test your knowledge, well-written text, graphics that accurately capture and describe topics, and a mild amount of humour injected where appropriate.

This is a HARD, long read if you are totally new to this space. By the end of it, you won't know everything, but you will have a general idea and know where to look.

I think the book suffers from being so long. Also, you will miss out if you do not have DTrace installed and functioning properly, with kernel symbols enabled, so that you can follow some of the more detailed tracing and system call demonstrations.

I wanted to give it a 4.5 for not being totally Linux-centric, but this is "Systems Performance," so faulting its agnosticism is wrong on my part. The length is pretty killer as well, so the book can double as explainer and reference. Some folks may want to just skim, find some magic commands, and go from there. That may be enough to survive but not to "thrive," so I won't hate on that.

Buy the book, fight through from start to finish, feel confident afterwards with your next perf challenge.
Franck Chauvel
119 reviews, 4 followers
June 24, 2016
This book details how to approach software performance issues. It explains how to observe, measure, and visualise what is happening in the OS and beyond (only on the Linux and Solaris platforms). I don't think the book reads very well from cover to cover, but I devoured the first chapters, which explain the concepts and methodologies, as well as the final case study. In my opinion, the rest is more of a reference to consult when faced with a particular performance issue, as those chapters cover CPU, memory, network, disk, etc., and detail how to use the available tooling.