The DevOps Handbook: How to Create World-Class Agility, Reliability, & Security in Technology Organizations
44%
Note that in addition to monitoring our production services, we also need telemetry for those services in our pre-production environments (e.g., development, test, staging, etc.). Doing this enables us to find and fix issues before they go into production, such as detecting when we have ever-increasing database insert times due to a missing table index.
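A minimal sketch of what such pre-production telemetry could look like, assuming an in-memory SQLite table, a locally collected list of timings, and an arbitrary 2x drift threshold (all illustrative assumptions, not the book's implementation); in practice the timings would be emitted to the metrics system used in staging:

import sqlite3
import statistics
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

insert_times_ms = []

def timed_insert(customer_id: int) -> None:
    # Record how long each insert takes, just as we would in a staging run.
    start = time.perf_counter()
    conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
    insert_times_ms.append((time.perf_counter() - start) * 1000)

for i in range(1000):
    timed_insert(i)

# Compare recent inserts against the earlier baseline; a steady upward drift
# in staging is the cue to investigate (e.g., a missing table index) before
# the change ever reaches production.
baseline = statistics.median(insert_times_ms[:100])
recent = statistics.median(insert_times_ms[-100:])
if recent > baseline * 2:
    print(f"WARN: insert latency drifting up ({baseline:.3f} ms -> {recent:.3f} ms)")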
45%
Netflix “used outlier detection in a very simple way, which was to first compute what was the ‘current normal’ right now, given the population of nodes in a compute cluster. And then we identified which nodes didn’t fit that pattern, and removed those nodes from production.”
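The quote describes the technique only at a high level; below is a minimal sketch of the same idea (compute the fleet's "current normal," then flag the nodes that no longer fit it), using a median/MAD rule, invented per-node error rates, and an arbitrary cutoff of 5, none of which come from Netflix:

from statistics import median

node_error_rates = {
    "i-001": 0.8, "i-002": 1.1, "i-003": 0.9, "i-004": 1.0,
    "i-005": 0.7, "i-006": 7.4,  # the node that no longer fits the pattern
}

values = list(node_error_rates.values())
current_normal = median(values)                        # the "current normal" right now
mad = median(abs(v - current_normal) for v in values)  # robust estimate of spread

outliers = [
    node for node, v in node_error_rates.items()
    if mad and abs(v - current_normal) / mad > 5       # cutoff chosen for illustration
]

# In production these nodes would be pulled from the load balancer and terminated,
# letting the cluster's autoscaling replace them.
print("remove from production:", outliers)             # -> ['i-006']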
45%
One of the easiest ways to do this is to analyze our most severe incidents in the recent past (e.g., thirty days) and create a list of telemetry that could have enabled earlier and faster detection and diagnosis of the problem, as well as easier and faster confirmation that an effective fix had been implemented.
47%
ensure that we are actively monitoring our production telemetry when anyone performs a production deployment,
47%
have Development groups self-manage their services in production to prove they are stable before they become eligible for an SRE (site reliability engineering) team to manage. By having developers be responsible for deployment and production support, we are far more likely to have a smooth transition to Operations.
49%
“the most inefficient way to test a business model or product idea is to build the complete product to see whether the predicted demand actually exists.”
49%
we may conduct an experiment to see whether modifying the text or color on a “buy” button increases revenue or whether slowing down the response time of a website (by introducing an artificial delay as the treatment) reduces revenue.
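As an illustration of the mechanics, the sketch below assigns each user deterministically to a control or treatment cohort and tallies conversions per cohort; the experiment name, hashing scheme, and sample data are assumptions made for the example, not the authors' implementation:

import hashlib

def assign_variant(user_id: str, experiment: str = "buy-button-color") -> str:
    # Hash the user into a stable bucket so they always see the same variant.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

# Later, compare conversion between the two cohorts.
observations = [("alice", True), ("bob", False), ("carol", True), ("dave", False)]
totals = {"control": [0, 0], "treatment": [0, 0]}      # [conversions, visitors]
for user_id, converted in observations:
    bucket = totals[assign_variant(user_id)]
    bucket[0] += int(converted)
    bucket[1] += 1

for variant, (conversions, visitors) in totals.items():
    print(variant, f"{conversions}/{visitors} converted")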
49%
If we are not performing user research, the odds are that two-thirds of the features we are building deliver zero or negative value to our organization, even as they make our codebase ever more complex, thus increasing our maintenance costs over time and making our software more difficult to change.
49%
we can frame hypotheses in feature development in the following form:12
• We believe increasing the size of hotel images on the booking page
• Will result in improved customer engagement and conversion
• We will have confidence to proceed when we see a 5% increase in customers who review hotel images who then proceed to book in forty-eight hours.
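Checking the "confidence to proceed" criterion above is then a small calculation. The numbers below are invented, and the 5% increase is read as a relative lift over a pre-change baseline; both are assumptions made for illustration:

baseline_rate = 0.20      # image viewers who booked within 48 hours, before the change
viewers_after = 4_000     # image viewers shown the larger hotel images
bookings_after = 880      # of those, how many booked within 48 hours

new_rate = bookings_after / viewers_after
relative_lift = (new_rate - baseline_rate) / baseline_rate

print(f"conversion: {new_rate:.1%}, relative lift: {relative_lift:+.1%}")
if relative_lift >= 0.05:
    print("Criterion met: we have confidence to proceed.")
else:
    print("Criterion not met: revisit or abandon the change.")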
51%
The principle of small batch sizes also applies to code reviews. The larger the size of the change that needs to be reviewed, the longer it takes to understand and the larger the burden on the reviewing engineer.
51%
•If someone submits a change that is too large to reason about easily—in other words, you can’t understand its impact after reading through it a couple of times, or you need to ask the submitter for clarification—it should be split up into multiple, smaller changes that can be understood at a glance.
52%
when asked to describe a great pull request that indicates an effective review process, Tomayko quickly listed off the essential elements: there must be sufficient detail on why the change is being made, how the change was made, as well as any identified risks and resulting countermeasures.
54%
If accidents are not caused by “bad apples” but rather are due to inevitable design problems in the complex system that we created, then instead of “naming, blaming, and shaming” the person who caused the failure, our goal should always be to maximize opportunities for organizational learning, continually reinforcing that we value actions that expose and share more widely the problems in our daily work. This is what enables us to improve the quality and safety of the system we operate within and reinforce the relationships between everyone who operates within that system.
57%
We put into our shared source code repository not only source code but also other artifacts that encode knowledge and learning, including:
• configuration standards for our libraries, infrastructure, and environments (Chef, Puppet, or Ansible scripts)
• deployment tools
• testing standards and tools, including security
• deployment pipeline tools
• monitoring and analysis tools
• tutorials and standards
57%
Examples of non-functional requirements include ensuring that we have:
• sufficient production telemetry in our applications and environments
• the ability to accurately track dependencies
• services that are resilient and degrade gracefully
• forward and backward compatibility between versions
• the ability to archive data to manage the size of the production data set
• the ability to easily search and understand log messages across services
• the ability to trace requests from users through multiple services
• simple, centralized runtime configuration using feature flags, etc.
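The last item above (simple, centralized runtime configuration using feature flags) might look something like the sketch below; the flag names, the inline JSON standing in for a central flag store, and the percentage-rollout scheme are illustrative assumptions:

import hashlib
import json

# In practice this document would be fetched from a central configuration service.
FLAGS = json.loads("""
{
  "new-checkout-flow": {"enabled": true, "rollout_percent": 25},
  "verbose-tracing":   {"enabled": false}
}
""")

def flag_enabled(name: str, user_id: str) -> bool:
    flag = FLAGS.get(name, {"enabled": False})
    if not flag.get("enabled"):
        return False
    percent = flag.get("rollout_percent", 100)
    # Stable hashing keeps a given user in or out of the rollout consistently.
    bucket = int(hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < percent

if flag_enabled("new-checkout-flow", user_id="user-42"):
    print("serve the new checkout flow")
else:
    print("serve the existing flow")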
58%
the five essential characteristics of cloud computing defined by the US Federal Government’s National Institute of Standards and Technology (NIST):20
• On-demand self-service: Consumers can automatically provision computing resources as needed, without human interaction from the provider.
• Broad network access: Capabilities can be accessed through heterogeneous platforms, such as mobile phones, tablets, laptops, and workstations.
• Resource pooling: Provider resources are pooled in a multi-tenant model, with physical and virtual resources dynamically assigned on demand. The customer may specify ...
59%
One of the easiest ways to do this is to schedule and conduct day- or week-long improvement blitzes, where everyone on a team (or in the entire organization) self-organizes to fix problems they care about—no feature work is allowed.
63%
The 2019 report also showed that when analyzing software components, the time required for those projects to remediate their security vulnerabilities (TTR) was correlated with the time required to update any of their dependencies (TTU).23 In other words, projects that update more frequently tend to remediate their security vulnerabilities faster. This is why Jeremy Long, founder of the OWASP Dependency Check project, suggests that the best security patching strategy is to remain current on all dependencies.
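One way to act on that advice in a Python project is to have CI report (or fail on) stale dependencies. The sketch below shells out to pip and assumes its JSON output format; the fail-the-build policy is just one possible choice:

import json
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
    capture_output=True, text=True, check=True,
)
outdated = json.loads(result.stdout)

for pkg in outdated:
    print(f"{pkg['name']}: {pkg['version']} -> {pkg['latest_version']}")

if outdated:
    sys.exit(1)   # keep the build red until dependencies are brought current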
63%
The 2019 study also found that the “popularity” of a software project (e.g., number of GitHub stars or forks or the number of Maven Central downloads) is not correlated with better security characteristics.
68%
Innovation is impossible without risk-taking, and if you haven’t managed to upset at least some people in management, you’re probably not trying hard enough. Don’t let your organization’s immune system deter or distract you from your vision.
70%
any countermeasures must be assigned to someone, and if the corrective action does not warrant being a top priority when the meeting is over, then it is not a corrective action. (This is to prevent the meeting from generating a list of good ideas that are never implemented.)