matagus’s Kindle Notes & Highlights for An Elegant Puzzle: Systems of Engineering Management

An Elegant Puzzle: Systems of Engineering Management

by Will Larson

Read between September 13 - September 16, 2019

Organizational design gets the right people in the right places, empowers them to make decisions, and then holds them accountable for their results.

When I have a problem that I want to solve quickly and cheaply, I start thinking about process design. A problem I want to solve permanently and we have time to go slow? That’s a good time to evolve your culture. However, if process is too weak a force, and culture too slow, then organizational design lives between those two.

An important property of teams is that they abstract the complexities of the individuals that compose them. Teams with fewer than four individuals are a sufficiently leaky abstraction that they function indistinguishably from individuals.

Keep innovation and maintenance together. A frequent practice is to spin up a new team to innovate while existing teams are bogged down in maintenance. I’ve historically done this myself, but I’ve moved toward innovating within existing teams. This requires very deliberate decision-making and some bravery, but in exchange you’ll get higher morale and a culture of learning, and will avoid creating a two-tiered class system of innovators and maintainers.

Fitting together those guiding principles, the playbook that I’ve developed is surprisingly simple and effective: Teams should be six to eight during steady state. To create a new team, grow an existing team to eight to ten, and then bud into two teams of four or five. Never create empty teams. Never leave managers supporting more than eight individuals.
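The sizing rules in this highlight are concrete enough to write down as a small check. What follows is only a sketch in Python: the function names `bud` and `playbook_violations` are made up for illustration, and the even split is an assumption about how a team of eight to ten would divide into "four or five".

```python
def bud(team_size: int) -> tuple[int, int]:
    """Split a grown team in two, following the 'bud' step of the playbook.

    Assumption: the split is as even as possible, so 8 -> (4, 4),
    9 -> (4, 5), 10 -> (5, 5).
    """
    if not 8 <= team_size <= 10:
        raise ValueError("the playbook buds teams of eight to ten only")
    smaller = team_size // 2
    return smaller, team_size - smaller


def playbook_violations(team_size: int, manager_reports: int) -> list[str]:
    """List which of the playbook's two 'never' rules are currently broken."""
    problems = []
    if team_size == 0:
        problems.append("empty team")
    if manager_reports > 8:
        problems.append("manager supports more than eight individuals")
    return problems


for size in (8, 9, 10):
    print(size, "->", bud(size))
```

Every result lands on teams of four or five, which is the range the highlight names.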

A team is treading water if they’re able to get their critical work done, but are not able to start paying down technical debt or begin major new projects. Morale is a bit higher, but people are still working hard, and your users may seem happier because they’ve learned that asking for help won’t go anywhere.

A team is innovating when their technical debt is sustainably low, morale is high, and the majority of work is satisfying new user needs.

I can’t stress enough that these fixes are slow. This is because systems accumulate months or years of static, and you have to drain that all away. Conversely, the same properties that make these fixes slow to fix make them extremely durable once in effect!

Adding new individuals to a team disrupts that team’s gelling process, so I’ve found it much easier to have rapid growth periods for any given team, followed by consolidation/gelling periods during which the team gels. The organization will never stop growing, but each team will.

After I wrote “Staying on the Path to High-Performance Teams,” quite a few people asked the same follow-up question: “Once a team has repaid its technical debt, shouldn’t the now surplus team members move to other teams?” This makes a lot of sense, because the team, with so little technical debt left, is now overstaffed relative to its global priority. Repeated across many teams, this could lead to an organization having far too many engineers allocated against last year’s problems, and too few against today’s. This is an important problem to address!

Fundamentally, I believe that sustained productivity comes from high-performing teams, and that disassembling a high-performing team leads to a significant loss of productivity, even if the members are fully retained. In this worldview, high-performing teams are sacred, and I’m quite hesitant to disassemble them.

This is part of why my proposed model recommends rapidly hiring into teams loaded down by technical debt, not into innovating teams, which avoids incurring re-gelling costs on high-performing teams.

My rule of thumb is that it takes eight engineers on a team to support a two-tier on-call rotation, so I’m generally reluctant to move any team with membership below that line. However, fixed costs come in many other varieties: “keeping the lights on” work, precommitted contracts, support questions from other teams, etc.

The other approach that I’ve seen work well is to rotate individuals for a fixed period into an area that needs help. The fixed duration allows them to retain their identity and membership in their current team, giving their full focus to helping out, rather than splitting their focus between performing the work and finding membership in the new team.

10%

Productively integrating large numbers of engineers is hard. Just how challenging this is depends on how quickly you can ramp engineers up to self-sufficient productivity, but if you’re doubling every six months and it takes six to twelve months to ramp up, then you can quickly find a scenario in which untrained engineers increasingly outnumber the trained engineers, and each trained engineer is devoting much of their time to training a couple of newer engineers.
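The arithmetic behind this highlight can be sketched in a few lines of Python. This is a toy model, not something from the book. It assumes headcount doubles exactly once per six-month period, that everyone present at the start is already trained, and that a hire counts as trained after a fixed number of periods. The name `untrained_per_trained` is hypothetical.

```python
def untrained_per_trained(periods: int, ramp_periods: int) -> float:
    """Ratio of untrained to trained engineers after `periods` doublings.

    Each period is six months. Headcount is normalised to 1 at the start.
    An engineer is trained once they have been around for `ramp_periods`
    periods, so the trained group equals the headcount from that long ago.
    """
    headcount = 2 ** periods
    trained = 2 ** max(periods - ramp_periods, 0)
    return (headcount - trained) / trained


# Two years of doubling (four periods), with a 6-month and a 12-month ramp.
print(untrained_per_trained(periods=4, ramp_periods=1))
print(untrained_per_trained(periods=4, ramp_periods=2))
```

Under these assumptions the ratio settles at one untrained engineer per trained engineer for a six-month ramp and three per trained engineer for a twelve-month ramp, which fits the highlight's "a couple of newer engineers" per trained one.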

11%

If your company is designing systems to last one order of magnitude and is doubling every six months, then you’ll have to re-implement every system twice every three years. This creates a great deal of risk—almost every platform team is working on a critical scaling project—and can also create a great deal of resource contention to finish these concurrent rewrites.
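The "twice every three years" figure can be checked with a short calculation. The Python sketch below assumes load grows in step with the six-month doubling and that a system must be re-implemented once load has grown by the factor it was designed for. The function and parameter names are illustrative only.

```python
import math


def rewrites_needed(horizon_months: float,
                    doubling_months: float = 6,
                    headroom_factor: float = 10) -> float:
    """Estimate how many re-implementations fall within `horizon_months`.

    A system built with `headroom_factor` of headroom lasts for
    log2(headroom_factor) doublings before it is outgrown.
    """
    months_per_rewrite = doubling_months * math.log2(headroom_factor)
    return horizon_months / months_per_rewrite


estimate = rewrites_needed(36)
print(round(estimate, 2), "rewrites in three years, i.e. about", math.ceil(estimate))
```

One order of magnitude is roughly 3.3 doublings, so each system lasts about 20 months. That gives a little under two rewrites in three years, consistent with the rounded figure in the highlight.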

11%

However, the real productivity killer is not system rewrites but the migrations that follow those rewrites. Poorly designed migrations expand the consequences of this rewrite loop from the individual teams supporting the systems to the entire surrounding organization.

11%

My favorite observation from The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford is that you only get value from projects when they finish: to make progress, above all else, you must ensure that some of your projects finish.

11%

When your company has decided that it is going to grow, you cannot stop it from growing, but, on the other hand, you absolutely can concentrate that growth, such that your teams alternate between periods of rapid hiring and periods of consolidation and gelling.

12%

Finally, the one thing that I’ve found at companies with very few interruptions and have observed almost nowhere else: really great, consistently available documentation. It’s probably even harder to bootstrap documentation into a non-documenting company than it is to bootstrap unit tests into a non-testing company, but the best solution to frequent interruptions I’ve seen is a culture of documentation, documentation reading, and a documentation search that actually works.

12%

Along these lines, if you can keep your interfaces generic, then you are able to skip the migration phase of system re-implementation, which tends to be the longest and trickiest phase, and you can iterate much more quickly and maintain fewer concurrent versions.

12%

None of the ideas here are instant wins. It’s my sense that managing rapid growth is more along the lines of stacking small wins than identifying silver bullets.

13%

Lately, I’m increasingly hearing folks reference the idea of organizational debt. This is the organizational sibling of technical debt, and it represents things like biased interview processes and inequitable compensation mechanisms.

13%

These problems bubble up from your peers, skip-level one-on-ones, and organizational health surveys. If you care and are listening, these are hard to miss. But they are slow to fix. And, oh, do they accumulate! The larger and older your organization is, the more you’ll find perched on your capable shoulders. How you respond to this is, in my opinion, the core challenge of leading a large organization.

13%

Typically, my organizational philosophy is to stabilize team-by-team and organization-by-organization. Ensuring any given area is well on the path to health before moving my focus. I try not to push risks onto teams that are functioning well. You do need to delegate some risks, but generally I think it’s best to only delegate solvable risk. If something simply isn’t likely to go well, I think it’s best to hold the bag yourself. You may be the best suited to manage the risk, but you’re almost certainly the best positioned to take responsibility.

13%

Two or three years into a role, you may find that your personal rate of learning has trailed off. You know your team well, the industry particulars are no longer quite as intimidating, and you have solved the mystery of getting things done at your company. This can be a sign to start looking for your next role, but it’s also a great opportunity to build experience with succession planning.

14%

Take a look at your calendar and write down your role in meetings. This goes for explicit roles, like owning a meeting’s agenda, and also for more nuanced roles, like being the first person to champion others’ ideas, or the person who is diplomatic enough to raise difficult concerns.

14%

Take a second pass on your calendar for non-meeting stuff, like interviewing and closing candidates.

14%

Look back over the past six months for recurring processes, like roadmap planning, performance calibrations, or head count decisions, and documen...