Kindle Notes & Highlights
Read between June 2, 2020 and March 6, 2025
The great thing about tech is that there is never only one way to do something. Instead, there is a series of trade-offs we all must make depending on the circumstances of our team and situation.
Those who know me hear me say too often: "There is no right or wrong, just trade-offs" (a variant of "There is no right or wrong, just consequences") 😀
When you are fundamentally incapable of reacting to a change in underlying technology or product direction, you’re placing a high-risk bet on the hope that such a change never becomes critical. For short-term projects, that might be a safe bet. Over multiple decades, it probably isn’t.
One helpful thing you can do (I'm sure this will be mentioned later) is to understand which aspects will have a short lifetime (or shorter, at least) and which will have a long one; put differently, which aspects are easy to change and which are hard.
"IADA," something I learned many years back, is one such framework. It states that, in general, Identifiers are the hardest to change, followed by API, then Data model/storage, then Architecture (i.e., changing implementation is the easiest).
So that can guide how much effort to spend on future-proofing things (in decisions, design, or implementation). I.e., it's more important to spend time getting your identifiers and APIs right than getting data storage right.
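To make IADA concrete, here's a toy sketch (my own; all names made up): the identifier and the API below leak out to callers, URLs, and other systems, while the data model and implementation stay swappable behind them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserId:
    # Identifier: once "u_8f3a" shows up in URLs, logs, and other teams'
    # databases, it is effectively frozen forever.
    value: str

class UserStore:
    # API: renaming get_user() breaks every caller -- hard, but at least
    # the breakage is visible and mechanical.
    def get_user(self, user_id: UserId) -> dict:
        return self._fetch(user_id)

    # Data model / implementation: swappable (SQL, key-value, in-memory)
    # without touching callers, as long as UserId and get_user() hold.
    def _fetch(self, user_id: UserId) -> dict:
        return {"id": user_id.value, "name": "example"}

print(UserStore().get_user(UserId("u_8f3a")))
```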
On the other end of the spectrum, some successful projects have an effectively unbounded life span: we can’t reasonably predict an endpoint for Google Search,
I don't get why they say that "mobile apps have a fairly short life span" while search does not. I've now worked on two apps with a ~10-year life span. If anything, backend software is easier to replace: 1) it's harder to replace mobile apps (you have less control over what happens on the client side, when updates happen, etc.), and 2) mobile apps are more likely to force you to support APIs longer (because people may be slow to upgrade, or even stop upgrading altogether).
Given enough time and enough users, even the most innocuous change will break something; your analysis of the value of that change must incorporate the difficulty in investigating, identifying, and resolving those breakages.
Example: Hash Ordering
A real-world example that I've encountered: database/object IDs assumed to be monotonically increasing and used for time-based sorting 😔.
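For the book's hash-ordering example specifically, a minimal Python demonstration (mine, not the book's) of how an innocuous ordering assumption breaks: CPython randomizes string hashes per process, so any ordering derived from a set of strings is not stable across runs.

```python
labels = {"alpha", "beta", "gamma"}

# Fragile: callers can come to depend on whatever order this happens to
# produce (Hyrum's Law), and string-hash randomization means the order
# can differ between interpreter runs.
fragile = list(labels)

# Robust: make the intended ordering explicit instead.
stable = sorted(labels)

print(fragile)  # may vary from run to run
print(stable)   # always ['alpha', 'beta', 'gamma']
```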
Only with an organization-wide awareness and commitment to scaling are you likely to keep on top of these issues.
I think it very often actually starts at the bottom: an enterprising individual (or a few) who are fed up with a problem begin to develop a solution, some people notice, it develops momentum, _then_ the organization notices and begins to support its scaling and adoption.
In 2012, we tried to put a stop to this with rules mitigating churn: infrastructure teams must do the work to move their internal users to new versions themselves or do the update in place, in backward-compatible fashion. This policy, which we’ve called the “Churn Rule,” scales better: dependent projects are no longer spending progressively greater effort just to keep up. We’ve also learned that having a dedicated group of experts execute the change scales better than asking for more maintenance effort from every user: experts spend some time learning the whole problem in depth and then apply…
I’m curious how much the things claimed as “company-wide policies” — or even just “Google practices” — are in fact company-wide.
An organization might identify that merging large features into trunk has destabilized the product and conclude, “We need tighter controls on when things merge. We should merge less frequently.” This leads quickly to every team or every feature having separate dev branches. Whenever any branch is decided to be “complete,” it is tested and merged into trunk, triggering some potentially expensive work for other engineers still working on their dev branch, in the form of resyncing and testing.
In decentralized cases, this often ends up as a persistent fork.
An example I know about: A "branch" was created for the TV version of an app, and at some point the task of merging back with the phone app was abandoned due to the cost of doing so. So, the TV and phone apps just stayed different fragmented apps — in features and user experience as well as code — and any effort to consolidate always ended up at the same conclusion: it's too much work.
From a scaling perspective, the Beyoncé Rule implies that complicated, one-off bespoke tests that aren’t triggered by our common CI system do not count. Without this, an engineer on an infrastructure team could conceivably need to track down every team with any affected code and ask them how to run their tests.
There are unmentioned / hidden caveats here:
1. All tests affected by the infra change must be run. That scales super-linearly.
So then you have to set up a system that determines what to test based on what changed (see the sketch after this list). That system is itself an investment/cost to consider.
2. Is that system right often enough? You/Google may get it right 99% of the time, but what about the other 1%? This can mean more user-facing breakages (ones not protected by tests). Is that a trade-off you're okay with? In many cases it isn't: products that users or companies pay for, software that's distributed and hard to quickly ship fixes for, etc.
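A hedged sketch of what such a change-based test selection system might look like (all target names hypothetical; this is not Google's actual infrastructure): walk the reverse dependency graph from the changed targets and keep only the tests that can observe the change.

```python
from collections import deque

# target -> targets that depend on it (a reverse dependency graph)
RDEPS = {
    "//base:strings": ["//net:http", "//base:strings_test"],
    "//net:http": ["//app:server", "//net:http_test"],
    "//app:server": ["//app:server_test"],
}

def affected_tests(changed: set[str]) -> set[str]:
    # Breadth-first walk over reverse deps from the changed targets.
    seen, queue = set(changed), deque(changed)
    while queue:
        for dep in RDEPS.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return {t for t in seen if t.endswith("_test")}

print(affected_tests({"//base:strings"}))
# {'//base:strings_test', '//net:http_test', '//app:server_test'}
```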
We’ve found that expertise and shared communication forums offer great value as an organization scales. As engineers discuss and answer questions in shared forums, knowledge tends to spread.
This is a bit of a non sequitur; it seems like something they wanted to mention, and this was the best place they could find for it (even if it doesn't really fit).
They mention it as though it's unique. Is there any modern software company that doesn't have this?
The more frequently you change your infrastructure, the easier it becomes to do so. We have found that most of the time, when code is updated as part of something like a compiler upgrade, it becomes less brittle and easier to upgrade in the future.
all that is left is to make good decisions. This seems obvious: in software engineering, as in life, good choices lead to good outcomes.
Is it obvious, though? "Good" varies.
"Good" can mean good for quality, timeline, or feature set, and you only get to choose two. Which two are the goodest varies by situation and even by who you ask.
In many organizations, whiteboard markers are treated as precious goods. They are tightly controlled and always in short supply. Invariably, half of the markers at any given whiteboard are dry and unusable.
I wish they'd chosen a better example. Even the company where Frugality is a leadership principle, and which does make poor decisions that optimize for money over personnel happiness, has markers freely available.
Possible examples: snacks and beverages, computer peripherals (and even loaner laptops), perks that keep people happy (gym, jam rooms, maker space, ...)
Some of the quantities are subtle, or we don’t know how to measure them.
There are two in between these two, I think:
* It is measurable, but we won't know what the number will be until after we actually implement and deploy it. For example: "We don't know how many ms, P90, will be saved by switching to QUIC with this client on 3G networks until we try it."
* It is measurable and we know the number, but we don't know what the impact will be until after we implement and deploy it. For example: "We know we'll save 250 ms, P90, by switching to QUIC, but we don't know if that will have a meaningful impact on abandon rate." (A toy P90 calculation follows below.)
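As an aside on the first kind of measurement, a small self-contained sketch (my own; the latency samples are made up) of computing the P90 delta you'd observe once the hypothetical QUIC experiment is actually deployed:

```python
import math

def p90(samples: list[float]) -> float:
    # Nearest-rank 90th percentile: the value at rank ceil(0.9 * n).
    s = sorted(samples)
    return s[math.ceil(0.9 * len(s)) - 1]

control   = [820, 640, 1100, 730, 905, 980, 1210, 760, 690, 870]  # ms, TCP+TLS
treatment = [600, 480, 850, 560, 700, 740, 930, 590, 520, 660]    # ms, QUIC

print(f"P90 control:   {p90(control)} ms")    # 1100 ms
print(f"P90 treatment: {p90(treatment)} ms")  # 850 ms
print(f"P90 saving:    {p90(control) - p90(treatment)} ms")  # the 250 ms above
```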
This question can arise at many levels of the software stack because it is regularly the case that a bespoke solution customized for your narrow problem space may outperform the general utility solution that needs to handle all possibilities. By forking or reimplementing utility code and customizing it for your narrow domain, you can add new features with greater ease, or optimize with greater certainty, regardless of whether we are talking about a microservice, an in-memory cache, a compression routine, or anything else in our software ecosystem.
Touches on another favorite saying of mine: Frameworks make hard things easy, but often also easy things hard.
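A hedged illustration of that trade-off (mine, not the book's): a memoization cache specialized to a narrow domain, dense small-integer keys, versus the general-purpose functools.lru_cache.

```python
from functools import lru_cache

# General-purpose: handles any hashable key; pays for hashing,
# bookkeeping, and the eviction policy it must support for everyone.
@lru_cache(maxsize=None)
def score_general(item_id: int) -> int:
    return item_id * 31 % 97  # stand-in for expensive work

# Bespoke: in *this* narrow domain we know ids are 0..9999, so a flat
# list is the entire cache -- no hashing, no eviction, trivially fast.
_CACHE = [None] * 10_000

def score_bespoke(item_id: int) -> int:
    cached = _CACHE[item_id]
    if cached is None:
        cached = _CACHE[item_id] = item_id * 31 % 97
    return cached

assert score_general(42) == score_bespoke(42)
```

The flip side, of course, is that the bespoke version silently breaks the moment an id outside 0..9999 shows up; that's the maintenance cost the book is warning about.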
Revisiting Decisions, Making Mistakes
One of the unsung benefits of committing to a data-driven culture is the combined ability and necessity of admitting to mistakes. A decision will be made at some point, based on the available data — hopefully based on good data and only a few assumptions, but implicitly based on currently available data. As new data comes in, contexts change, or assumptions are dispelled, it might become clear that a decision was in error or that it made sense at the time but no longer does.
I think we should be careful about calling them mistakes, which can be demoralizing.
"Made sense at the time," is key. It could be "wrong" only given the new information we have now (that wasn't reasonably available then), but right given the information and trade-offs at the time.
Perhaps as general guidance: It's more likely to be okay to call it a mistake when you're talking about your decision, but less likely when you're talking about someone else's. Compare: "I now know that changing the color was a mistake," vs "We now know it was a mistake for Bob to change the color."
The critical idea in this chapter is that software development is a team endeavor. And to succeed on an engineering team — or in any other creative collaboration — you need to reorganize your behaviors around the core principles of humility, respect, and trust.
At a previous company we had Leadership Principles. The way I explained why we had them was this:
Many of the Leadership Principles are part of a set of ways to think about how we impact the people we work with.
Are you a positive effect — magnifying, even? Is having you on the team making their code better, more productive, or even just happier?
Or a negative effect; are you dragging down the people around you?
And some people are neither. Maybe they're best doing mostly individual work.
That's independent of how good of an engineer you are. There are great engineers who can lift a team, and make them a super productive set of people. There are great engineers who might be best to send off on their own to bang out a solution mostly independently. Then there are those brilliant engineers that nobody can stand, who sometimes even repel the other great engineers.
I don't think Google's system (Difficulty, Leadership, Impact, and Citizenship) does this as well. There are elements in there, but it's not the same; it's not as targeted, or as effective in shaping behavior and evaluation.
“Can you make it possible to create open source projects that start out hidden to the world and then are revealed when they’re ready?” “Hi, I want to rewrite all my code from scratch, can you please wipe all the history?” Can you spot a common theme to these requests? The answer is insecurity. People are afraid of others seeing and judging their work in progress.
LOL, I have this tendency with new projects. Keep the early code on a branch, and maybe don't even push that branch at all. Squash-rebase when I'm happy with a first version (so no one can see the ugly intermediate states); and even that is versioned 0.1.0, so clearly I'm saying it's still nowhere near what I think it should be!
Linux is hundreds of times bigger than that initial kernel and was developed by thousands of smart people. Linus’ real achievement was to lead these people and coordinate their work; Linux is the shining result not of his original idea, but of the collective labor of the community.
Why not both?
Some people are brilliant technically ("Genius (Myth)"), and some are great leaders. Those who have made the most impact are often the rare few who are both.
The Genius Myth is the tendency that we as humans need to ascribe the success of a team to a single person/leader.
I generally disfavor saying that genius doesn’t exist, that it’s a myth, or is a bad thing. Often the most impactful people are those that are both a genius and great team members or leaders. Many of these organizations would not have been successful (or perhaps even existed) if not for these individuals being the rare genius + leader.
And this isn't just principled disagreement. I think it's helpful to understand our strengths and limits (our own, or those of the people we have a role in developing). That better understanding can lead to more appropriate development, or even just a reckoning toward a more appropriate path and greater happiness.
Hiding Considered Harmful If you spend all of your time working alone, you’re increasing the risk of unnecessary failure and cheating your potential for growth. Even though software development is deeply intellectual work that can require deep concentration and alone time, you must play that off against the value (and need!) for collaboration and review.
I get that this is generally true. But it seems overly absolute. Many of the most significant innovations do happen this way. I know we should generally/often discourage it (and why), but I think the admonishing tone is unnecessary and perhaps even a disservice to the message.
Again, I don't just argue this out of principle. Sometimes people are better off going off to work alone at first. I've seen others do this successfully, and experienced it myself.
Part of the reason is to avoid unnecessary doubt, invalid concerns (understandable given the context, but invalid), or even counter-proposals for other ideas. At least one personal example comes to mind. I shared an idea for a declarative query language and protocol to be used on top of RESTful APIs to eliminate round-trips. It was received with unfounded doubt (it was obvious that they didn't fully _get_ it[1]), which was fine, reasonable, and even understandable. Nonetheless, it was a slowdown. Fleshing out the idea, or even a prototype, would have circumvented a lot of unnecessary doubt.
Other responses were obstacles, like "get approval from <x>." I let that slow me down for more than a year. Finally, I went off and just did it on my own, on the side. With less than a week's worth of work (spread out over a month of nights and weekends), I had a fully working prototype, fitted into existing systems, that cut latency in half. Meanwhile (during the year I let myself be slowed down), Facebook had announced a fundamentally identical technology, GraphQL (but incompatible with REST / our then-current technology).
[1] For example, people were concerned that it would overload the microservices, even though it wouldn't actually change how much data clients would request (except indirectly, because _people might use the app more because it was faster_).
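For flavor, a rough sketch of the kind of thing I mean (all names and shapes hypothetical; this is not the actual prototype): the client sends one declarative query instead of N chained REST calls, and the server fans out internally.

```python
QUERY = {
    "user/123": {                      # root REST resource
        "fields": ["name"],
        "expand": {"orders": {"fields": ["total"]}},  # follow links server-side
    }
}

FAKE_REST = {  # stand-in for the real REST backends
    "user/123": {"name": "Ada", "orders": ["order/7", "order/9"]},
    "order/7": {"total": 30},
    "order/9": {"total": 12},
}

def resolve(path: str, spec: dict) -> dict:
    # One server-side call per linked resource; zero extra client round-trips.
    resource = FAKE_REST[path]
    out = {f: resource[f] for f in spec.get("fields", [])}
    for link, subspec in spec.get("expand", {}).items():
        out[link] = [resolve(p, subspec) for p in resource[link]]
    return out

print({path: resolve(path, spec) for path, spec in QUERY.items()})
# {'user/123': {'name': 'Ada', 'orders': [{'total': 30}, {'total': 12}]}}
```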
Working with other people directly increases the collective wisdom behind the effort. When you become stuck on something absurd, how much time do you waste pulling yourself out of the hole? Think about how different the experience would be if you had a couple of peers to look over your shoulder and tell you — instantly — how you goofed and how to get past the problem.
When you sit down to write a large piece of software, do you spend days writing 10,000 lines of code, and then, after writing that final, perfect line, press the “compile” button for the very first time? Of course you don’t.
Requirements morph unexpectedly. How do you get that feedback loop so that you know the instant your plans or designs need to change? Answer: by working in a team.
I'm not sure I understand or agree with this. I think teams more often have the tendency to treat a project as a large deliverable (i.e., less iteratively).
The point we’ve been hammering away at is that, in the realm of programming, lone craftspeople are extremely rare — and even when they do exist, they don’t perform superhuman achievements in a vacuum; their world-changing accomplishment is almost always the result of a spark of inspiration followed by a heroic team effort.
I really didn't enjoy much of the negative and fearful tone of this section.
Focus more on the benefits of working with others and together; less on demonizing lone work or being a genius. Some of the most effective people I've worked with lean toward lone development. We shouldn't all be that way, but it can be okay for some people to be that way, and to have some of those people in your org.
Consider this: how many pieces of widely used, successful software can you name that were truly written by a single person? (Some people might say “LaTeX,” but it’s hardly “widely used,” unless you consider the number of people writing scientific papers to be a statistically significant portion of all computer users!)
Why should LaTeX's success be measured against all computer users? You wouldn't measure a soccer shoe's success by its adoption among all athletes (including basketball players, golfers, and runners).
If you perform a root-cause analysis on almost any social conflict, you can ultimately trace it back to a lack of humility, respect, and/or trust. That might sound implausible at first, but give it a try. Think about some nasty or uncomfortable social situation currently in your life. At the basest level, is everyone being appropriately humble? Are people really respecting one another? Is there mutual trust?
Try to focus the arguing on the problem, not the people. Structure (meeting structure or doc structure) can help with this. Courtrooms, and the judicial process generally, try to do this.
it’s about creating relationships to get things done. Relationships always outlast projects. When you’ve got richer relationships with your coworkers, they’ll be more willing to go the extra mile when you need them.
Many companies are founded by people who were coworkers. That is, they trusted and liked each other enough to work on a project and take a risk together.
Lose the ego OK, this is sort of a simpler way of telling someone without enough humility to lose their ’tude. Nobody wants to work with someone who consistently behaves like they’re the most important person in the room. Even if you know you’re the wisest person in the discussion, don’t wave it in people’s faces.
Can one actually _lose_ the ego? I'm sure the answer is 'yes,' but it's probably rare and hard, and probably also painful.
This sounds like 'hide the ego' (which may be good enough as a start).
The trick is to make sure you (and those around you) understand the difference between a constructive criticism of someone’s creative output and a flat-out assault against someone’s character. The latter is useless — it’s petty and nearly impossible to act on. The former can (and should!) be helpful and give guidance on how to improve.
Giving feedback well is a lot harder than "don't attack the person."
I find that feedback that wasn't well-received was rarely an assault against someone's character. That is, even feedback on the output can be taken unwell (as in the example above).
Google eventually fixed the problem by explicitly defining a rubric for what we mean by “Googleyness” — a set of attributes and behaviors that we look for that represent strong leadership and exemplify “humility, respect, and trust”:
These sound an awful lot like Amazon's Leadership Principles and SDE Functional areas. But unlike Amazon's LPs, or even Google's own Three Respects, these are secondary: something you only might hear about in some contexts.
Thrives in ambiguity: Deals with Ambiguity (SDE Functional)
Challenges Status quo: Invent and Simplify. Think Big. Have Backbone; Disagree and Commit
But this approach optimizes for short-term efficiency (“It’s faster for me to do it”) at the cost of poor long-term scalability (the team never learns how to do whatever it is that needs to be done). This mindset also tends to lead to all-or-nothing expertise. All-or-nothing expertise A group of people that is split between people who know “everything” and novices, with little middle ground. This problem often reinforces itself if experts always do everything themselves and don’t take the time to develop new experts through mentoring or documentation.
Kinda contradicts the policy in Ch 1, where a central team does the refactor? The central team doing the refactor becomes the experts; the teams owning the code never need to ramp up, which creates exactly this all-or-nothing expertise.
And although one person might be able to provide personalized help for one-to-many, this doesn’t scale and is limited to small numbers of “many.” Documented knowledge, on the other hand, can better scale not just to the team but to the entire organization.
Documentation also survives time better, in a way; it can easily persist longer than the experts' presence. On the other hand, knowledge passed directly, person to person, can adapt and stay up to date on the fly.
The need for psychological safety is amplified in large groups. Every member of the group has a role to play in creating and maintaining a safe environment that ensures that newcomers are confident asking questions and up-and-coming experts feel empowered to help those newcomers without the fear of having their answers attacked by established experts.
FWIW, I feel a ton of fear participating in our large groups.
So, at least from my perspective, creating safety is more difficult to scale.
There is no magical day when you suddenly always know exactly what to do in every situation — there’s always more to learn. Engineers who have been at Google for years still have areas in which they don’t feel like they know what they are doing, and that’s OK! Don’t be afraid to say “I don’t know what that is; could you explain it?” Embrace not knowing things as an area of opportunity rather than one to fear.
I like to illustrate this using a backwards-looking perspective: Do you actually ever want to look back on a year and be able to say that you didn't learn anything?
The original authors are long gone, and the code is difficult to understand. It can be tempting to rewrite from scratch rather than spend time learning the existing code. But instead of thinking “I don’t get it” and ending your thoughts there, dive deeper: what questions should you be asking? Consider the principle of “Chesterton’s fence”: before removing or changing something, first understand why it’s there.
The original authors should have heeded the Beyoncé Rule. I.e., if they didn't want a fence removed, they should have had a test to protect it 💍 (and documentation, and a bug with details to point to 😅)!
For real, though: I agree with this. I've too often seen people change things without even trying to understand why they are the way they are. One example is back-button behavior: a frequent problem, and one frequently agonized over, because there's often no perfect or right answer. People will set out to get the behavior they want without considering the trade-off that was previously found to be the least bad.
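A half-joking sketch of what "a test to protect the fence" could look like (my framing, joining the Beyoncé Rule to Chesterton's fence; the function, screens, and bug number are all hypothetical):

```python
import unittest

def back_button_target(screen_stack: list[str]) -> str:
    # Deliberately skips "interstitial" screens; see (hypothetical) bug
    # #1234: returning users to the interstitial trapped them in a loop.
    for screen in reversed(screen_stack[:-1]):
        if screen != "interstitial":
            return screen
    return "home"

class BackButtonFenceTest(unittest.TestCase):
    def test_back_skips_interstitial(self):
        # If you're tempted to "simplify" back_button_target, read bug
        # #1234 first -- this behavior is the least-bad trade-off found.
        stack = ["home", "list", "interstitial", "detail"]
        self.assertEqual(back_button_target(stack), "list")

if __name__ == "__main__":
    unittest.main()
```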
Group chats tend to be devoted either to topics or to teams. Topic-driven group chats are typically open so that anyone can drop in to ask a question.
Funny they should mention this, since Google doesn't really have great support for topic-based chats. Chat rooms aren't discoverable (so you can only join a topic by being invited to it), and there aren't good notification controls that would let you have a passive, non-spammy relationship with a topic. Add to that that channel switching isn't frictionless (you can't keep unread channels at the top of the list, can't easily open them with a keyboard shortcut, and switching is _slooow_). And on top of all that, Chat's threads are awkward, to say the least.