Hussain Abbas’s Kindle Notes & Highlights for Kill It with Fire: Manage Aging Computer Systems (and Future Proof Modern Ones)

We build our computer systems the way we build our cities: over time, without a plan, on top of ruins. —Ellen Ullman

Restoring legacy systems to operational excellence is ultimately about resuscitating an iterative development process so that the systems are being maintained and evolving as time goes on.

8%

The subtext behind the phrase legacy technology is that it’s also bad, barely functioning maybe, but legacy technology exists only if it is successful. These old programs are perhaps less efficient than they were before, but technology that isn’t used doesn’t survive decades.

8%

there is little downside to maintaining all systems as if they are legacy systems. It is easy to build things, but it is difficult to rethink them once they are in place.

8%

We assume the sun, moon, stars, and the board of directors will all magically reconfigure themselves around the right technical answer simply because it’s the right technical answer. We are horrified to discover that most people do not actually care how healthy a piece of technology is as long as it performs the function they need it to with a reasonable degree of accuracy in a timeframe that doesn’t exhaust their patience. In technology, “good enough” reigns supreme.

10%

The first mistake software engineers make with legacy modernization is assuming technical advancement is linear. With that frame of mind, anything built in older design patterns or with an older architectural philosophy is inferior to newer options.

10%

We advance, but we advance slowly, while moving tangentially. We abandon patterns only to reinvent them later and sell them as completely new.

10%

Nonalignable differences are characteristics that are wholly unique and innovative; there are no reference points with which to compare. You might assume that nonalignable differences are more appealing to potential consumers. After all, there’s no competition! You’re doing things differently from everyone else. But when it comes time to make a purchasing decision, if there is no comparison, there is no clear sense of value.

12%

Market shifts are complex events. We can see the pattern of technology cycling through the same approaches and structures over and over, but these shifts are less about the superiority of a given characteristic and more about how potential consumers organize themselves.

13%

On the internet, consumers pay more to get faster speeds. That put the pressure on telecommunication companies to compete by making connections faster. The faster the internet became, the more people put on it. The more content that was on the internet, the more consumers started logging on. The more people trying to access a given resource on the internet, the more expensive hosting those resources on your own machines became. Eventually, this flipped the value proposition of the computer industry by making it cheaper to process data “in the cloud” than it was to process it locally.

13%

One can track how architectural paradigms fall in and out of favor roughly by whether processing power and storage capacity are growing faster than network speeds; however, faster processors are often a component of what telecoms use to boost their network speeds. This kind of interdependency is true for basically any market. Product development shifts consumer behavior, which shifts product development. Technology doesn’t advance in a straight line, because a straight line is not actually efficient. The economy is not a flat plane, but a rich topography with ridges and valleys to navigate ...

14%

Adopting new practices doesn’t necessarily make technology better, but doing so almost always makes technology more complicated, and more complicated technology is hard to maintain and ultimately more prone to failure.

14%

It’s important to understand that we advance in cycles, because that’s the only way we learn how to avoid unnecessary rewrites and partial migrations. Changing technology should be about real value and trade-offs, not faulty assumptions that newer is by default more advanced.

14%

Whether Big Data as a Service saves you any money depends on how big your big data actually is, where it is centralized, and how long it takes it to get that big in the first place. Having petabytes of data collected over a five-year period is a different situation from having petabytes generated over the course of a few hours. Value propositions are often complicated questions for this reason. It’s hard enough for a purely technical organization to get it right; it’s even harder at organizations where the only people with enough knowledge to advise on these issues are vendors.

15%
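
To make the contrast in the highlight above concrete, here is a rough back-of-the-envelope sketch. The figures are assumptions chosen for illustration, not numbers from the book: one petabyte taken as 10^15 bytes, "a few hours" taken as three hours, and a five-year span of 365-day years. The same total volume implies very different sustained ingest rates, which is one reason the value proposition differs.

```python
# Illustrative arithmetic only; all constants below are assumptions.
PETABYTE = 10**15                      # bytes (decimal definition)
FIVE_YEARS_S = 5 * 365 * 24 * 3600     # seconds in five 365-day years
THREE_HOURS_S = 3 * 3600               # "a few hours", assumed to be three


def ingest_rate(total_bytes: int, seconds: int) -> float:
    """Average bytes per second needed to accumulate total_bytes in the given time."""
    return total_bytes / seconds


slow = ingest_rate(PETABYTE, FIVE_YEARS_S)    # data collected over years
fast = ingest_rate(PETABYTE, THREE_HOURS_S)   # data generated in a burst

print(f"over five years:  {slow / 1e6:.1f} MB/s")
print(f"over three hours: {fast / 1e9:.1f} GB/s")
print(f"ratio: {fast / slow:.0f}x")
```

Under these assumptions the slow case is a few megabytes per second and the burst case is tens of gigabytes per second, so the storage bill may look similar while the network and processing requirements do not.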

We often think of technology as being streamlined and efficient with no unnecessary bits without a clear purpose, but in fact, many forms of technology you depend on have vestigial features either inherited from other older forms of technology or imported later to create the illusion of feature parity.

19%

If Dennis and Ken had a Selectric instead of a Teletype, we’d probably be typing “copy” and “remove” instead of “cp” and “rm.” Proof again that technology limits our choices as often as it expands them.

19%

Technical correctness didn’t matter. Elegance didn’t matter. Execution mattered, and anything that lowered the barrier to using computers to execute their goals was preferable to more powerful tools that were harder to learn.

20%

The design of the language is never what’s important; it’s the people. The type of people who would have become COBOL programmers before are now becoming Java programmers, making Java the natural choice, despite that it was not designed to handle the use case for which COBOL was optimized.

20%

Overall, interfaces and ideas spread through networks of people, not based on merit or success. Exposure to a given configuration creates the perception that it’s easier and more intuitive, causing it to be passed down to more generations of technology.

21%

Engineers tend to overestimate the value of order and neatness. The only thing that really matters with a computer system is its effectiveness at performing its practical application.

21%

The incentives of individual praise aside, engineering teams tend to gravitate toward full rewrites because they incorrectly think of old systems as specs. They assume that since an old system works, all technical challenges and possible problems have been settled. The risks have been eliminated! They can add more features to the new system or make changes to the underlying architecture without worry. Either they do not perceive the ambiguity these changes introduce or they see such ambiguity positively, imagining only gains in performance and the potential for greater innovation.

22%

Very few rewrite plans take the form of redesigning the system using the same language or merely fixing a well-defined structural issue. The goal of full rewrites is to restore ambiguity and, therefore, enthusiasm. They fail because the assumption that the old system can be used as a spec and be trusted to have diagnosed accurately and eliminated every risk is wrong.

22%

Artificial consistency can bring value to nontechnical processes. For example, standardizing on one programming language makes recruiting, hiring, and, ultimately, sharing engineering resources much easier. But when the principal purpose of a modernization effort is to provide technical value, be careful not to be seduced by the assumption that things that look the same, or that we use the same words to describe, actually integrate better.

Artificial consistency focuses on similarity of form and classification over functionality. Two things shouldn’t be forced together just because they look similar.

23%

Modernizations should be based on adding value, not chasing new technology. Familiar interfaces help speed up adoption. People gain awareness of interfaces and technology through their networks, not necessarily by popularity.

24%

The terms legacy and technical debt are frequently conflated. They are different concepts, although a system can show signs of both problems.

26%

In 1983, Charles Perrow coined the term normal accidents to describe systems that were so prone to failure, no amount of safety procedures could eliminate accidents entirely. According to Perrow, normal accidents are not the product of bad technology or incompetent staff. Systems that experience normal accidents display two important characteristics.

They are tightly coupled. They are complex.

27%

If your goal is to reduce failures or minimize security risks, your best bet is to start by evaluating your system on those two characteristics: Where are things tightly coupled, and where are things complex? Your goal should not be to eliminate all complexity and all coupling; there will be trade-offs in each specific instance.

27%

It’s not about transforming your legacy system into something that is completely simple and uncoupled, it’s about being strategic as to where you are coupled and where you are complex and to what degree. Places of complexity are areas where the human operators make the most mistakes and have the greatest probability of misunderstanding. Places of tight coupling are areas of acceleration where effects both good and bad will move faster, which means less time for intervention.

28%

When you first take on a legacy system, you can’t possibly understand it well enough to make big changes right away. As part of those pragmatic changes, we also invested a lot of time documenting and researching the system.

29%

When both observability and testing are lacking on your legacy system, observability comes first. Tests tell you only what won’t fail; monitoring tells you what is failing.

30%

Finally, make sure your team can recover from failures quickly. This is an engineering best practice generally, but it’s especially important if you’re making changes to production systems. If you’ve never restored from a backup, you don’t actually have backups. If you’ve never failed over to another region, you don’t actually have failovers. If you’ve never rolled back a deploy, you don’t have a mature deploy pipeline.

31%

We struggle to modernize legacy systems because we fail to pay the proper attention and respect to the real challenge of legacy systems: the context has been lost. We have forgotten the web of compromises that created the final design and are blind to the years of modifications that increased its complexity.

32%

Overgrowth is a particular type of coupling between the software and the layers of abstraction making up the platform on which it runs. The perils of dependency management are well known, but with legacy systems, dependency management is about more than just what a package manager might install. The older a system is, the more likely the platform on which it runs is itself a dependency.

33%

What makes major migrations so tricky is that as software ages, elements of the platform on which it was defined to run fall out of fashion, and support for those elements on other platforms becomes less and less common. This means that on our oldest systems, there is typically logic that either must be written out of the system or must be reproduced on a modern platform. The existing platform becomes auxiliary software that grows around whatever is being migrated.

36%

Few modern software engineers would forgo Agile development to spend months planning exactly what an architecture should look like and try to build a complete product all at once. And yet, when asked to modernize an old system, suddenly everyone is breaking things down into sequential phases that are completely dependent on one another.

36%

Assuming you fully understand the requirements because an existing system is operational is a critical mistake. One of the advantages of building a new system is that the team is more aware of the unknowns. Existing systems can be a distraction.

37%

When people stop believing success is possible, they stop bringing their best to work. Measurable problems empower team members to make decisions. Everyone has agreed that metric X needs to be better; any actions taken to improve metric X need not be run up the chain of command.

37%

Measurable problems create clearly articulated goals. Having a goal means you can define what kind of value you expect the project to add and whom that value will benefit most. Will modernization make things faster for customers? Will it improve scaling so you can sign bigger clients? Will it save people’s lives? Or, will it just mean that someone gets to give a conference talk or write an article about switching from technology A to technology B?

37%

Good modernization work needs to suppress that impulse to create elegant comprehensive architectures up front. You can have your neat and orderly system, but you won’t get it from designing it that way in the beginning. Instead, you’ll build it through iteration.

37%

the number-one killer of big efforts is not technical failure. It’s loss of momentum. To be successful at those long-term rearchitecting challenges, the team needs to establish a feedback loop that continuously builds on and promotes their track record of success.

38%

Facilitating technical conversations is more important than being the decision-maker because unproductive and frustrating meetings demoralize teams.

38%

A great meeting is not a meeting where no one ever mentions anything out of scope; it’s one where out-of-scope comments are quickly identified as such by the team and dispatched before they have derailed the conversation.

39%

Remember, technology has a number of trade-offs where optimizing for one characteristic diminishes another important characteristic. Examples include security versus usability, coupling versus complexity, fault tolerance versus consistency, and so on, and so forth. If two engineers really can’t agree on a decision, it’s usually because they have different beliefs about where the ideal optimization between two such poles is.

39%

Becoming good at experiments is valuable for practically any organization. It’s the basis of iteration—you build something, collect data on how it is performing, modify it to improve performance, and start the cycle over.

40%

The hard problems around legacy modernization are not technical problems; they’re people problems. The technology is usually pretty straightforward. Keeping people focused and motivated through the months or years it takes to finish the job is hard. To do this, you need to provide significant value right away, as soon as possible, so that you overcome people’s natural skepticism and get them to buy in.

40%

The important word in the phrase proof of concept is proof. You need to prove to people that success is possible and worth doing.

41%

You may think that by giving projects fancy names, projecting budgets, and settling staffing questions up front you are being diligent, and you are! But you’re also making the project look like a series of big decisions, which for audiences insulated from the day-to-day pain of legacy systems seems too risky. Consider different ways of talking about the same project for different audiences.

42%

It can be tricky getting started with opportunity costs because the number of potential opportunities to calculate can seem infinite. Remember that opportunity costs are thought experiments and rhetorical devices. You don’t need to list the costs of everything your team might be doing, just the activities that strengthen the case for what you want to be doing.

43%

SLOs and SLAs help the team prioritize fixes by how much pain the problem is causing for users. They are a good thing to have even if you feel confident that you won’t need to justify what you modernize and when. But if you do have to justify your strategy, you should be able to study historical data and project under what conditions a given system or part of a system might violate its SLO.

43%
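
One way to read "study historical data and project" in the highlight above is as an error-budget calculation. The sketch below is a minimal illustration under stated assumptions, not the book's method: the `DayStats` type, the function names, the 99.9% target, and the sample traffic are all hypothetical, and the projection naively assumes the recent average failure rate continues unchanged.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DayStats:
    """Hypothetical daily rollup of request outcomes."""
    total_requests: int
    failed_requests: int


def budget_remaining(history: Sequence[DayStats], slo_target: float,
                     expected_window_requests: int) -> float:
    """Failed requests still allowed in the SLO window after the observed history."""
    allowed = (1.0 - slo_target) * expected_window_requests
    used = sum(day.failed_requests for day in history)
    return allowed - used


def days_until_violation(history: Sequence[DayStats], slo_target: float,
                         expected_window_requests: int) -> Optional[float]:
    """Days until the budget runs out if the average daily failure count continues.

    Returns 0.0 if the budget is already spent and None if no failures
    have been observed (no projection is possible from this data).
    """
    remaining = budget_remaining(history, slo_target, expected_window_requests)
    if remaining <= 0:
        return 0.0
    daily_burn = sum(day.failed_requests for day in history) / len(history)
    if daily_burn == 0:
        return None
    return remaining / daily_burn


# Assumed sample data: 10 days observed, 1M requests and 500 failures per day,
# measured against a 99.9% SLO over a 30-day window of roughly 30M requests.
history = [DayStats(1_000_000, 500) for _ in range(10)]
projection = days_until_violation(history, 0.999, 30_000_000)
print(projection)
```

If the projected number of days is shorter than the time left in the window, that part of the system is on course to violate its SLO, which gives a concrete basis for deciding what to modernize first.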

There are parts of the system with shared ownership, parts that no one is responsible for at all, parts where responsibilities are split in unintuitive ways. When looking for bad technology, debt, or security issues, the most productive places to mine are gaps between what two components of the same organization officially own.

Preview — Kill It with Fire by Marianne Bellotti