Kindle Notes & Highlights
Read between March 9 and April 3, 2023
Restoring legacy systems to operational excellence is ultimately about resuscitating an iterative development process so that the systems are being maintained and evolving as time goes on.
To understand legacy systems, you have to be able to define how the original requirements were determined. You have to excavate an entire thought process and figure out what the trade-offs look like now that the options are different.
Simply being old is not enough to make something legacy.
We are past the point where all technical conversations and knowledge sharing can be about building new things.
It is easy to build things, but it is difficult to rethink them once they are in place.
In technology, “good enough” reigns supreme.
The first mistake software engineers make with legacy modernization is assuming technical advancement is linear.
Adopting new practices doesn’t necessarily make technology better, but doing so almost always makes technology more complicated, and more complicated technology is hard to maintain and ultimately more prone to failure.
Changing technology should be about real value and trade-offs, not faulty assumptions that newer is by default more advanced.
The lesson to learn here is that the systems that feel familiar to people always provide more value than the systems that have structural elegance but run contrary to expectations.
Engineers tend to overestimate the value of order and neatness. The only thing that really matters with a computer system is its effectiveness at performing its practical application.
A big red flag is raised for me when people talk about the phases of their modernization plans in terms of which technologies they are going to use rather than what value they will add.
Modernizations should be based on adding value, not chasing new technology. Familiar interfaces help speed up adoption. People gain awareness of interfaces and technology through their networks, not necessarily by popularity.
As time passes, requirements naturally change. As requirements change, usage patterns change, and the organization and design that is most efficient also changes. Use product discovery to redefine what your MVP is, and then find where that MVP is in the existing code. How are these sets of functions and features organized? How would you organize them today?
When both observability and testing are lacking on your legacy system, observability comes first. Tests tell you only what won’t fail; monitoring tells you what is failing.
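To make the distinction concrete, here is a minimal, hypothetical sketch of the kind of signal monitoring provides. It watches the outcomes of live requests and flags when the recent error rate crosses a threshold, whether or not any test ever exercised that code path. The class name, window size, and threshold are illustrative assumptions, not anything the book prescribes.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the outcomes of the most recent production requests.

    Hypothetical sketch: window size and threshold are illustrative.
    """

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # Each entry is True if that request failed; old entries fall off.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def is_alerting(self) -> bool:
        # Reports what is failing right now, not what a test predicted.
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for failed in [False] * 7 + [True] * 3:
    monitor.record(failed)
print(monitor.error_rate(), monitor.is_alerting())  # prints 0.3 True
```

A sliding window keeps the signal about current behavior: once enough successful requests arrive, old failures age out and the alert clears.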
The only real rule of modernizing legacy systems is that there are no silver bullets.
The key thing to remember is that this is a toolkit. You break down large problems into smaller problems, and you choose the tool that gives you the highest probability of success with that specific problem.
The most relevant guide for legacy modernizations is Michael Feathers’ Working Effectively with Legacy Code.
Make sure your team can recover from failures quickly. This is an engineering best practice generally, but it’s especially important if you’re making changes to production systems.
On the surface, each legacy modernization project starts off feeling easy.
All the modernizing team should need to do is simply repeat that process using better technology, the benefit of hindsight, and improved tooling. It should be easy.
I tell my engineers that the biggest problems we have to solve are not technical problems, but people problems. Modernization projects take months, if not years, of work. Keeping a team of engineers focused, inspired, and motivated from beginning to end is difficult. Keeping their senior leadership prepared to invest over and over on what is, in effect, something they already have is a huge challenge. Creating momentum and sustaining it are where most modernization projects fail.
By far, the biggest momentum killers are the assumptions that tell us the project should be easy in the first place.
When things go well, we overestimate the roles of skill and ability and underestimate the role of luck. When things go poorly, on the other hand, it’s all bad luck or external forces.
Software can have serious bugs and still be wildly successful. Lotus 1-2-3 famously mistook 1900 for a leap year, but it was so popular that versions of Excel to this day have to be programmed to honor that mistake to ensure backward compatibility. And because Excel’s popularity ultimately dwarfed that of Lotus 1-2-3, the bug is now part of the ECMA Office Open XML specification.
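The quirk is easy to see in code. Below is a minimal sketch, assuming a spreadsheet that uses the 1900 date system, where serial 1 means 1 January 1900. Serial 60 corresponds to 29 February 1900, a day that never existed, so any reader that wants real calendar dates has to skip it from serial 61 onward. The function name is hypothetical, rejecting serial 60 outright is only one possible design choice, and Excel's alternative 1904 date system is not handled.

```python
import calendar
from datetime import date, timedelta


def excel_serial_to_date(serial: int) -> date:
    """Convert a 1900-date-system serial number to a real calendar date.

    Serial 1 is 1900-01-01. Serial 60 is the phantom 1900-02-29 inherited
    from Lotus 1-2-3, so every serial from 61 onward is one day "ahead".
    """
    if serial < 1:
        raise ValueError("serials in the 1900 date system start at 1")
    if serial == 60:
        raise ValueError("serial 60 is the nonexistent 1900-02-29")
    if serial < 60:
        # Before the phantom day, serials map directly onto real dates.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After it, shift back by one day to skip the fake leap day.
    return date(1899, 12, 30) + timedelta(days=serial)


print(calendar.isleap(1900))     # False: 1900 is not a leap year
print(excel_serial_to_date(59))  # 1900-02-28
print(excel_serial_to_date(61))  # 1900-03-01
```

The compatibility cost is permanent: every consumer of these serial numbers must carry the special case, even though the underlying calendar rule is not in dispute.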
We struggle to modernize legacy systems because we fail to pay the proper attention and respect to the real challenge of legacy systems: the context has been lost.
Once forced to look directly at the context, we realize how innovative some of those systems really were. This gives us a little insight into which decisions were skill and foresight and which were luck.
Scale always involves some luck.
For that reason, whether a service works at its initial scale and then continues to work as it grows is always a mix of skill and luck.
Measurable problems create clearly articulated goals. Having a goal means you can define what kind of value you expect the project to add and whom that value will benefit most. Will modernization make things faster for customers? Will it improve scaling so you can sign bigger clients? Will it save people’s lives? Or, will it just mean that someone gets to give a conference talk or write an article about switching from technology A to technology B?
At first glance, this approach might seem unwise or even irresponsible, but the number-one killer of big efforts is not technical failure. It’s loss of momentum. To be successful at those long-term rearchitecting challenges, the team needs to establish a feedback loop that continuously builds on and promotes their track record of success.
With my engineers, I set the expectation that to have a productive, free-flowing debate, we need to be able to sort comments and issues into in-scope and out-of-scope quickly and easily as a team. I call this technique “true but irrelevant,” because I can typically sort meeting information into three buckets: things that are true, things that are false, and things that are true but irrelevant. Irrelevant is just a punchier way of saying out of scope.
People aren’t pessimistic and uninspired by legacy modernization projects because they don’t care or don’t realize that modernization is important. They often feel that way because they are convinced that success is impossible after experiencing a number of failures.
It’s surprisingly easy to change people’s minds about the inevitability of failure when you demonstrate that success is possible.
The hard problems around legacy modernization are not technical problems; they’re people problems. The technology is usually pretty straightforward.
Keeping people focused and motivated through the months or years it takes to finish the job is hard. To do this, you need to provide significant value right away, as soon as possible, so that you overcome people’s natural skepticism and get them to buy in. The important word in the phrase “proof of concept” is “proof.” You need to prove to people that success is possible and worth doing.
The more an organization has failed at something, the more proof it needs that modernization will bring value. When there’s a history of failure, that first step has to provide enough valu...
To provide value, estimates of opportunity cost need not be accurate. They need only provide insightful context of the trade-offs proposed by a given decision.
When a project takes months or years of sustained commitment, no shortage of things can go wrong.
A combat medic’s first job is to stop the bleeding, not order a bunch of X-rays and put together a diet and exercise plan. To be effective when you’re coming into a project that has already started, your role needs to adapt. First you need to stop the bleeding, and then you can do your analysis and long-term planning.
If a project is failing, you need to earn both the trust and respect of the team already at work to course-correct. The best way to do that is by finding a compounding problem and halting its cycle.
I had a friend who used to say her greatest honor was hearing a system she built had to be rewritten in order to scale it. This meant she had built something that people loved and found useful to the point where they needed to scale it.
Decisions motivated by wanting to avoid rewriting code later are usually bad decisions. In general, any decision made to please or impress imagined spectators with the superficial elegance of your approach is a bad one.
This requires getting consensus from engineering on what it means to be broken in the first place. I’ve mentioned SLOs/SLAs before, and I will point to them again: define what level of value a system needs to bring to the user. If an ugly piece of code meets its SLO, it might not be broken; it might just be an ugly piece of code. Technology doesn’t need to be beautiful or to impress other people to be effective, and all technologists are ultimately in the business of producing effective technology.
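One way to make “broken” measurable is to compare observed behavior against the SLO rather than against anyone's taste in code. The sketch below is hypothetical: it assumes a request-based availability SLO, and the function name, the 99.9% target, and the request counts are illustrative, not taken from the book. It reports whether the target was met and how much of the error budget (the number of failures the SLO tolerates) was consumed.

```python
def slo_status(good: int, total: int, target: float) -> tuple[bool, float]:
    """Return (met, budget_used) for a request-based availability SLO.

    `target` is the fraction of requests that must succeed, e.g. 0.999.
    `budget_used` is 1.0 when failures exactly exhaust the error budget.
    Nothing here inspects how the code is written.
    """
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("need 0 <= good <= total and total > 0")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    allowed_failures = (1 - target) * total  # the error budget, in requests
    failures = total - good
    return failures <= allowed_failures, failures / allowed_failures


# Hypothetical numbers for an "ugly" legacy service over one period.
met, used = slo_status(good=999_600, total=1_000_000, target=0.999)
print(met, round(used, 2))  # prints True 0.4
```

Under these made-up numbers the service has used about 40% of its error budget, so by this measure it is not broken, however the code looks.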
A modernization effort needs buy-in beyond engineering to be successful. Spending time and money on changes that will have no visible impact on the business or mission side of operations makes it hard to secure that buy-in in the future.
The only thing worse than fixing the wrong thing is leaving an attempt to fix the wrong thing unfinished.
Broadly, these techniques are part of a methodology called Code Yellow, which is a cross-functional team created to tackle an issue critical to operational excellence. The term Code Yellow refers both to the team and the process that governs the team’s activities. This was a practice developed at Google to handle issues that were beyond the scope of what any one part of the organization owned. And unlike other processes at Google, it didn’t end up documented and commented on in a thousand different management books, so the only people who seem to know what a Code Yellow is or how to run one
The purpose of a Code Yellow is to create momentum. When a legacy system has performance, stability, or security issues that are both systemic and entangled with other issues, it can be overwhelming and demoralizing.
Part of the leader’s responsibility during a Code Yellow is handling communication about the Code Yellow, which includes keeping leadership briefed on progress. Although Code Yellows can be stressful, the more time the leader spends in status update meetings with senior leadership about the Code Yellow, the less time that person spends working on resolving the Code Yellow.
This book spends a lot of time discussing value and momentum because success with legacy modernization is less about technical implementation and more about the morale of the team doing the modernizing. So, what can you do if the team you’re taking over is so demoralized they won’t listen to you long enough to exercise other techniques presented in this book?

