Useful knowledge and improvisation
Eric Dobbs recently retold a story on Twitter (a copy is on his wiki) about one of his former New Relic colleagues, Nicholas Valler.
At the time, Nicholas was new to the company. He had just discovered a security vulnerability, and then (unrelated to that security vulnerability), an incident happened and, well, I encourage you to read the whole story first, and then come back to this post.
Wow was that ever a great suggestion, @alicegoldfuss! What a story! 0/8
— Eric Dobbs (@dobbse) September 12, 2021
In the end, the engineers were able to leverage the security vulnerability to help resolve the incident. As is my wont, I made a snarky comment.
Root cause of success: unpatched security vulnerability https://t.co/RpWVOosvCg
— Lorin Hochstein (@norootcause) September 12, 2021
But I did want to make a more serious comment about what this story illustrates. In a narrow sense, this security vulnerability helped the New Relic engineers remediate before there was severe impact. But in a broader sense, the following aspects helped them remediate:
- they had useful knowledge of some aspect of the system (port 22 was open to the world)
- they could leverage that knowledge to improvise a solution (they could use this security hole to log in and make changes to the Kafka configuration)
The irony here is that it was a new employee who had the useful knowledge. Typically, it’s the tenured engineers who have this sort of knowledge, as they’ve accumulated it with experience. In this case, the engineer discovered this knowledge right before it was needed. That’s what makes this such a great story!
I do think that how Nicholas found it, by “poking around”, is a behavior that comes with general experience, even though he didn’t have much experience at the company.
"I found a security hole and was in the process of figuring out how to report to security when the the network control plane was accidentally sheared, in particular leaving all our systems inaccessible by ssh." 3/8
— Eric Dobbs (@dobbse) September 12, 2021
But being in possession of useful knowledge isn’t enough. You also need to be able to recognize when the knowledge is useful and bring it to bear.
"I mentioned it to Dana and Alice very timidly: “I think I may know a way into our systems…”. I was pretty nervous because I figured security would be peeved." 7/8
— Eric Dobbs (@dobbse) September 12, 2021
These two attributes, having useful knowledge about the system and being able to apply that knowledge to improvise a solution, are critical for dealing effectively with incidents. Applying them is resilience in action.
It’s not a focus of this particular story, but, in general, this sort of knowledge is distributed across individuals. This means that it’s the ad-hoc team that forms during an incident that needs to possess these attributes.


