if the incident affects many teams, a dynamic “swarm” or “incident squad” of support specialists is formed from the various support teams to triage the problem and restore service as rapidly as possible.
The most effective pattern for an architecture team is as a part-time enabling team (if one is needed at all). The team being part-time is important: it emphasizes that many decisions should be taken by implementing teams rather than left to the architecture team. Some organizations compose a virtual team from members of other teams. This virtual team meets regularly to discuss and evolve the architecture aspects of the systems. This is akin to the chapter or guild terminology used by Spotify (see Chapter 4).
A crucial role of a part-time, architecture-focused enabling team is to discover effective APIs between teams and shape the team-to-team interactions with Conway’s law in mind.
The platform itself may be composed of internal stream-aligned teams, enabling teams, complicated-subsystem teams, and even lower-level platform teams, using the same team types and interactions that are used by the teams consuming the platform.
When code doesn’t work . . . the problem starts in how teams are organized and [how] people interact.
The authors also mention architectural approaches that help decouple such architectures: “Architectural approaches that enable this strategy [of supporting teams’ full ownership from design through to deployment] include the use of bounded contexts and APIs as a way to decouple large domains into smaller, more loosely coupled units.”
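To make that idea concrete, here is a minimal TypeScript sketch of decoupling via bounded contexts and a small published API. The Ordering and Shipping contexts and the OrderSummary type are invented for illustration, not from the book:

```typescript
// Two bounded contexts that each own their internal model. Shipping never
// imports Ordering's internal types; it depends only on the small,
// published API (OrderSummary), keeping the contexts loosely coupled.

// --- Ordering context (internal model stays private to the context) ---
interface OrderLine { sku: string; quantity: number; unitPriceCents: number }
interface Order { id: string; lines: OrderLine[]; customerId: string }

// Published API: the only shape other contexts may depend on.
export interface OrderSummary { orderId: string; itemCount: number }

export function toSummary(order: Order): OrderSummary {
  return {
    orderId: order.id,
    itemCount: order.lines.reduce((n, line) => n + line.quantity, 0),
  };
}

// --- Shipping context (consumes only the published API) ---
export function createShipment(summary: OrderSummary): string {
  return `shipment for ${summary.orderId} (${summary.itemCount} items)`;
}
```

The narrow OrderSummary API is the only coupling surface, so the two contexts, and the teams owning them, can evolve their internals independently.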
Without taking into account the team angle, we risk splitting the monolith in the wrong places or even creating a complex system of interdependent services. This is known as a “distributed monolith” and results in teams lacking autonomy over their services, as almost all changes require updates to other services.
many organizations have taken the time and effort to split up an application monolith into smaller services only to produce a monolithic release further down the deployment pipeline, wasting an opportunity to move faster and safer.
This monolith often results from the organization viewing the database, not the services, as the core business engine.
It’s common to find that one or more database-administration (DBA) teams were put in place to not only maintain the database but also coordinate changes to the database—a task they are often understaffed for—and they become a large bottleneck to delivery.
A monolithic build uses one gigantic continuous-integration (CI) build to get a new version of a component.
even with smaller services, it’s possible that the build scripts set out to build the entire codebase instead of using standard dependency-management mechanisms between components
When components or services can be built independently in CI but can only be tested in a shared static environment without service mocks, people end up bringing into that same environment all the latest versions of the components. They proceed to deploy the whole set of components as one, as this gives them confidence that what they tested is what will run in production.
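One way out is to test each component against a local stub of its dependencies rather than in a shared environment. A minimal sketch in TypeScript (Node 18+; the inventory service and its /stock endpoint are hypothetical):

```typescript
// Test one service against a local stub of its dependency instead of
// deploying everything to a shared static environment.
import http from 'node:http';
import assert from 'node:assert';

// Stub of the "inventory" service this component depends on.
const stub = http.createServer((req, res) => {
  if (req.url === '/stock/sku-123') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ sku: 'sku-123', available: 7 }));
  } else {
    res.writeHead(404);
    res.end();
  }
});

// Code under test: it only needs the dependency's base URL, so the stub
// can stand in for the real service in CI.
async function isInStock(baseUrl: string, sku: string): Promise<boolean> {
  const res = await fetch(`${baseUrl}/stock/${sku}`);
  if (!res.ok) return false;
  const body = (await res.json()) as { available: number };
  return body.available > 0;
}

stub.listen(0, async () => {
  const { port } = stub.address() as { port: number };
  assert.equal(await isInStock(`http://localhost:${port}`, 'sku-123'), true);
  stub.close();
});
```

Because the dependency is faked behind its URL, this component's pipeline can run and release on its own schedule.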
Removing teams’ freedom to choose by enforcing a single technology stack and/or tooling strongly harms their ability to use the right tool for the job and reduces (or sometimes kills) their motivation.
their research indicates that enforcing standardization upon teams actually reduces learning and experimentation, leading to poorer solution choices.2
in two organizations that adopted open offices “the volume of face-to-face interaction decreased significantly (approximately 70%) . . . with an associated increase in electronic interaction.”
what is needed is colocation of purpose, not just colocation of bodies.
Splitting software can reduce the consistency between different parts of the software and can lead to accidental data duplication across multiple subsystems. The user experience (UX) across multiple parts of the software can be degraded if we're not careful to achieve a coherent UX.
A fracture plane is a natural seam in the software system that allows the system to be split easily into two or more parts.
The word monolith itself comes from Greek, meaning “single stone.”
Traditional stonemasons hit stones at particular angles to split the rocks in clean segments, taking advantage o...
If that monolith is also powering multiple business domain areas, it becomes a recipe for disaster, affecting prioritization, flow of work, and user experience.
However, there are multiple other possible fracture planes for software, not only business domain areas.
A bounded context is a unit for partitioning a larger domain (or system) model into smaller parts, each of which represents an internally consistent business domain area
“a concept may appear to be atomic just because we have a single word to cover it. Look hard enough and you will find seams where you can fracture that concept.”
In summary, the business domain fracture plane aligns technology with business and reduces mismatches in terminology and “lost in translation” issues, improving the flow of changes and reducing rework.
hard borders
They often require organizations to adopt specific mechanisms for auditing, documenting, testing, and deploying software that falls within the scope of those regulations, be it credit card payments, transaction reporting, or something else.
Compliance with PCI DSS should fall on a dedicated subsystem for card data management, but these requirements should not apply to an entire monolith that happens to include payment functionality.
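As an illustration, card handling could sit behind a small tokenization interface so that only one subsystem ever touches raw card numbers. The CardVault interface and checkout function below are hypothetical, not from the book:

```typescript
// Isolate card handling behind a tokenization API so only this subsystem
// falls within PCI DSS scope. The rest of the system stores and passes
// opaque tokens, never raw card numbers (PANs).
export interface CardToken { token: string; last4: string }

export interface CardVault {
  tokenize(pan: string): Promise<CardToken>;
  charge(token: CardToken, amountCents: number): Promise<void>;
}

// Outside the compliance boundary: code that only ever sees tokens.
export async function checkout(
  vault: CardVault, pan: string, amountCents: number,
): Promise<string> {
  const token = await vault.tokenize(pan); // PAN crosses the boundary once
  await vault.charge(token, amountCents);  // subsequent calls use the token
  return token.last4;                      // safe to display or store
}

// In-memory fake, for illustration only.
const fakeVault: CardVault = {
  async tokenize(pan) {
    return { token: `tok_${pan.slice(-4)}`, last4: pan.slice(-4) };
  },
  async charge() { /* no-op in this sketch */ },
};

checkout(fakeVault, '4111111111111111', 2599).then(console.log); // "1111"
```

With this shape, the compliance-focused team owns the vault subsystem, and every other team stays outside the audit scope by construction.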
When the subsystem gets split off, it suddenly makes more sense to have a smaller but compliance-focused team, including business owners from compliance and/or legal areas.
Splitting off the parts of the system that typically change at different speeds allows them to change more quickly.
for a team to communicate efficiently, the choice is between full colocation (all team members sharing the same physical space) and a true remote-first approach (explicitly restricting communication to agreed channels—such as messaging and collaboration apps—that everyone on the team has access to and consults regularly).
Taking more risk means accepting a higher probability of system or outcome failure in favor of getting changes into the hands of customers faster.
Other examples include marketing-driven changes with a higher risk profile (focusing on customer acquisition) versus lower risk profile changes to revenue-generating transactional features (focusing on customer retention).
Changes to popular features in the free tier might fall into a higher risk profile, as any major failure could mean losing millions of potential paying customers. Changes to paid-only features might actually sustain less risk if the speed and personalization of support for those few hundred customers makes up for occasional failures.
Flow can be considerably slower when changes involving such older technology are required, either because more manual tests must be run or because implementing changes is difficult due to poor documentation and the lack of an open, supportive community of users (a given for modern tech stacks). Finally, the ecosystem of tools (IDEs, build tools, testing tools, etc.) around such technology tends to behave and feel very different from modern ecosystems, increasing the cognitive load on team members who need to switch between these very different technologies.
Could we, as a team, effectively consume or provide this subsystem as a service? If the answer is yes, then the subsystem is a good candidate for splitting off and assigning to a team to own and evolve.
by taking a team approach and adopting practices like pairing (and later, mobbing) we began to see better flow of work as team members helped each other to complete tasks.
Our product teams are typically made up of four developers, one product manager, one QA, and one UX/UI designer.
Our teams speak directly to customers and stakeholders: they shadow support calls; they design, build, and measure the impact of their solutions; and they are accountable for the quality of the solutions they deliver.
We use some techniques from DDD, particularly event storming, to understand and model the doma...
At a more technical level, we use Pact for contract testing services and in...
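For reference, a consumer-side Pact test typically looks something like the following sketch (pact-js PactV3 API run under Jest; the OrdersWeb/OrdersApi names and the /orders route are invented):

```typescript
// Consumer-driven contract test: the consumer records its expectations
// against a local Pact mock server, producing a pact file the provider
// team verifies in its own pipeline. No shared end-to-end environment.
import { PactV3, MatchersV3 } from '@pact-foundation/pact';

const provider = new PactV3({ consumer: 'OrdersWeb', provider: 'OrdersApi' });

describe('OrdersApi contract', () => {
  it('returns an order by id', async () => {
    provider
      .given('an order with id 42 exists')
      .uponReceiving('a request for order 42')
      .withRequest({ method: 'GET', path: '/orders/42' })
      .willRespondWith({
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: MatchersV3.like({ id: 42, status: 'shipped' }),
      });

    await provider.executeTest(async (mockServer) => {
      const res = await fetch(`${mockServer.url}/orders/42`);
      expect(res.status).toBe(200);
    });
  });
});
```

This is exactly the X-as-a-Service interaction in test form: each team verifies its side of the contract independently, avoiding the "distributed monolith" of combined end-to-end testing quoted below.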
We also have a few parts of the system that align to regulatory boundaries (particularly ISO 27001 for information security management) and to the need for cross-domain reporting of feature usage. These areas are handled by either a small specialist team or through collaboration across several teams.
The UX team acts as internal consultants across all the delivery teams, enabling them to adopt good UX practices quickly. We run an SRE capability for dealing with the high volume of traffic and enhancing operability.
Making end-to-end changes across three very different tech stacks (embedded, cloud, and mobile) requires a skill mix that is hard to find, and the associated cognitive load and context switching would be untenable.
We need to look for natural ways to break down the system (fracture planes) that allow the resulting parts to evolve as independently as possible.
“If you have microservices but you wait and do end-to-end testing of a combination of them before a release, what you have is a distributed monolith.”
What must be avoided is the need for all teams to communicate with all other teams in order to achieve their ends;
we need to define and understand three essential ways in which teams can and should interact, taking into account team-first dynamics and Conway's law:
Collaboration: working closely together with another team
X-as-a-Service: consuming or providing something with minimal collaboration
Facilitating: helping (or being helped by) another team to clear impediments