Hossam’s Kindle Notes & Highlights for Building Microservices: Designing Fine-Grained Systems

Jon Eaves at RealEstate.com.au in Australia characterizes a microservice as something that could be rewritten in two weeks, a rule of thumb that makes sense for his particular context.

Microservice size

4%

Gone is the time when we could think narrowly about either our desktop website or mobile application. Now we need to think of the myriad ways that we might want to weave together capabilities for the Web, native application, mobile web, tablet app, or wearable device. As organizations move away from thinking in terms of narrow channels to more holistic concepts of customer engagement, we need architectures that can keep up.

5%

Much of the conventional wisdom around SOA doesn’t help you understand how to split something big into something small. It doesn’t talk about how big is too big. It doesn’t talk enough about real-world, practical ways to ensure that services do not become overly coupled. The number of things that go unsaid is where many of the pitfalls associated with SOA originate. The microservice approach has emerged from real-world use, taking our better understanding of systems and architecture to do SOA well. So you should instead think of microservices as a specific approach for SOA in the same way that ...more

SOA vs Microservices

7%

Erik Doernenburg first shared with me the idea that we should think of our role more as town planners than architects for the built environment. The role of the town planner should be familiar to any of you who have played SimCity before. A town planner’s role is to look at a multitude of sources of information, and then attempt to optimize the layout of a city to best suit the needs of the citizens today, taking into account future use. The way he influences how the city evolves, though, is interesting. He does not say, “build this specific building there”; instead, he zones a city. So as in ...more

Architects or Town planners?

7%

The comparison with software should be obvious. As our users use our software, we need to react and change. We cannot foresee everything that will happen, and so rather than plan for any eventuality, we should plan to allow for change by avoiding the urge to overspecify every last thing. Our city — the system — needs to be a good, happy place for everyone who uses it. One thing that people often forget is that our system doesn’t just accommodate users; it also accommodates developers and operations people who also have to work there, and who have the job of making sure it can change as ...more

Role of software architects

7%

Within each service, you may be OK with the team who owns that zone picking a different technology stack or data store. Other concerns may kick in here, of course. Your inclination to let teams pick the right tool for the job may be tempered by the fact that it becomes harder to hire people or move them between teams if you have 10 different technology stacks to support. Similarly, if each team picks a completely different data store, you may find yourself lacking enough experience to run any of them at scale. Netflix, for example, has mostly standardized on Cassandra as a data-store ...more

Choosing stack of technologies

8%

Strategic Goals The role of the architect is already daunting enough, so luckily we usually don’t have to also define strategic goals! Strategic goals should speak to where your company is going, and how it sees itself as best making its customers happy. These will be high-level goals, and may not include technology at all. They could be defined at a company level or a division level. They might be things like “Expand into Southeast Asia to unlock new markets,” or “Let the customer achieve as much as possible using self-service.” The key is that this is where your organization is headed, so ...more

Architecture strategic goals

8%

Principles Principles are rules you have made in order to align what you are doing to some larger goal, and will sometimes change. For example, if one of your strategic goals as an organization is to decrease the time to market for new features, you may define a principle that says that delivery teams have full control over the lifecycle of their software to ship whenever they are ready, independently of any other team. If another goal is that your organization is moving to aggressively grow its offering in other countries, you may decide to implement a principle that the entire system must be ...more

Architecture Principles

8%

Practices Our practices are how we ensure our principles are being carried out. They are a set of detailed, practical guidance for performing tasks. They will often be technology-specific, and should be low level enough that any developer can understand them. Practices could include coding guidelines, the fact that all log data needs to be captured centrally, or that HTTP/REST is the standard integration style. Due to their technical nature, practices will often change more often than principles. As with principles, sometimes practices reflect constraints in your organization. For example, if ...more

Architecture Practices

9%

My colleague Evan Bottcher developed the diagram shown in Figure 2-1 in the course of working with one of our clients. The figure shows the interplay of goals, principles, and practices in a very clear format. Over the course of a couple years, the practices on the far right will change fairly regularly, whereas the principles remain fairly static. A diagram such as this can be printed nicely on a single sheet of paper and shared, and each idea is simple enough for the average developer to remember. There is, of course, more detail behind each point here, but being able to articulate this in ...more

Visualization Of straregy, principles and practices

9%

Monitoring It is essential that we are able to draw up coherent, cross-service views of our system health. This has to be a system-wide view, not a service-specific view. As we’ll discuss in Chapter 8, knowing the health of an individual service is useful, but often only when you’re trying to diagnose a wider problem or understand a larger trend. To make this as easy as possible, I would suggest ensuring that all services emit health and general monitoring-related metrics in the same way. You might choose to adopt a push mechanism, where each service needs to push this data into a central ...more

MS monitoring standards

9%

Interfaces Picking a small number of defined interface technologies helps integrate new consumers. Having one standard is a good number. Two isn’t too bad, either. Having 20 different styles of integration is bad. This isn’t just about picking the technology and the protocol. If you pick HTTP/REST, for example, will you use verbs or nouns? How will you handle pagination of resources? How will you handle versioning of end points?

Interface Standards

9%

Architectural Safety We cannot afford for one badly behaved service to ruin the party for everyone. We have to ensure that our services shield themselves accordingly from unhealthy, downstream calls. The more services we have that do not properly handle the potential failure of downstream calls, the more fragile our systems will be. This means you will probably want to mandate as a minimum that each downstream service gets its own connection pool, and you may even go as far as to say that each also uses a circuit breaker. This will get covered in more depth when we discuss microservices at ...more

Safely Architecture

9%

Exemplars Written documentation is good, and useful. I clearly see the value; after all, I’ve written this book. But developers also like code, and code they can run and explore. If you have a set of standards or best practices you would like to encourage, then having exemplars that you can point people to is useful. The idea is that people can’t go far wrong just by imitating some of the better parts of your system. Ideally, these should be real-world services you have that get things right, rather than isolated services that are just implemented to be perfect examples. By ensuring your ...more

This highlight has been truncated due to consecutive passage length restrictions.

Governance techniques

11%

A model I greatly favor is having the architect chair the group, but having the bulk of the group drawn from the technologists of each delivery team — the leads of each team at a minimum. The architect is responsible for making sure the group works, but the group as a whole is responsible for governance. This shares the load, and ensures that there is a higher level of buy-in. It also ensures that information flows freely from the teams into the group, and as a result, the decision making is much more sensible and informed.

Architecture Governance

11%

Helping the people around you on their own career growth can take many forms, most of which are outside the scope of this book. There is one aspect, though, where a microservice architecture is especially relevant. With larger, monolithic systems, there are fewer opportunities for people to step up and own something. With microservices, on the other hand, we have multiple autonomous codebases that will have their own independent lifecycles. Helping people step up by having them take ownership of individual services before accepting more responsibility can be a great way to help them achieve ...more

MS Team ad ownership

11%

Summary To summarize this chapter, here are what I see as the core responsibilities of the evolutionary architect: Vision Ensure there is a clearly communicated technical vision for the system that will help your system meet the requirements of your customers and organization Empathy Understand the impact of your decisions on your customers and colleagues Collaboration Engage with as many of your peers and colleagues as possible to help define, refine, and execute the vision Adaptability Make sure that the technical vision changes as your customers or organization requires it Autonomy Find the ...more

Summary of evolutionary architect

12%

The evolutionary architect is one who understands that pulling off this feat is a constant balancing act. Forces are always pushing you one way or another, and understanding where to push back or where to go with the flow is often something that comes only with experience. But the worst reaction to all these forces that push us toward change is to become more rigid or fixed in our thinking. While much of the advice in this chapter can apply to any systems architect, microservices give us many more decisions to make. Therefore, being better able to balance all of these trade-offs is essential.

The good evolutionary architect

12%

Loose Coupling When services are loosely coupled, a change to one service should not require a change to another. The whole point of a microservice is being able to make a change to one service and deploy it, without needing to change any other part of the system. This is really quite important. What sort of things cause tight coupling? A classic mistake is to pick an integration style that tightly binds one service to another, causing changes inside the service to require a change to consumers. We’ll discuss how to avoid this in more depth in Chapter 4. A loosely coupled service knows as ...more

MS is loosely coupled

12%

High Cohesion We want related behavior to sit together, and unrelated behavior to sit elsewhere. Why? Well, if we want to change behavior, we want to be able to change it in one place, and release that change as soon as possible. If we have to change that behavior in lots of different places, we’ll have to release lots of different services (perhaps at the same time) to deliver that change. Making changes in lots of different places is slower, and deploying lots of services at once is risky — both of which we want to avoid. So we want to find boundaries within our problem domain that help ...more

MS is highly cohesive

14%

I called this onion architecture, as it had lots of layers and made me cry when we had to cut through it.

Onion architecture

19%

There are many different styles of REST, and I touch only briefly on them here. I strongly recommend you take a look at the Richardson Maturity Model, where the different styles of REST are compared.

Rest reference

21%

REST over HTTP is a sensible default choice for service-to-service interactions. If you want to know more, I recommend REST in Practice (O’Reilly), which covers the topic of REST over HTTP in depth.

Rest reference

23%

Let’s consider the example where we ask the email service to send an email when an order has been shipped. Now we could send in the request to the email service with the customer’s email address, name, and order details. However, if the email service is actually queuing up these requests, or pulling them from a queue, things could change in the meantime. It might make more sense to just send a URI for the Customer and Order resources, and let the email server go look them up when it is time to send the email.

Entity by reference

24%

Wouldn’t it be great if as a client you could look just at the version number of a service and know if you can integrate with it? Semantic versioning is a specification that allows just that. With semantic versioning, each version number is in the form MAJOR.MINOR.PATCH. When the MAJOR number increments, it means that backward incompatible changes have been made. When MINOR increments, new functionality has been added that should be backward compatible. Finally, a change to PATCH states that bug fixes have been made to existing functionality.

Semantic versioning

26%

This pattern is sometimes referred to as backends for frontends (BFFs). It allows the team focusing on any given UI to also handle its own server-side components. You can see these backends as parts of the user interface that happen to be embedded in the server. Some types of UI may need a minimal server-side footprint, while others may need a lot more. If you need an API authentication and authorization layer, this can sit between our BFFs and our UIs. We’ll explore this more in Chapter 9. The danger with this approach is the same as with any aggregating layer; it can take on logic it ...more

Backend for frontends

28%

Summary We’ve looked at a number of different options around integration, and I’ve shared my thoughts on what choices are most likely to ensure our microservices remain as decoupled as possible from their other collaborators: Avoid database integration at all costs. Understand the trade-offs between REST and RPC, but strongly consider REST as a good starting point for request/response integration. Prefer choreography over orchestration. Avoid breaking changes and the need to version by understanding Postel’s Law and using tolerant readers. Think of user interfaces as compositional layers.

Summart of integrations chapter

30%

To see these database-level constraints, which may be a stumbling block, we need to use another tool to visualize the data. A great place to start is to use a tool like the freely available SchemaSpy, which can generate graphical representations of the relationships between tables.

Schema analysis tools

30%

So how do we fix things here? Well, we need to make a change in two places. First, we need to stop the finance code from reaching into the line item table, as this table really belongs to the catalog code, and we don’t want database integration happening once catalog and finance are services in their own rights. The quickest way to address this is rather than having the code in finance reach into the line item table, we’ll expose the data via an API call in the catalog package that the finance code can call. This API call will be the forerunner of a call we will make over the wire, as we see ...more

Foreign Key breaking in MS

42%

Heroku comes to mind as being probably the gold class of PaaS. It doesn’t just handle running your service, it also supports services like databases in a very simple fashion. Self-hosted solutions do exist in this space, although they are more immature than the hosted solutions.

Heroku as Paas for MS

43%

Well, for some people, you can. However, slicing up the machine into ever increasing VMs isn’t free. Think of our physical machine as a sock drawer. If we put lots of wooden dividers into our drawer, can we store more socks or fewer? The answer is fewer: the dividers themselves take up room too! Our drawer might be easier to deal with and organize, and perhaps we could decide to put T-shirts in one of the spaces now rather than just socks, but more dividers means less overall space.

Vm as sock drawer

45%

We’ve covered a lot of ground here, so a recap is in order. First, focus on maintaining the ability to release one service independently from another, and make sure that whatever technology you select supports this. I greatly prefer having a single repository per microservice, but am firmer still that you need one CI build per microservice if you want to deploy them separately. Next, if possible, move to a single-service per host/container. Look at alternative technologies like LXC or Docker to make managing the moving parts cheaper and easier, but understand that whatever technology you ...more

Summary of MS deployment wisdom

46%

In his book Succeeding with Agile (Addison-Wesley), Mike Cohn outlines a model called the Test Pyramid to help explain what types of automated tests you need. The pyramid helps us think about the scopes the tests should cover, but also the proportions of different

Test Pyramid (ref)

51%

Another example of this is what is called blue/green deployment. With blue/green, we have two copies of our software deployed at a time, but only one version of it is receiving real requests. Let’s consider a simple example, seen in Figure 7-12. In production, we have v123 of the customer service live. We want to deploy a new version, v456. We deploy this alongside v123, but do not direct any traffic to it. Instead, we perform some testing in situ against the newly deployed version. Once the tests have worked, we direct the production load to the new v456 version of the customer service. It is ...more

Blue/green deployment

51%

With canary releasing, we are verifying our newly deployed software by directing amounts of production traffic against the system to see if it performs as expected. “Performing as expected” can cover a number of things, both functional and nonfunctional. For example, we could check that a newly deployed service is responding to requests within 500ms, or that we see the same proportional error rates from the new and the old service. But you could go deeper than that. Imagine we’ve released a new version of the recommendation service. We might run both of them side by side but see if the ...more

Canary releasing aka A/B testing

52%

Netflix uses this approach extensively. Prior to release, new service versions are deployed alongside a baseline cluster that represents the same version as production. Netflix then runs a subset of the production load over a number of hours against both the new version and the baseline, scoring both. If the canary passes, the company then proceeds to a full roll-out into production.

Canary at Netflix

52%

Nonfunctional requirements is an umbrella term used to describe those characteristics your system exhibits that cannot simply be implemented like a normal feature. They include aspects like the acceptable latency of a web page, the number of users a system should support, how accessible your user interface should be to people with disabilities, or how secure your customer data should be. The term nonfunctional never sat well with me. Some of the things that get covered by this term seem very functional in nature! One of my colleagues, Sarah Taraporewalla, coined the phrase cross-functional ...more

Cross functional requirements not NFR

53%

Summary Bringing this all together, what I have outlined here is a holistic approach to testing that hopefully gives you some general guidance on how to proceed when testing your own systems. To reiterate the basics: Optimize for fast feedback, and separate types of tests accordingly. Avoid the need for end-to-end tests wherever possible by using consumer-driven contracts. Use consumer-driven contracts to provide focus points for conversations between teams. Try to understand the trade-off between putting more effort into testing and detecting issues faster in production (optimizing for MTBF ...more

SUmmary of testing chapter

54%

Now the number of hosts we are running on is becoming a challenge. SSH-multiplexing to retrieve logs probably isn’t going to cut it now, and there isn’t a screen big enough for you to have terminals open on every host. Instead, we’re looking to use specialized subsystems to grab our logs and make them available centrally. One example of this is logstash, which can parse multiple logfile formats and can send them to downstream systems for further investigation. Kibana is an ElasticSearch-backed system for viewing logs, illustrated in Figure 8-4. You can use a query syntax to search through ...more

Log aggregate

54%

In a more complex environment, we’ll be provisioning new instances of our services pretty frequently, so we want the system we pick to make it very easy to collect metrics from new hosts. We’ll want to be able to look at a metric aggregated for the whole system — for example, the average CPU load — but we’ll also want to aggregate that metric for all the instances of a given service, or even for a single instance of that service. That means we’ll need to be able to associate metadata with the metric to allow us to infer this structure. Graphite is one such system that makes this very easy. It ...more

This highlight has been truncated due to consecutive passage length restrictions.

Metrics trqcking using Graphite

55%

I would strongly suggest having your services expose basic metrics themselves. At a bare minimum, for a web service you should probably expose metrics like response times and error rates — vital if your server isn’t fronted by a web server that is doing this for you. But you should really go further. For example, our accounts service may want to expose the number of times customers view their past orders, or your web shop might want to capture how much money has been made during the last day. Why do we care about this? Well, for a number of reasons. First, there is an old adage that 80% of ...more

This highlight has been truncated due to consecutive passage length restrictions.

Service metrics

56%

Software such as Zipkin can also trace calls across multiple system boundaries. Based on the ideas from Google’s own tracing system, Dapper, Zipkin can provide very detailed tracing of interservice calls, along with a UI to help present the data.

Interservice call tracking

57%

but a great place to start is Stephen Few’s excellent book Information Dashboard Design: Displaying Data for At-a-Glance Monitoring (Analytics Press).

Reference book for data display

57%

Riemann is an event server that allows for fairly advanced aggregation and routing of events and can form part of such a solution. Suro is Netflix’s data pipeline and operates in a similar space. Suro is explicitly used to handle both metrics associated with user behavior, and more operational data like application logs. This data can then be dispatched to a variety of systems, like Storm for real-time analysis, Hadoop for offline batch processing, or Kibana for log analysis.

Unified business and operation metrics

57%

Summary So, we’ve covered a lot here! I’ll attempt to summarize this chapter into some easy-to-follow advice. For each service: Track inbound response time at a bare minimum. Once you’ve done that, follow with error rates and then start working on application-level metrics. Track the health of all downstream responses, at a bare minimum including the response time of downstream calls, and at best tracking error rates. Libraries like Hystrix can help here. Standardize on how and where metrics are collected. Log into a standard location, in a standard format if possible. Aggregation is a pain if ...more

This highlight has been truncated due to consecutive passage length restrictions.

Summary of integration chapter

61%

There is a type of vulnerability called the confused deputy problem, which in the context of service-to-service communication refers to a situation where a malicious party can trick a deputy service into making calls to a downstream service on his behalf that he shouldn’t be able to.

Confused deputy problem

61%

For encryption at rest, unless you have a very good reason for picking something else, pick a well-known implementation of AES-128 or AES-256 for your platform.1 Both the Java and .NET runtimes include implementations of AES that are highly likely to be well tested (and well patched), but separate libraries exist for most platforms too — for example, the Bouncy Castle libraries for Java and C#. For passwords, you should consider using a technique called salted password hashing. Badly implemented encryption could be worse than having none, as the false sense of security (pardon the pun) can ...more

Encryption For data at rest

64%

There are automated tools that can probe our systems for vulnerabilities, such as by looking for cross-site scripting attacks. The Zed Attack Proxy (aka ZAP) is a good example. Informed by the work of OWASP, ZAP attempts to re-create malicious attacks on your website. Other tools exist that use static analysis to look for common coding mistakes that can open up security holes, such as Brakeman for Ruby. Where these tools can be easily integrated into normal CI builds, integrate them into your standard check-ins. Other sorts of automated tests are more involved. For example, using something ...more

Security proping tools

64%

Summary So again we return to a core theme of the book — that having a system decomposed into finer-grained services gives us many more options as to how to solve a problem. Not only can having microservices potentially reduce the impact of any given breach, but it also gives us more ability to trade off the overhead of more complex and secure approaches where data is sensitive, and a lighter-weight approach when the risks are lower. Once you understand the threat levels of different parts of your system, you should start to get a sense of when to consider security during transit, at rest, or ...more

This highlight has been truncated due to consecutive passage length restrictions.

Summary Of security chapter

65%

Probably the two poster children for the idea that organizations and architecture should be aligned are Amazon and Netflix. Early on, Amazon started to understand the benefits of teams owning the whole lifecycle of the systems they managed. It wanted teams to own and operate the systems they looked after, managing the entire lifecycle. But Amazon also knew that small teams can work faster than large teams. This led famously to its two-pizza teams, where no team should be so big that it could not be fed with two pizzas. This driver for small teams owning the whole lifecycle of their services is ...more

Team structure in Amazon and Netflix