Kindle Notes & Highlights
Read between July 30 - September 9, 2017
Don’t rely on the data producers Even if you think a query will never have more than a handful of results, beware: it could change without warning because of some other part of the system. The only sensible numbers are “zero,” “one,” and “lots,” so unless your query selects exactly one row, it has the potential to return too many. Don’t rely on the data producers to create a limited amount of data. Sooner or later, they’ll go berserk and fill up a table for no reason, and then where will you be?
Put limits into other application-level protocols Web service calls, RMI, DCOM, XML-RPC: all are vulnerable to returning huge collections of obj...
This highlight has been truncated due to consecutive passage length restrictions.
Hope is not a design method.
Now and forever, networks will always be unreliable.
Apply to Integration Points, Blocked Threads, and Slow Responses The Timeouts pattern prevents calls to Integration Points from becoming Blocked Threads. Thus, they avert Cascading Failures.
Apply to recover from unexpected failures When an operation is taking too long, sometimes we don’t care why…we just need to give up and keep moving. The Timeouts pattern lets us do that.
Consider delayed retries Most of the explanations for a timeout involve problems in the network or the remote system that won’t be resolved right away. Immediate retries are liable to hit the same problem and result in another timeout. That just makes the user wait even longer for his error ...
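The timeout-plus-delayed-retry idea can be sketched as a small wrapper. The names here are hypothetical; `operation` stands for any call that enforces its own deadline (for example a socket with `settimeout` or an HTTP client's timeout parameter) and raises `TimeoutError` when it expires:

```python
import time

def call_with_delayed_retry(operation, retries=3, delay=5.0):
    """Run an operation that may time out. On timeout, wait before
    retrying: whatever made the remote system slow rarely clears in
    milliseconds, so an immediate retry mostly earns a second timeout."""
    last_error = None
    for attempt in range(retries):
        if attempt > 0:
            time.sleep(delay)  # the delayed retry
        try:
            return operation()
        except TimeoutError as err:
            last_error = err
    raise last_error  # give up and keep moving
```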
Don’t do it if it hurts Circuit Breaker is the fundamental pattern for protecting your system from all manner of Integration Points problems. When there’s a difficulty with Integration Points, stop calling it!
Use together with Timeouts Circuit Breaker is good at avoiding calls when Integration Points has a problem. The Timeouts pattern indicates that there is a problem in Integration Points.
Expose, track, and report state changes Popping a Circuit Breaker always indicates there is a serious problem. It should be visible to operations.[49] It should be...
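A minimal circuit breaker sketch, under assumed thresholds (all names hypothetical): after enough consecutive failures the circuit pops and calls fail fast until a reset window passes, when one trial call is let through:

```python
import time

class CircuitBreaker:
    """After max_failures consecutive errors the circuit opens and calls
    fail fast. Once reset_after seconds elapse, one trial call is allowed
    through (half-open); success closes the circuit again."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # else: half-open, fall through and attempt one trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Popping the breaker: a serious event, so a real
                # implementation should log/report this to operations.
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

A production version would also need thread safety and per-state metrics; this sketch only shows the closed/open/half-open state machine.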
Save part of the ship The Bulkheads pattern partitions capacity to preserve partial functionality when bad things happen.
Decide whether to accept less efficient use of resources When the system is not in jeopardy, partitioning the servers means each partition needs more reserve capacity. If all servers are pooled, then less total reserve capacity is needed.
Pick a useful granularity You can partition thread pools inside an application, CPUs in a serv...
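Thread-pool granularity can be sketched with two partitioned executors, one per integration point (the pool names and sizes are illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

# Bulkheads via partitioned thread pools: if the search backend hangs,
# only the search pool's four threads can block on it. Billing traffic
# keeps flowing through its own, separate partition.
search_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="search")
billing_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="billing")

def call_search(fn, *args):
    return search_pool.submit(fn, *args)

def call_billing(fn, *args):
    return billing_pool.submit(fn, *args)
```

The trade-off from the previous point is visible here: six total threads split 4/2 serve peak search load worse than one shared pool of six would, but a single misbehaving dependency can no longer consume them all.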
Very important with shared services models In a service-oriented architecture, there may be many enterprise systems dependent on your application. If your application goes down because of Chain Reactions, does the entire comp...
Nevertheless, someday your little database will grow up. When it hits the teenage years—about two in human years—it will get moody, sullen, and resentful. In the worst case, it will start undermining the whole system (and it will probably complain that nobody understands it, too).
Avoid fiddling Human intervention leads to problems. Eliminate the need for recurring human intervention. Your system should run at least for a typical deployment cycle without manual disk cleanups or nightly restarts.
Purge data with application logic DBAs can create scripts to purge data, but they don’t always know how the application behaves when data is removed. Maintaining logical integrity, especially if you use an ORM tool, requires the application to purge its own data.
Limit caching In-memory caching speeds up applications, until it slows them down. Limit the amount o...
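In Python, one cheap way to cap an in-memory cache is a bounded LRU decorator; the function and size below are illustrative assumptions:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def load_product(product_id):
    # In a real application this would hit the database; here it just
    # builds a dict so the sketch is runnable. The maxsize bound means
    # the cache can never grow without limit: least-recently-used
    # entries are evicted once 1024 items are cached.
    return {"id": product_id}
```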
Roll the logs Don’t keep an unlimited amount of log files. Configure log file rotation based on size. If you need to retain them for compl...
Avoid Slow Responses and Fail Fast If your system cannot meet its SLA, inform callers quickly. Don’t make them wait for an error message, and don’t make them wait until they time out. That just makes your problem into their problem.
Reserve resources, verify Integration Points early In the theme of “don’t do useless work,” make sure you will be able to complete the transaction before you start. If critical resources aren’t available—for example, a popped Circuit Breaker on a required call out—then don’t waste work by getting to that point. The odds of it changing between the beginning and the middle of the transaction are slim.
Use for input validation Do basic user input validation even before you reserve resources. Don’t bother checking out a database connection, fetching domain objects, populating them, and calling validate ju...
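The "validate before reserving" ordering can be sketched as follows; `place_order`, its request shape, and the connection interface are all hypothetical:

```python
def place_order(request, get_connection):
    """Fail Fast: do the cheap checks first, and only then pay for the
    expensive resources (database connection, domain objects)."""
    # 1. Basic input validation -- costs almost nothing.
    if not request.get("sku"):
        raise ValueError("missing sku")
    qty = request.get("quantity", 0)
    if not isinstance(qty, int) or qty <= 0:
        raise ValueError("quantity must be a positive integer")
    # 2. Only now check out the expensive resource.
    conn = get_connection()
    try:
        return conn.execute_order(request["sku"], qty)
    finally:
        conn.close()
```

Bad input is rejected before a single connection is checked out, so garbage requests cost the system almost nothing.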
Create cooperative demand control Handshaking between client and server permits demand throttling to serviceable levels. Both client and server must be built to perform Handshaking. Most common application-level protocols—such as HTTP, JRMP, IIOP, and DCOM—do not perform Handshaking.
Consider health checks Health-check requests are an application-level workaround for the lack of Handshaking in the protocols. Consider using them when the cost of the added call is much less than the cost of calling and failing.
Build Handshaking into your own low-level protocols If you create your own socket-based protocol, build Handshaking into it, so the endpoints can each inform the ...
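The health-check-as-handshake idea can be sketched with a toy endpoint that reports whether it will accept more work before the client sends any (all names and the in-flight threshold are hypothetical):

```python
class Server:
    """Toy endpoint that participates in Handshaking: it answers a
    health check before the client commits to sending real work."""

    def __init__(self, max_in_flight=2):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def healthy(self):
        # The health check is the application-level handshake: "can
        # you take more work right now?"
        return self.in_flight < self.max_in_flight

    def handle(self, request):
        self.in_flight += 1
        try:
            return f"done:{request}"
        finally:
            self.in_flight -= 1

def call_with_handshake(server, request):
    # Only worth doing when this extra round-trip is much cheaper than
    # sending the real request and having it fail.
    if not server.healthy():
        raise RuntimeError("server refused work; try another instance")
    return server.handle(request)
```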
Emulate out-of-spec failures Calling real applications lets you test only those errors that the real application can deliberately produce. A good Test Harness lets you simulate all sorts of messy, real-world failure modes.
Stress the caller The Test Harness can produce slow responses, no responses, or garbage responses. Then you can see how your application reacts.
Leverage shared harnesses for common failures You don’t necessarily need a separate Test Harness for each integration point. A “killer” server can listen to several ports, creating different failur...
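A tiny version of such a "killer" listener, sketched with raw sockets (the name and failure modes shown are illustrative assumptions):

```python
import socket
import threading

def killer_server(port, mode):
    """Tiny test-harness listener. `mode` picks the failure style:
    'garbage' answers with bytes no real protocol would produce;
    'hang' accepts the connection and then says nothing forever."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))  # port 0 picks a free ephemeral port
    srv.listen(5)

    def serve():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:  # listener closed; stop serving
                return
            if mode == "garbage":
                conn.sendall(b"\x00\xffnot-a-valid-response\xff\x00")
                conn.close()
            # 'hang': keep the connection open and never reply

    threading.Thread(target=serve, daemon=True).start()
    return srv
```

Binding several such listeners with different modes on different ports gives one process that exercises many out-of-spec failures against the caller.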
Supplement, don’t replace, other testing methods The Test Harness pattern augments other testing methods. It does not replace unit tests, acceptance tests, FIT tests, and so on. Each of those techniques helps verify functional behavior. Test Harness helps verify “nonfu...
Decide at the last responsible moment Other stability patterns can be implemented without large-scale changes to the design or architecture. Decoupling Middleware is an architecture decision. It ripples into every part of the system. This is one of those nearly irreversible decisions that should be made early rather than late.
Avoid many failure modes through total decoupling The more fully you decouple individual servers, layers, and applications, the fewer problems you will observe with Integration Points, Cascading Failures, Slow Responses, and Blocked Threads. You’ll find that decoupled applications are also more adaptable, since you can change any of the participants independently of the others.
Learn many architectures, and choose among them Not every system needs to look like a three-tier application with an Oracle database. Learn many architectural styles, and selec...
Astronomically unlikely coincidences happen daily.
Sadly, the absence of a problem is not usually noted. You might be salvaging a badly botched implementation, in which case you now have an opportunity to look like a hero.
Unless you are building a pure two-tier client/server system where users connect directly to the database, the “concurrent user” is a fiction.
Always look for the multiplier effects. These will dominate your costs.
Understand the effects that one layer has on another.
Improving nonconstraint metrics will not im...
Try to do the most work when nobody is w...
Place safety limits on everything: timeouts, maximum memory consumption, maximum number ...
Protect request-handlin...
Monitor capacity continuously. Each application release can affect scalability and performance. Changes in user demand or traffic p...
Eliminate contention under normal loads During “regular peak” operation, there should be no contention for resources. Regular peak load would occur on a typical day outside your company’s peak season.
If possible, size resource pools to the request thread pool If there’s always a resource ready when a request-handling thread needs it, then you have no efficiency loss to overhead. For database connections, the added connections mainly consume RAM on the database server, which, while expensive, is less costly than lost revenue. Be careful, however, that a single database server can handle the maximum number of connections. During a failover situation, one node of a database cluster must serve all the queries—and all the connections.
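A queue-backed pool sized to match the request thread pool can be sketched as follows (class and parameter names are hypothetical):

```python
import queue

class ConnectionPool:
    """Fixed-size resource pool. Sizing it equal to the request-handling
    thread pool means a handler thread never waits for a connection:
    there is always one ready when a thread needs it."""

    def __init__(self, factory, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Match the pool to the number of request-handling threads.
REQUEST_THREADS = 8
pool = ConnectionPool(factory=lambda: object(), size=REQUEST_THREADS)
```

The caveat from the text still applies: the database server must be able to handle `size` connections from every application node at once, including after a cluster failover concentrates them on one node.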
Prevent vicious cycles Resource contention causes transactions to take longer. Slower transactions cause more resource contention. If you have more than one resource pool, this cycle can cause thro...
Watch for the Blocked Threads pattern The capacity problem of resource pool contention can quickly turn into a stability problem if threads block ...
Don’t use code for content Loading JSP classes into memory is a kind of caching. If you have enough JSP files, you will fill the permanent generation. Even if you don’t, it’s a waste of otherwise useful memory to keep a class in memory when it might not be accessed again before you restart the application server.
Avoid needless requests Don’t use polling requests for fizzy features such as autocompletion. If you must have autocompletion—for an address book, part number, department name, or whatever—send a request only when the input field actually changes. (Some of the online tutorials send the request every quarter second!)
Respect your session architecture Make sure your AJAX requests include a session ID cookie or query parameter. (This is much easier with session cookies than query parameters!) If you don’t, your application server will create a new, wasted session for every AJAX request.
Minimize the size of replies Return the least amount of data necessary. Reply wit...

