Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error). See “Reliability”.
It can tolerate the user making mistakes or using the software in unexpected ways.
The system prevents any unauthorized access and abuse.
The things that can go wrong are called faults, and systems that anticipate faults and can cope with them are called fault-tolerant or resilient.
So it only makes sense to talk about tolerating certain types of faults.
failure is when the system as a whole stops providing the required service to the user.
it is usually best to design fault-tolerance mechanisms that prevent faults from causing failures.
it can make sense to increase the rate of faults by triggering them deliberately — for example, by randomly killing individual processes without warning.
Netflix Chaos Monkey
Our first response is usually to add redundancy to the individual hardware components in order to reduce the failure rate of the system.
RAID
the platforms are designed to prioritize flexibility and elasticity over single-machine reliability.
Another class of fault is a systematic error within the system [8]. Such faults are harder to anticipate, and because they are correlated across nodes, they tend to cause many more system failures than uncorrelated hardware faults [5]. Examples include:
dormant
carefully thinking about assumptions and interactions in the system; thorough testing; process isolation; allowing processes to crash and restart; measuring, monitoring, and analyzing system behavior in production.
Design systems in a way that minimizes opportunities for error. For example, well-designed abstractions, APIs, and admin interfaces make it easy to do “the right thing” and discourage “the wrong thing.” However, if the interfaces are too restrictive people will work around them, negating their benefit, so this is a tricky balance to get right.
succinctly
besides the actual time to process the request (the service time), it includes network delays and queueing delays. Latency is the duration that a request is waiting to be handled — during which it is latent, awaiting service [17].
However, the mean is not a very good metric if you want to know your “typical” response time, because it doesn’t tell you how many users actually experienced that delay.
Usually it is better to use percentiles.
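As a rough illustration of why percentiles describe "typical" experience better than the mean, the sketch below compares the two on a small set of made-up response times. The sample values, the `percentile` helper, and the nearest-rank method are assumptions chosen for illustration, not something the book prescribes.

```python
import math
import statistics

def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]

# Hypothetical response times in milliseconds; one request is an outlier.
response_times_ms = sorted([12, 14, 15, 15, 16, 18, 20, 22, 25, 1500])

mean_ms = statistics.mean(response_times_ms)
p50_ms = percentile(response_times_ms, 50)  # median
p90_ms = percentile(response_times_ms, 90)
p99_ms = percentile(response_times_ms, 99)

print(f"mean={mean_ms} p50={p50_ms} p90={p90_ms} p99={p99_ms}")
```

In this made-up sample a single slow request pulls the mean above what nine out of ten requests experienced, while the median stays close to what most requests saw.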
This is because the customers with the slowest requests are often those who have the most data on their accounts because they have made many purchases — that is, they’re the most valuable customers
happy by ensuring the website is fast for them: Amazon has also observed that a 100 ms increase in response time reduces sales by 1%
High percentiles become especially important in backend services that are called multiple times as part of serving a single end-user request.
tail latency amplification
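A back-of-the-envelope sketch of the effect: if an end-user request fans out to several backend calls and has to wait for all of them, the chance that at least one call is slow grows quickly with the number of calls. The function name and the assumption that backend calls are slow independently of each other are simplifications made here for illustration.

```python
def prob_request_is_slow(p_slow_call: float, n_calls: int) -> float:
    """Probability that at least one of n_calls backend calls is slow,
    assuming each call is slow independently with probability p_slow_call."""
    return 1 - (1 - p_slow_call) ** n_calls

# Hypothetical numbers: 1% of backend calls are slow.
for n in (1, 10, 100):
    print(n, round(prob_request_is_slow(0.01, n), 3))
```

Under these assumptions, a backend where only 1 call in 100 is slow still makes well over half of the end-user requests slow once each request needs 100 calls.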
forward decay [25], t-digest [26], or HdrHistogram [27].
dichotomy
While distributing stateless services across multiple machines is fairly straightforward, taking stateful data systems from a single node to a distributed setup can introduce a lot of additional complexity.
An architecture that scales well for a particular application is built around assumptions of which operations will be common and which will be rare — the load parameters.
Operability: Make it easy for operations teams to keep the system running smoothly.
Simplicity: Make it easy for new engineers to understand the system, by removing as much complexity as possible from the system. (Note this is not the same as simplicity of the user interface.)
Evolvability: Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change. Also known as extensibility, modifiability, or plasticity.
complexity as accidental if it is not inherent in the problem that the software solves (as seen by the users) but arises only from the implementation.
impedance mismatch.
Object-relational mapping (ORM) frameworks like ActiveRecord and Hibernate
In the traditional SQL model (prior to SQL:1999), the most common normalized representation is to put positions, education, and contact information in separate tables, with a foreign key reference to the users table, as in Figure 2-1. Later versions of the SQL standard added support for structured datatypes and XML data; this allowed multi-valued data to be stored within a single row, with support for querying and indexing inside those documents. These features are supported to varying degrees by Oracle, IBM DB2, MS SQL Server, and PostgreSQL [6, 7]. A JSON datatype is also supported by ...more
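The normalized layout described above can be sketched with Python's built-in `sqlite3` module. The table and column names, and the sample rows, are hypothetical and only loosely follow the description (a users table plus a separate table that points back to it with a foreign key); only positions are shown, and education and contact information would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    -- One row per position; user_id is the foreign key back to users.
    CREATE TABLE positions (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES users(user_id),
        job_title    TEXT,
        organization TEXT
    );
""")

# Hypothetical sample data.
conn.execute("INSERT INTO users VALUES (251, 'Ada', 'Example')")
conn.executemany(
    "INSERT INTO positions (id, user_id, job_title, organization) VALUES (?, ?, ?, ?)",
    [(1, 251, "Engineer", "Example Corp"), (2, 251, "Manager", "Sample Ltd")],
)

# Reassembling the profile requires a join across the tables.
rows = conn.execute("""
    SELECT u.first_name, p.job_title
    FROM users u JOIN positions p ON p.user_id = u.user_id
    WHERE u.user_id = 251
    ORDER BY p.id
""").fetchall()
print(rows)
```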
For a data structure like a résumé, which is mostly a self-contained document, a JSON representation can be quite appropriate: see Example 2-1. JSON has the appeal of being much simpler than XML. Document-oriented databases like MongoDB [9], RethinkDB [10], CouchDB [11], and Espresso [12] support this data model.
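For comparison, here is a minimal sketch of what such a self-contained document might look like, written as a Python dictionary and round-tripped through JSON. All field names and values are made up for illustration and are not copied from Example 2-1.

```python
import json

# Hypothetical résumé document: everything about the user lives in one place.
resume = {
    "user_id": 251,
    "first_name": "Ada",
    "last_name": "Example",
    "positions": [
        {"job_title": "Engineer", "organization": "Example Corp"},
        {"job_title": "Manager", "organization": "Sample Ltd"},
    ],
    "education": [
        {"school_name": "Example University", "start": 2001, "end": 2005},
    ],
    "contact_info": {"blog": "https://example.com/blog"},
}

serialized = json.dumps(resume)   # what a document store would persist
loaded = json.loads(serialized)   # one read returns the whole profile

job_titles = [p["job_title"] for p in loaded["positions"]]
print(job_titles)
```

The one-to-many parts of the profile (positions, education) are nested lists inside the document, so no join is needed to rebuild it.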
Localization support — when the site is translated into other languages, the standardized lists can be localized, so the region and industry can be displayed in the viewer’s language
Removing such duplication is the key idea behind normalization in databases.
obscurity).
CODASYL
The links between records in the network model were not foreign keys, but more like pointers in a programming language (while still being stored on disk). The only way of accessing a record was to follow a path from a root record along these chains of links. This was called an access path.
a relation (table) is simply a collection of tuples (rows), and that’s it.
the related item is referenced by a unique identifier, which is called a foreign key in the relational model and a document reference in the document model
schema flexibility, better performance due to locality, and that for some applications it is closer to the data structures used by the application. The relational model counters by providing better support for joins, and many-to-one and many-to-many relationships.
The document model has limitations: for example, you cannot refer directly to a nested item within a document, but instead you need to say something like “the second item in the list of positions for user 251”
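One way to picture this limitation: a nested item has no identifier of its own, so a reference to it is really a path through the enclosing document. The `resolve` helper and the in-memory `documents` store below are hypothetical, used only to make the idea concrete.

```python
def resolve(documents, user_id, field, index):
    """Follow a path (document id, list field, position) to a nested item."""
    return documents[user_id][field][index]

# Hypothetical store holding one document, keyed by user id.
documents = {
    251: {"positions": [{"job_title": "Engineer"}, {"job_title": "Manager"}]},
}

# "The second item in the list of positions for user 251"
before = resolve(documents, 251, "positions", 1)

# The same path points at a different item once the list changes.
documents[251]["positions"].insert(0, {"job_title": "Intern"})
after = resolve(documents, 251, "positions", 1)

print(before, after)
```

Because the reference depends on position, the same path returns a different item after the list is modified, which is part of why such references are awkward to rely on.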
Schema changes have a bad reputation of being slow and requiring downtime. This reputation is not entirely deserved: most relational database systems execute the ALTER TABLE statement in a few milliseconds.
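A small sketch of such a schema change, using SQLite through Python's `sqlite3` module as a stand-in for a relational database. The `users` table, the `first_name` column, and the sample names are assumptions for illustration; timing and locking behavior differ between database systems and are not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (user_id, name) VALUES (?, ?)",
    [(1, "Ada Example"), (2, "Grace Sample")],
)

# The schema change itself: existing rows simply read NULL for the new column.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
before = conn.execute(
    "SELECT first_name FROM users ORDER BY user_id"
).fetchall()

# Backfill the new column from the old one (everything before the first space).
conn.execute("""
    UPDATE users
    SET first_name = substr(name, 1, instr(name, ' ') - 1)
    WHERE instr(name, ' ') > 0
""")
after = conn.execute(
    "SELECT first_name FROM users ORDER BY user_id"
).fetchall()

print(before, after)
```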
The schema-on-read approach is advantageous if the items in the collection don’t all have the same structure for some reason (i.e., the data is heterogeneous), for example because there are many different types of objects and it is not practicable to put each type of object in its own table, or because the structure of the data is determined by external systems over which you have no control and which may change at any time.
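With schema-on-read, the code that reads the data is what copes with documents of different shapes. The sketch below assumes a hypothetical collection in which older documents carry a single `name` field and newer ones carry `first_name`; both field names and the helper are illustrative.

```python
from typing import Optional

def read_first_name(user: dict) -> Optional[str]:
    """Interpret the document's structure at read time."""
    if user.get("first_name"):      # newer documents
        return user["first_name"]
    if user.get("name"):            # older documents: derive it on the fly
        return user["name"].split(" ")[0]
    return None                     # structure not recognized

# Hypothetical heterogeneous collection.
users = [
    {"name": "Ada Example"},
    {"first_name": "Grace", "last_name": "Sample"},
    {"handle": "anon42"},
]
first_names = [read_first_name(u) for u in users]
print(first_names)
```

No stored document is rewritten here; the old and new shapes coexist and the reader handles both.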
The locality advantage only applies if you need large parts of the document at the same time.
On updates to a document, the entire document usually needs to be rewritten
Google’s Spanner database offers the same locality properties in a relational data model, by allowing the schema to declare that a table’s rows should be interleaved (nested) within a parent table [27]. Oracle allows the same,
This includes functions to make local modifications to XML documents and the ability to index and query inside XML documents, which allows applications to use data models very similar to what they would do when using a document database.