Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
0%
Technology is a powerful force in our society.
0%
Fortunately, behind the rapid changes in technology, there are enduring principles that remain true, no matter which version of a particular tool you are using.
1%
In computing we tend to be attracted to things that are new and shiny, but I think we have a huge amount to learn from things that have been done before.
1%
A data-intensive application is typically built from standard building blocks that provide commonly needed functionality.
1%
The things that can go wrong are called faults, and systems that anticipate faults and can cope with them are called fault-tolerant or resilient.
1%
A fault is usually defined as one component of the system deviating from its spec, whereas a failure is when the system as a whole stops providing the required service to the user.
1%
When one component dies, the redundant component can take its place while the broken component is replaced.
2%
There is no quick solution to the problem of systematic faults in software.
2%
Scalability is the term we use to describe a system’s ability to cope with increased load.
2%
The median is also known as the 50th percentile, and sometimes abbreviated as p50.
2%
For example, if the 95th percentile response time is 1.5 seconds, that means 95 out of 100 requests take less than 1.5 seconds, and 5 out of 100 requests take 1.5 seconds or more.
2%
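The percentile definitions quoted above can be tried out on a small sample. This is only a sketch: it assumes the nearest-rank convention (one of several in common use), and the sample response times are invented for illustration.

```python
import math


def percentile(sorted_times, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_times:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_times) / 100))
    return sorted_times[rank - 1]


# Hypothetical response times in milliseconds; one slow outlier.
response_times_ms = sorted([30, 32, 35, 40, 41, 45, 50, 60, 120, 1500])

p50 = percentile(response_times_ms, 50)  # the median
p95 = percentile(response_times_ms, 95)
```

With this sample the median stays close to the typical request, while p95 is dominated by the single outlier, which is why high percentiles are used to describe the slowest requests users actually see.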
Reducing response times at very high percentiles is difficult because they are easily affected by random events outside of your control, and the benefits are diminishing.
3%
Queueing delays often account for a large part of the response time at high percentiles.
3%
Even if you make the calls in parallel, the end-user request still needs to wait for the slowest of the parallel calls to complete.
3%
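A rough sketch of why waiting for the slowest parallel call hurts. It assumes backend calls are slow independently of one another with a fixed probability, which real systems only approximate; the function names and numbers are illustrative.

```python
def request_latency(call_latencies_ms):
    """An end-user request that fans out in parallel finishes only when
    its slowest backend call finishes."""
    return max(call_latencies_ms)


def prob_slow_request(p_slow_call, n_calls):
    """Chance that at least one of n independent calls is slow
    (independence is an assumption of this sketch)."""
    return 1 - (1 - p_slow_call) ** n_calls


latency = request_latency([20, 25, 900])    # two fast calls, one slow call
chance = prob_slow_request(0.01, 100)       # 1% slow calls, fan-out of 100
```

Under these assumptions, a backend that is slow only 1% of the time makes roughly 63% of end-user requests slow once each request touches 100 backends.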
An architecture that is appropriate for one level of load is unlikely to cope with 10 times that load.
3%
While distributing stateless services across multiple machines is fairly straightforward, taking stateful data systems from a single node to a distributed setup can introduce a lot of additional complexity.
3%
An architecture that scales well for a particular application is built around assumptions of which operations will be common and which will be rare—the load parameters.
3%
Even though they are specific to a particular application, scalable architectures are nevertheless usually built from general-purpose building blocks, arranged in familiar patterns.
3%
When complexity makes maintenance hard, budgets and schedules are often overrun.
3%
One of the best tools we have for removing accidental complexity is abstraction.
3%
Reliability means making systems work correctly, even when faults occur.
3%
Scalability means having strategies for keeping performance good, even when load increases.
4%
The limits of my language mean the limits of my world.
4%
Most applications are built by layering one data model on top of another.
4%
There are many different kinds of data models, and every data model embodies assumptions about how it is going to be used.
4%
Specialized query operations that are not well supported by the relational model
4%
Different applications have different requirements, and the best choice of technology for one use case may well be different from the best choice for another use case.
5%
In the JSON representation, all the relevant information is in one place, and one query is sufficient.
5%
If the user interface has free-text fields for entering the region and the industry, it makes sense to store them as plain-text strings.
5%
Conference on Data Systems Languages (CODASYL)
5%
In a relational database, the query optimizer automatically decides which parts of the query to execute in which order, and which indexes to use.
5%
The main arguments in favor of the document data model are schema flexibility, better performance due to locality, and that for some applications it is closer to the data structures used by the application.
5%
The relational model counters by providing better support for joins, and many-to-one and many-to-many relationships.
5%
However, if your application does use many-to-many relationships, the document model becomes less appealing.
5%
No schema means that arbitrary keys and values can be added to a document, and when reading, clients have no guarantees as to what fields the documents may contain.
5%
A more accurate term is schema-on-read (the structure of the data is implicit, and only interpreted when the data is read), in contrast with schema-on-write (the traditional approach of relational databases, where the schema is explicit and the database ensures all written data conforms to it)
6%
Schema-on-read is similar to dynamic (runtime) type checking in programming languages, whereas schema-on-write is similar to static (compile-time) type checking.
6%
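The contrast between schema-on-write and schema-on-read can be sketched in a few lines. Everything here is hypothetical: the field names, the in-memory list standing in for a table, and the rule for deriving a first name are assumptions chosen for illustration, not any particular database's behavior.

```python
import json

# Hypothetical explicit schema, enforced at write time.
REQUIRED_FIELDS = {"first_name": str, "last_name": str}


def write_with_schema(table, record):
    """Schema-on-write: reject any record that does not conform."""
    for field, field_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), field_type):
            raise ValueError(f"missing or invalid field: {field}")
    table.append(dict(record))


def read_user(raw_json):
    """Schema-on-read: accept whatever was stored and interpret its
    structure only when the document is read."""
    doc = json.loads(raw_json)
    if "first_name" not in doc and "name" in doc:
        # Older documents only have a full name; derive the field on read.
        doc["first_name"] = doc["name"].split(" ")[0]
    return doc


old_doc = read_user('{"name": "Ada Lovelace"}')
new_doc = read_user('{"first_name": "Grace", "last_name": "Hopper"}')
```

The reading code carries the burden of handling both old and new document shapes, much as dynamically typed code checks structure at runtime, whereas the writing code in the schema-on-write variant refuses nonconforming data up front.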
The schema-on-read approach is advantageous if the items in the collection don’t all have the same structure for some reason
6%
A document is usually stored as a single continuous string, encoded as JSON, XML, or a binary variant thereof (such as MongoDB’s BSON).
6%
If your application often needs to access the entire document (for example, to render it on a web page), there is a performance advantage to this storage locality.
6%
The locality advantage only applies if you need large parts of the document at the same time.
6%
A hybrid of the relational and document models is a good route for databases to take in the future.
6%
Many commonly used programming languages are imperative.
6%
An imperative language tells the computer to perform certain operations in a certain order.
6%
A declarative query language is attractive because it is typically more concise and easier to work with than an imperative API.
6%
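To make the imperative and declarative styles concrete, here is the same filter written both ways. The table, column names, and sample rows are made up for this sketch; the declarative version uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3

# Hypothetical sample data: (name, family) pairs.
animals = [("Great white", "Sharks"), ("Lion", "Felidae"), ("Hammerhead", "Sharks")]


def get_sharks_imperative(rows):
    """Imperative: spell out each step and the order in which it runs."""
    sharks = []
    for name, family in rows:
        if family == "Sharks":
            sharks.append(name)
    return sharks


def get_sharks_declarative(rows):
    """Declarative: describe the wanted result and let the database
    decide how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
    conn.executemany("INSERT INTO animals VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM animals WHERE family = 'Sharks' ORDER BY name"
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result
```

The SQL query says nothing about loops, scan order, or indexes, so the database remains free to change how it executes the query without the query itself changing.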
The fact that SQL is more limited in functionality gives the database much more room for automatic optimizations.
6%
Declarative languages have a better chance of getting faster in parallel execution because they specify only the pattern of the results, not the algorithm that is used to determine the results.
7%
In a web browser, using declarative CSS styling is much better than manipulating styles imperatively in JavaScript.
7%
MapReduce is a programming model for processing large amounts of data in bulk across many machines, popularized by Google
7%
MapReduce is a fairly low-level programming model for distributed execution on a cluster of machines.
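The shape of the MapReduce programming model can be shown in a single process. This is a toy sketch, not a distributed implementation: the word-count job and all function names are assumptions for illustration, and a real framework would run the map and reduce steps on many machines and shuffle data between them over the network.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (key, value) pair for every word in a record."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group the emitted values by key
    (the shuffle), then run reducer once per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


counts = map_reduce(["a b a", "b c"], map_fn, reduce_fn)
```

The mapper and reducer are the only job-specific code; because they operate on independent records and independent keys, the framework is free to spread them across machines.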