Status Updates From Designing Data-Intensive Ap...

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Status Updates

Alisen
Alisen is on page 469 of 562
5 hours, 29 min ago
Designing Data-Intensive Applications

Catalin
Catalin is on page 600 of 1054
22 hours, 8 min ago
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Matteo
Matteo is on page 338 of 562
Jun 15, 2026 03:13AM
Designing Data-Intensive Applications

Catalin
Catalin is on page 582 of 1054
Jun 15, 2026 02:14AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 568 of 1054
The defining feature of a batch processing job is that it reads some input data and produces some output data without modifying the input; in other words, the output is derived from the input. The input data is bounded: it has a fixed size. Because the input is bounded, a job knows when it has finished reading all of it, so the job eventually completes.
Jun 15, 2026 01:19AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 565 of 1054
Jun 14, 2026 12:55PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 560 of 1054
Jun 14, 2026 08:57AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Kareem Hagag
Kareem Hagag is on page 180 of 614
Jun 13, 2026 05:08AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Alisen
Alisen is on page 439 of 562
Jun 13, 2026 01:24AM
Designing Data-Intensive Applications

Jussi
Jussi is on page 273 of 562
Jun 12, 2026 02:30PM
Designing Data-Intensive Applications

Serg M
Serg M is on page 291 of 640
Jun 12, 2026 11:31AM
Высоконагруженные приложения. Программирование, масштабирование, поддержка (Russian edition; in English: "High-Load Applications. Programming, Scaling, Support")

Catalin
Catalin is on page 536 of 1054
The CAP theorem states that a distributed data store can simultaneously provide at most two of three guarantees: Consistency (all nodes see the same data), Availability (every request receives a response), and Partition Tolerance (the system continues to operate despite network partitions).
Jun 12, 2026 03:54AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 536 of 1054
Jun 11, 2026 11:31PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 532 of 1054
Writing each job's output to a temp file is a disadvantage because a job's output is considered valid only once the job has completed successfully. Otherwise the temp file contents are discarded, and the next job cannot start. To manage these dependencies between jobs, workflow schedulers have been developed for Hadoop.
Jun 11, 2026 12:41PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 532 of 1054
MapReduce jobs are usually chained together into a workflow, such that the output of one job becomes the input to the next job, but from the framework's point of view they are two independent jobs.

Therefore, chained MapReduce jobs are less like a pipeline that streams data directly from one stage to the next and more like a sequence of commands where each command's output is written to a temp file and the next command reads from that temp file.
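
To make the temp-file picture concrete, here is a rough Python sketch (not Hadoop code). The names `run_job` and `run_workflow`, the `.partial` suffix, and the `jobN.out` file names are made up for illustration. Each job reads its whole input file, writes its whole output file, and only then may the next job start.

```python
import os
import tempfile

def run_job(transform, input_path, output_path):
    """Hypothetical 'job': read the whole input, write the whole output.

    The output goes to a scratch file first and is renamed only on success,
    so a failed job leaves no partial output for the next job to read.
    """
    scratch = output_path + ".partial"
    try:
        with open(input_path) as src, open(scratch, "w") as dst:
            for line in src:
                dst.write(transform(line))
        os.replace(scratch, output_path)  # publish the finished output
    except Exception:
        if os.path.exists(scratch):
            os.remove(scratch)  # discard partial output
        raise

def run_workflow(transforms, input_path, workdir):
    """Run jobs one after another; each job's output file feeds the next."""
    current = input_path
    for i, transform in enumerate(transforms):
        out = os.path.join(workdir, f"job{i}.out")
        run_job(transform, current, out)  # next job waits for this to finish
        current = out
    return current

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "input.txt")
    with open(source, "w") as f:
        f.write("alpha\nbeta\n")
    final = run_workflow(
        [str.upper, lambda line: line.strip() + "!\n"], source, workdir
    )
    with open(final) as f:
        workflow_output = f.read()

print(workflow_output)
```

The point of the sketch is that nothing flows between jobs until the first one is completely done, which is the difference from a streaming pipeline.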
Jun 11, 2026 12:40PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 528 of 1054
The MapReduce scheduler tries to run each mapper (mappers can run in parallel) on one of the machines that stores a replica of the input file, provided that machine has enough spare RAM and CPU resources.

This is called "putting the computation near the data" and it saves copying the input file over the network, reducing network load and increasing locality.
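
A toy Python sketch of that placement idea, under heavy assumptions: `pick_machine`, `schedule`, the block names, and the single "free slots" number per machine (standing in for spare RAM and CPU) are all invented for illustration, and a real scheduler is far more involved.

```python
def pick_machine(replica_hosts, free_slots):
    """Prefer a machine that holds a replica and has spare capacity.

    Returns (host, is_local). Falls back to any machine with capacity,
    in which case the input would have to be copied over the network.
    """
    for host in replica_hosts:
        if free_slots.get(host, 0) > 0:
            return host, True
    for host, slots in free_slots.items():
        if slots > 0:
            return host, False
    return None, False

def schedule(blocks, free_slots):
    """Assign one map task per input block, consuming one slot each."""
    free = dict(free_slots)  # copy so the caller's dict is not modified
    plan = {}
    for block, replica_hosts in blocks.items():
        host, is_local = pick_machine(replica_hosts, free)
        if host is None:
            raise RuntimeError(f"no capacity left for {block}")
        free[host] -= 1
        plan[block] = (host, is_local)
    return plan

# Hypothetical cluster: which machines store a replica of each block,
# and how many tasks each machine can still accept.
blocks = {"block-0": ["m1", "m2"], "block-1": ["m1", "m3"], "block-2": ["m1"]}
free_slots = {"m1": 1, "m2": 1, "m3": 1, "m4": 1}

plan = schedule(blocks, free_slots)
for block, (host, is_local) in plan.items():
    print(block, host, "local" if is_local else "remote read")
```

Here block-2 ends up on a machine without a replica because m1 is already busy, which is the case where the input has to travel over the network.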
Jun 11, 2026 12:28PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 527 of 1054
MapReduce has 4 steps.

Step 1 is to READ the set of input files and break them up into records.
Step 2 is MAP, where the mapper function is called to extract key-value pairs from each record.
Step 3 is SORT, where all of the key-value pairs are sorted by key so that pairs with the same key end up adjacent.
Step 4 is REDUCE, where the reducer function iterates over the sorted key-value pairs and produces output records, such as a count per key.
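
The four steps can be mimicked in a few lines of single-machine Python. This is only a sketch of the data flow, not how Hadoop implements it; the word-count task, the function names, and the in-memory "files" are assumptions chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter

def read_records(files):
    """Step 1 (READ): break each input 'file' into records (here, lines)."""
    for text in files:
        yield from text.splitlines()

def map_fn(record):
    """Step 2 (MAP): extract key-value pairs, here (word, 1) for each word."""
    for word in record.split():
        yield (word.lower(), 1)

def reduce_fn(key, values):
    """Step 4 (REDUCE): combine all values for one key, here by summing."""
    return (key, sum(values))

def run_mapreduce(files):
    pairs = [kv for record in read_records(files) for kv in map_fn(record)]
    # Step 3 (SORT): sorting by key makes equal keys adjacent for groupby.
    pairs.sort(key=itemgetter(0))
    return [
        reduce_fn(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]

# In-memory strings stand in for input files.
files = ["the quick fox\nthe lazy dog", "The end"]
result = run_mapreduce(files)
print(result)
# [('dog', 1), ('end', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Only `map_fn` and `reduce_fn` are specific to the word-count task; the read and sort steps stay the same for other jobs.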
Jun 11, 2026 12:22PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 526 of 1054
HDFS has a daemon process running on each machine that exposes a network service, allowing other nodes in the cluster to access files stored on that machine. Conceptually, HDFS creates one big file system that can use the space on the disks of all machines running the daemon.

Azure Blob Storage and AWS S3 are similar to HDFS.
Jun 11, 2026 12:15PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
