Status Updates From Designing Data-Intensive Ap...

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems


Status Updates Showing 1-30 of 22,185


Serg M
Serg M is on page 291 of 640
50 minutes ago
High-Load Applications: Programming, Scaling, Maintenance (Russian edition)

Catalin
Catalin is on page 536 of 1054
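The CAP Theorem states that a distributed data store can simultaneously provide at most two of three guarantees: Consistency (every read sees the most recent write, i.e., linearizability), Availability (every request receives a non-error response), and Partition Tolerance (the system continues to operate despite network failures). In practice, network partitions are not something you can opt out of, so the real choice is between consistency and availability while a partition is happening.
8 hours, 26 min ago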
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
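A toy sketch of the consistency-vs-availability choice during a partition, assuming a single leader that copies writes to two followers. The names Replica, write and Unavailable, and the "CP"/"AP" modes, are hypothetical illustrations, not any real database's API.

```python
class Unavailable(Exception):
    """Raised by a CP-style replica that refuses to answer during a partition."""


class Replica:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {}
        self.partitioned = False  # True while cut off from the leader

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: return an error rather than
            # risk serving a value that may be out of date.
            raise Unavailable(f"cannot confirm latest value of {key!r}")
        # Availability over consistency (or no partition at all): answer from
        # local state, which may be stale while partitioned.
        return self.data.get(key)


def write(leader: dict, followers: list, key, value) -> None:
    """Apply a write on the leader and copy it to every reachable follower."""
    leader[key] = value
    for follower in followers:
        if not follower.partitioned:
            follower.data[key] = value


leader = {}
ap, cp = Replica("AP"), Replica("CP")
write(leader, [ap, cp], "x", 1)
ap.partitioned = cp.partitioned = True  # a network partition begins
write(leader, [ap, cp], "x", 2)         # the new value never reaches the followers

print(ap.read("x"))  # 1 (available, but stale)
try:
    cp.read("x")
except Unavailable as err:
    print("error:", err)  # consistent, but unavailable
```

Both followers are cut off in the same way; the only difference is what they do about it: the AP one answers with old data, the CP one refuses.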

Catalin
Catalin is on page 536 of 1054
12 hours, 50 min ago
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 532 of 1054
This is a disadvantage because a job's output is only considered valid once the job has completed successfully (MapReduce discards the partial output of a failed job), so a job in the workflow can only start once all the jobs that produce its input have completed successfully. To manage these dependencies between jobs, several workflow schedulers have been developed for Hadoop, such as Oozie, Azkaban, Luigi, Airflow, and Pinball.
23 hours, 40 min ago
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
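A minimal sketch of what such a scheduler does at its core, assuming jobs are plain Python callables that return True on success: run them in dependency order and only start a job when every job it reads from succeeded. run_workflow and the job names are hypothetical; real schedulers like Oozie or Airflow have their own, much richer APIs.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(jobs, deps):
    """Run jobs in dependency order; a job runs only if all its inputs are valid.

    jobs: job name -> callable that returns True on success (hypothetical)
    deps: job name -> set of job names whose output it reads
    """
    status = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = deps.get(name, set())
        if all(status.get(upstream) == "ok" for upstream in inputs):
            status[name] = "ok" if jobs[name]() else "failed"
        else:
            # An upstream job failed, so its output was discarded and this
            # job has no valid input to read.
            status[name] = "skipped"
    return status


jobs = {
    "extract": lambda: True,
    "clean": lambda: False,  # simulate a failing job
    "join": lambda: True,
    "report": lambda: True,
}
deps = {"clean": {"extract"}, "join": {"extract", "clean"}, "report": {"join"}}
result = run_workflow(jobs, deps)
print(result)  # "clean" fails, so "join" and "report" are skipped
```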

Catalin
Catalin is on page 532 of 1054
MapReduce jobs are usually chained together into a workflow, such that the output of one job becomes the input to the next job, but from the framework's point of view they are two separate, independent jobs.

Therefore, chained MapReduce jobs are less like a pipeline of Unix processes (which stream data directly from one process to the next) and more like a sequence of commands where each command's output is written to a temporary file, and the next command reads from that file.
23 hours, 41 min ago
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
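A small local analogy of that "temp file" style of chaining, assuming two hypothetical jobs (a word count and a top-k) that only share data through a file on disk. The second job can't start reading until the first has written its complete output, much like `cmd1 > tmp; cmd2 < tmp` rather than `cmd1 | cmd2`.

```python
import json
import os
import tempfile
from collections import Counter


def job1_count_words(lines, out_path):
    # First "job": count words and write its complete output to a file.
    counts = Counter(word for line in lines for word in line.split())
    with open(out_path, "w") as f:
        json.dump(counts, f)


def job2_top_words(in_path, k):
    # Second, independent "job": it only sees the file the first job left behind.
    with open(in_path) as f:
        counts = json.load(f)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


with tempfile.TemporaryDirectory() as tmp:
    intermediate = os.path.join(tmp, "job1-output.json")
    job1_count_words(["a b a", "c a b"], intermediate)
    top = job2_top_words(intermediate, 2)

print(top)  # [('a', 3), ('b', 2)]
```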

Catalin
Catalin is on page 528 of 1054
The MapReduce scheduler tries to run each map task (the map tasks run in parallel) on one of the machines that stores a replica of its input file, provided that machine has enough spare RAM and CPU resources.

This is called "putting the computation near the data": it saves copying the input file over the network, which reduces network load and increases locality.
23 hours, 52 min ago
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
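A simplified sketch of that placement rule, assuming each node just advertises free CPUs and RAM. Node and assign_map_task are hypothetical; real Hadoop schedulers also consider intermediate levels such as rack locality, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpus: int
    free_ram_gb: float


def assign_map_task(replica_hosts, nodes, cpus=1, ram_gb=1.0):
    """Choose a node for one map task, preferring a node that holds its input."""

    def has_capacity(node):
        return node.free_cpus >= cpus and node.free_ram_gb >= ram_gb

    # First choice: a machine that already stores a replica of the input block.
    for node in nodes:
        if node.name in replica_hosts and has_capacity(node):
            return node, "local"
    # Fallback: any machine with spare resources; the input is read over the network.
    for node in nodes:
        if has_capacity(node):
            return node, "remote"
    # Nothing available right now; the task would have to wait.
    return None, "wait"


nodes = [Node("n1", 0, 8), Node("n2", 4, 16), Node("n3", 2, 4)]
node, kind = assign_map_task({"n1", "n3"}, nodes)
print(node.name, kind)  # n3 local (n1 has a replica but no free CPU)
```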

Catalin
Catalin is on page 527 of 1054
A MapReduce job has 4 steps.

Step 1 is to READ the set of input files and break them up into records.
Step 2 is where the MAP function is called on each record to extract key-value pairs from it.
Step 3 is SORTING all of the key-value pairs by key, so that all pairs with the same key end up next to each other.
Step 4 is REDUCE, where the reducer function iterates over the sorted key-value pairs, processes all the values for a given key together, and produces output records (e.g., a count per key).
23 hours, 59 min ago
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
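The four steps can be sketched as a single-process word count. This is only an illustration: a real framework runs the map and reduce steps on many machines and does the sort as part of the shuffle between them. Lowercasing the words is an arbitrary choice for the example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(record):
    # Step 2: emit a (key, value) pair for every word in the record.
    for word in record.split():
        yield word.lower(), 1


def reducer(key, values):
    # Step 4: all values for one key arrive together; combine them.
    return key, sum(values)


def map_reduce(text):
    records = text.splitlines()                        # Step 1: read, split into records
    pairs = [kv for r in records for kv in mapper(r)]  # Step 2: map
    pairs.sort(key=itemgetter(0))                      # Step 3: sort by key
    return [reducer(k, (v for _, v in group))          # Step 4: reduce each key's group
            for k, group in groupby(pairs, key=itemgetter(0))]


counts = map_reduce("the cat\nthe dog\nThe end")
print(counts)  # [('cat', 1), ('dog', 1), ('end', 1), ('the', 3)]
```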

Catalin
Catalin is on page 526 of 1054
HDFS consists of a daemon process running on each machine that exposes a network service, allowing other nodes in the cluster to access the files stored on that machine. Together, these daemons create one big filesystem that can use the disk space of all the machines running them.

Object storage services such as Amazon S3 and Azure Blob Storage are similar to HDFS in many ways.
Jun 11, 2026 12:15PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
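In HDFS, a central server called the NameNode keeps track of which file blocks are stored on which machine, and each block is replicated to several machines. A heavily simplified sketch of that bookkeeping, assuming round-robin placement and tiny 4-byte blocks for the demo; ToyNameNode is hypothetical, and real HDFS uses much larger blocks (128 MB by default in recent versions) and rack-aware placement.

```python
import itertools


class ToyNameNode:
    """Hypothetical, heavily simplified stand-in for the HDFS NameNode.

    It only records which machines hold each block of each file; the data
    itself would live on the machines (the DataNode daemons)."""

    def __init__(self, machines, replication=3, block_size=4):
        if replication > len(machines):
            raise ValueError("need at least as many machines as replicas")
        self.machines = list(machines)
        self.replication = replication
        self.block_size = block_size  # bytes per block; tiny for the demo
        self.block_map = {}           # (path, block index) -> [machine, ...]
        self._start = itertools.cycle(range(len(self.machines)))

    def add_file(self, path, size_bytes):
        n_blocks = -(-size_bytes // self.block_size)  # ceiling division
        for i in range(n_blocks):
            first = next(self._start)
            # Round-robin placement: each replica goes to a different machine.
            self.block_map[(path, i)] = [
                self.machines[(first + r) % len(self.machines)]
                for r in range(self.replication)
            ]

    def locate(self, path):
        """Per block of the file, the machines a client could read it from."""
        return [hosts for (p, _), hosts in sorted(self.block_map.items()) if p == path]


nn = ToyNameNode(["m1", "m2", "m3", "m4"])
nn.add_file("/logs/day1", 10)  # 10 bytes -> 3 blocks of at most 4 bytes
blocks = nn.locate("/logs/day1")
print(blocks)  # [['m1', 'm2', 'm3'], ['m2', 'm3', 'm4'], ['m3', 'm4', 'm1']]
```

Losing any single machine still leaves two copies of every block, which is the point of replicating across machines.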

Catalin
Catalin is on page 525 of 1054
MapReduce is a programming model for processing massive volumes of data in parallel across a cluster of machines. Hadoop's implementation of MapReduce reads its input from and writes its output to a distributed filesystem, the Hadoop Distributed File System (HDFS).
Jun 11, 2026 12:14PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Kareem Hagag
Kareem Hagag is on page 174 of 614
Jun 11, 2026 10:33AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 510 of 1054
Jun 11, 2026 06:59AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Leo Devick
Leo Devick is on page 73 of 562
Jun 10, 2026 01:40PM
Designing Data-Intensive Applications

Catalin
Catalin is on page 461 of 1054
Jun 10, 2026 12:31PM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Kasia Marysia
Kasia Marysia is on page 23 of 562
Save me, Lord, from bad books. Databases, heal my soul, please. Never again reading stupid shit, I swear, no more self-harm.
Jun 10, 2026 05:53AM
Designing Data-Intensive Applications

Reader
Reader is on page 28 of 562
Jun 09, 2026 03:41PM
Designing Data-Intensive Applications

Catalin
Catalin is on page 446 of 1054
Jun 09, 2026 11:26AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Catalin
Catalin is on page 418 of 1054
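A distributed system is Byzantine fault-tolerant if it continues to operate correctly even when some of its nodes malfunction, disobey the protocol, or act maliciously (e.g., by sending contradictory or forged messages). The honest nodes don't necessarily need to identify or block the faulty ones; they just need to reach correct decisions despite them. Most Byzantine fault-tolerant algorithms require more than two-thirds of the nodes to be functioning correctly.
Jun 09, 2026 05:48AM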
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
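The "more than two-thirds honest" requirement is the bound n ≥ 3f + 1, where f is the number of faulty nodes tolerated. A minimal sketch of that arithmetic plus one common building block, accepting a value only once f + 1 nodes report it, so faulty nodes alone can't make up an answer. Both functions are hypothetical; real BFT protocols (e.g., PBFT) add several message rounds and signatures on top of this.

```python
from collections import Counter


def max_byzantine_faults(n: int) -> int:
    """Largest f with n >= 3f + 1, i.e. more than two-thirds of nodes honest."""
    return (n - 1) // 3


def accept_value(replies, n: int):
    """Accept a value only if at least f + 1 nodes report it.

    With at most f faulty nodes, f + 1 matching replies must include at
    least one honest node, so faulty nodes cannot fabricate a result alone.
    """
    if not replies:
        return None
    f = max_byzantine_faults(n)
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


print(max_byzantine_faults(4))                  # 1
print(accept_value(["x", "x", "y", "x"], n=4))  # x
print(accept_value(["x", "y"], n=4))            # None
```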

Catalin
Catalin is on page 416 of 1054
A clock reading is better thought of as a range of times within a confidence interval than as a single point in time. The uncertainty depends on factors such as the clock's expected drift since it was last synchronized (the quartz drift rate is specified by the manufacturer) and the network round-trip time to the time server. Most systems don't expose this uncertainty, but Google's TrueTime API (used in Spanner) does: when you ask it for the current time, you get back two values, [earliest, latest], and the actual current time is known to lie somewhere within that interval.
Jun 09, 2026 05:46AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
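A rough sketch of the interval idea. tt_now and TTInterval are hypothetical stand-ins, not Google's TrueTime API, and the fixed 7 ms uncertainty is an assumption (a real system derives it from drift since the last sync and the sync round-trip). The useful consequence: you only know that event A happened before event B when their intervals don't overlap.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # seconds since the epoch
    latest: float


def tt_now(uncertainty_s: float = 0.007) -> TTInterval:
    """Hypothetical TrueTime-like call: the true time lies in [earliest, latest]."""
    t = time.time()
    return TTInterval(t - uncertainty_s, t + uncertainty_s)


def definitely_before(a: TTInterval, b: TTInterval) -> bool:
    # Only non-overlapping intervals give a certain ordering.
    return a.latest < b.earliest


a = TTInterval(100.000, 100.007)
b = TTInterval(100.005, 100.012)  # overlaps a: order unknown
c = TTInterval(100.010, 100.017)  # starts after a ends
print(definitely_before(a, b), definitely_before(a, c))  # False True
```

Spanner uses this by deliberately waiting out the length of the interval before committing a transaction, so that intervals of causally dependent transactions don't overlap.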

Catalin
Catalin is on page 416 of 1054
Because the clocks of two different machines can drift out of sync with one another, their timestamps cannot reliably be used to order operations. If different machines perform reads/writes and tag them with their local time, an operation that actually happened later may receive an earlier timestamp, so the operations appear in a different order than the one in which they really occurred.
Jun 09, 2026 05:45AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
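A tiny illustration of why this matters for last-write-wins (LWW) conflict resolution, which keeps whichever write has the greatest timestamp. The 5 ms skew and the Write / last_write_wins names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    timestamp_ms: int  # taken from the writing node's own time-of-day clock


def last_write_wins(writes):
    """Keep the write with the greatest timestamp (LWW conflict resolution)."""
    return max(writes, key=lambda w: w.timestamp_ms)


# Real order: node A writes "x=1", then node B writes "x=2" 3 ms later.
# Assumed skew: node B's clock runs 5 ms behind node A's.
true_time_a, true_time_b = 1_000, 1_003
skew_b = -5
w_a = Write("x=1", true_time_a)
w_b = Write("x=2", true_time_b + skew_b)  # stamped 998, "earlier" than A's 1000

print(last_write_wins([w_a, w_b]).value)  # x=1: the later write is silently lost
```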

Catalin
Catalin is on page 415 of 1054
On the other hand, MONOTONIC CLOCKS just measure elapsed time from some arbitrary starting point. For instance, a monotonic clock might count the time since the machine (or VM) was booted. Its absolute value is therefore meaningless, and comparing the monotonic times of two different machines or VMs does not mean anything; it is only useful for measuring durations, such as timeouts, on a single machine.
Jun 09, 2026 05:45AM
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
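Python's time.monotonic() is a good example: its documentation says the reference point is undefined, so only the difference between two calls is meaningful. A small sketch of the typical use, a timeout; wait_until is a hypothetical helper.

```python
import time


def wait_until(predicate, timeout_s: float, poll_s: float = 0.01) -> bool:
    """Poll predicate until it returns True or timeout_s elapses.

    Uses time.monotonic(): its absolute value is meaningless, but the
    difference between two readings on the same machine is a valid
    duration, and it is not set backwards when the time-of-day clock
    is adjusted.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    return False


start = time.monotonic()
ok = wait_until(lambda: False, timeout_s=0.05)
elapsed = time.monotonic() - start
print(ok)  # False
```

Using time.time() for the deadline instead would work most of the time, but an NTP step of the time-of-day clock could make the timeout fire much too early or much too late.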
