High Performance Spark Quotes
High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
by
Holden Karau
“Co-partitioning is related to but distinct from partition co-location. We say that multiple RDDs are co-partitioned if they are partitioned by the same known partitioner.”
― High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
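To make this definition concrete, here is a toy, pure-Python model (not Spark's API) of what "the same known partitioner" buys you. `HashPartitioner` here only mimics the idea behind Spark's class of that name, whose instances compare equal when they have the same number of partitions. `PartitionedPairs` and `co_partitioned_join` are hypothetical helpers. Because both sides place each key with the same function, matching keys can only sit at the same partition index, so the join can pair partition i with partition i and never needs to shuffle.

```python
from collections import defaultdict

class HashPartitioner:
    """Toy stand-in for a hash partitioner: maps a key to a partition index."""
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def get_partition(self, key):
        # Python's % is non-negative for a positive modulus
        return hash(key) % self.num_partitions

    def __eq__(self, other):
        # Two hash partitioners place keys identically if they use the same count
        return isinstance(other, HashPartitioner) and other.num_partitions == self.num_partitions

class PartitionedPairs:
    """Hypothetical container: (key, value) records split by a known partitioner."""
    def __init__(self, records, partitioner):
        self.partitioner = partitioner
        self.parts = [[] for _ in range(partitioner.num_partitions)]
        for k, v in records:
            self.parts[partitioner.get_partition(k)].append((k, v))

def co_partitioned_join(left, right):
    """Inner join that only pairs partition i with partition i (no shuffle)."""
    if left.partitioner != right.partitioner:
        raise ValueError("not co-partitioned: a shuffle would be needed")
    out = []
    for lp, rp in zip(left.parts, right.parts):
        by_key = defaultdict(list)
        for k, w in rp:
            by_key[k].append(w)
        for k, v in lp:
            for w in by_key.get(k, []):
                out.append((k, (v, w)))
    return out
```

For example, joining `PartitionedPairs([(1, "ann"), (3, "cy")], HashPartitioner(4))` with `PartitionedPairs([(1, "book"), (3, "pen")], HashPartitioner(4))` pairs each key without moving records between partitions, while a right side built with `HashPartitioner(8)` is rejected.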
“Co-located RDDs are RDDs with the same partitioner that reside in the same physical location in memory.”
― High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
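A second toy model, again hypothetical and not Spark code, separates the two properties: co-partitioning compares partitioners only, while co-location additionally requires that each partition index of both datasets be held in memory by the same executor. `CachedRDD`, its `placement` map (partition index to executor id), and the executor names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class CachedRDD:
    partitioner: object   # e.g. ("hash", 4); None means no known partitioner
    placement: dict       # partition index -> executor id holding it in memory

def is_co_partitioned(a, b):
    # Same *known* partitioner; an unknown partitioner never qualifies
    return a.partitioner is not None and a.partitioner == b.partitioner

def is_co_located(a, b):
    # Co-located: same partitioner AND matching partitions on the same executor
    return is_co_partitioned(a, b) and a.placement == b.placement
```

Two cached datasets with `("hash", 2)` whose partitions 0 and 1 sit on the same executors are both co-partitioned and co-located; if the placements are swapped, they remain co-partitioned but are no longer co-located.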
“Beyond being less likely to run out of memory than groupByKey, the following four functions — reduceByKey, treeAggregate, aggregateByKey, and foldByKey — are implemented to use map-side combinations, meaning that records with the same key are combined”
― High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
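The effect of map-side combining can be sketched in plain Python, assuming an input that is already split into partitions. This is a simplified model of what Spark's `reduceByKey` (for example `rdd.reduceByKey(_ + _)` in Scala) does, not its actual implementation: each partition first merges values per key locally, so at most one record per key per partition crosses the shuffle, whereas a `groupByKey`-style shuffle ships every record. The helper names are hypothetical.

```python
from operator import add

def group_by_key_shuffle(partitions):
    """No map-side combine: every (key, value) record crosses the shuffle."""
    return [rec for part in partitions for rec in part]

def map_side_combine(partitions, func):
    """Merge values per key inside each input partition before the shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for k, v in part:
            local[k] = func(local[k], v) if k in local else v
        shuffled.extend(local.items())
    return shuffled

def reduce_by_key(partitions, func):
    """Reduce side: merge the already-combined partial results per key."""
    merged = {}
    for k, v in map_side_combine(partitions, func):
        merged[k] = func(merged[k], v) if k in merged else v
    return merged

parts = [[("a", 1), ("a", 1), ("b", 1)], [("a", 1), ("b", 1), ("b", 1)]]
```

With these two partitions of three records each, the grouping shuffle moves all six records while the combined shuffle moves four, and `reduce_by_key(parts, add)` still yields a sum of 3 for both keys. The saving grows with the number of repeated keys per partition.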
