High Performance Spark Quotes

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark by Holden Karau
128 ratings, 3.98 average rating, 15 reviews
Showing 1-3 of 3
“Co-partitioning is related to but distinct from partition co-location. We say that multiple RDDs are co-partitioned if they are partitioned by the same known partitioner.”
Holden Karau, High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
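To make the idea of co-partitioning concrete, here is a minimal sketch in plain Python rather than Spark. It assumes a simple hash partitioner (`hash(key) % num_partitions`); the function names `hash_partition` and `join_co_partitioned` and the sample data are hypothetical and are not part of any Spark API. When both datasets are bucketed by the same partitioner, matching keys always land in the same bucket index, so a join can proceed bucket by bucket without moving records between buckets.

```python
# Plain-Python simulation of co-partitioning; no Spark involved.
# All function names and sample data are hypothetical illustrations.

def hash_partition(pairs, num_partitions):
    """Place each (key, value) pair into bucket hash(key) % num_partitions."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets

def join_co_partitioned(left, right):
    """Inner-join two datasets that were bucketed by the same partitioner.

    Only bucket i of `left` is compared with bucket i of `right`.
    """
    assert len(left) == len(right), "co-partitioned inputs share a partition count"
    joined = []
    for left_bucket, right_bucket in zip(left, right):
        # Build a per-bucket lookup of the right side's values by key.
        lookup = {}
        for key, value in right_bucket:
            lookup.setdefault(key, []).append(value)
        for key, value in left_bucket:
            for other in lookup.get(key, []):
                joined.append((key, (value, other)))
    return joined

users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")]

# Same partitioner (same function, same partition count) for both datasets.
left = hash_partition(users, 4)
right = hash_partition(orders, 4)
result = sorted(join_co_partitioned(left, right))
```

If the two datasets used different partition counts or different hash functions, equal keys could sit in different bucket indexes and the bucket-by-bucket join above would miss matches; that is the situation in which a real system would have to repartition one side first.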
“Co-located RDDs are RDDs with the same partitioner that reside in the same physical location in memory.”
Holden Karau, High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
“Beyond being less likely to run out of memory than groupByKey, the following four functions — reduceByKey, treeAggregate, aggregateByKey, and foldByKey — are implemented to use map-side combinations, meaning that records with the same key are combined”
Holden Karau, High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
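The quote above is cut off, so the following is only a rough sketch of what map-side combining generally means: values for the same key are merged inside each partition before anything is exchanged between partitions. It is plain Python, not Spark; `map_side_combine`, `reduce_by_key`, and the sample partitions are hypothetical names chosen for illustration, and the "shuffle" is simulated by counting the records that would leave each partition.

```python
# Plain-Python simulation of map-side combining; no Spark involved.
# All function names and sample data are hypothetical illustrations.

def map_side_combine(partition, combine):
    """Merge values per key within a single partition."""
    combined = {}
    for key, value in partition:
        combined[key] = combine(combined[key], value) if key in combined else value
    return list(combined.items())

def reduce_by_key(partitions, combine):
    """Combine locally first, then merge the per-partition results.

    Returns the final totals and the number of records that would have
    been sent across partitions after local combining.
    """
    pre_combined = [map_side_combine(p, combine) for p in partitions]
    shuffled_records = sum(len(p) for p in pre_combined)
    totals = {}
    for partition in pre_combined:
        for key, value in partition:
            totals[key] = combine(totals[key], value) if key in totals else value
    return totals, shuffled_records

partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1), ("b", 1)],
]
raw_records = sum(len(p) for p in partitions)
totals, shuffled_records = reduce_by_key(partitions, lambda x, y: x + y)
```

In this toy input, seven raw records shrink to four pre-combined records before the simulated exchange. A group-then-reduce approach would instead move all seven records and hold every value for a key at once, which is why it is more exposed to memory pressure.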