Holden Karau
Goodreads Author
Member since March 2014
“Co-partitioning is related to but distinct from partition co-location. We say that multiple RDDs are co-partitioned if they are partitioned by the same known partitioner.”
― High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
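The idea in this quote can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark code: `stable_partition`, `partition_by`, and `local_join` are hypothetical helpers, and the sample data is made up. It assumes that "the same known partitioner" means the same function and the same partition count, and it models only partition indices, not where partitions physically live (that would be co-location).

```python
import zlib

NUM_PARTITIONS = 4  # assumed partition count, shared by both datasets


def stable_partition(key: str) -> int:
    """Hypothetical partitioner: CRC32 of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_by(pairs, partitioner, num_partitions):
    """Group (key, value) pairs into one list per partition index."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partitioner(key)].append((key, value))
    return partitions


def local_join(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right side."""
    joined = []
    for left, right in zip(left_parts, right_parts):
        lookup = {}
        for key, value in left:
            lookup.setdefault(key, []).append(value)
        for key, right_value in right:
            for left_value in lookup.get(key, []):
                joined.append((key, (left_value, right_value)))
    return joined


# Made-up sample data for illustration only.
users = [("ann", 31), ("bob", 27), ("cy", 45)]
orders = [("bob", "book"), ("ann", "pen"), ("ann", "ink")]

# Both datasets use the same partitioner, so they are co-partitioned in this model.
users_parts = partition_by(users, stable_partition, NUM_PARTITIONS)
orders_parts = partition_by(orders, stable_partition, NUM_PARTITIONS)

joined = local_join(users_parts, orders_parts)
```

Because both sides were split by the same function, every key sits at the same partition index on both sides, so the join never has to look across partitions. With different partitioners that guarantee disappears and the data would first need to be regrouped.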
“Note that the hash function you pass will be compared by identity to that of other RDDs. If you want to partition multiple RDDs with the same partitioner, pass the same function object (e.g., a global function) instead of creating a new lambda for each one!”
― Learning Spark: Lightning-Fast Big Data Analysis
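A small pure-Python sketch of why the warning matters. `SketchPartitioner` and `shared_hash` are hypothetical names invented here to model the comparison the quote describes; this is not PySpark's own class. The sketch assumes two partitioners count as equal only when they have the same partition count and hold the very same function object.

```python
class SketchPartitioner:
    """Hypothetical model of a partitioner whose function is compared by identity."""

    def __init__(self, num_partitions, partition_func):
        self.num_partitions = num_partitions
        self.partition_func = partition_func

    def __eq__(self, other):
        # Identity check on the function: same object, not "same behavior".
        return (
            isinstance(other, SketchPartitioner)
            and self.num_partitions == other.num_partitions
            and self.partition_func is other.partition_func
        )

    def __call__(self, key):
        return self.partition_func(key) % self.num_partitions


def shared_hash(key):
    """One module-level function object that every partitioner can reuse."""
    return sum(ord(ch) for ch in str(key))


# Reusing the same function object: the partitioners compare equal.
same_a = SketchPartitioner(8, shared_hash)
same_b = SketchPartitioner(8, shared_hash)

# A fresh lambda each time: identical behavior, but two distinct objects.
fresh_a = SketchPartitioner(8, lambda k: sum(ord(ch) for ch in str(k)))
fresh_b = SketchPartitioner(8, lambda k: sum(ord(ch) for ch in str(k)))

print(same_a == same_b)
print(fresh_a == fresh_b)
```

The two lambdas send every key to the same partition, yet the partitioners built from them do not compare equal, so a system relying on this check could not tell that the data is already laid out compatibly. In PySpark the practical reading is to define the function once and pass that same object to each `partitionBy` call.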
“For each input source, Spark Streaming launches receivers, which are tasks running within the application’s executors that collect data from the input source and save it as RDDs. These receive the input data and replicate it (by default) to another executor for fault tolerance. This data is stored in the memory of the executors in the same way as cached RDDs.”
― Learning Spark: Lightning-Fast Big Data Analysis
“decided early on that my ideal dog would be like Diefenbaker in the TV show Due South, which was popular in the UK where I was living.”
― Wag: The Science of Making Your Dog Happy





























