Kindle Notes & Highlights
by Tom White
Read between May 30 - July 21, 2017
One rule of thumb is to aim for reducers that each run for five minutes or so, and which produce at least one HDFS block’s worth of output.
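The two halves of this rule of thumb can be turned into simple arithmetic. The sketch below is a hypothetical helper, not part of Hadoop. It assumes you already have job-level estimates of total reduce output size and total reduce work time. It also assumes the Hadoop 2 default block size of 128 MB (`dfs.blocksize`).

```python
import math

# Assumption: 128 MB HDFS block size (the Hadoop 2 default for dfs.blocksize).
BLOCK_SIZE_BYTES = 128 * 1024 * 1024
# "Five minutes or so" of work per reducer.
TARGET_REDUCER_SECONDS = 5 * 60


def suggest_reducers(total_output_bytes, total_reduce_seconds,
                     block_size=BLOCK_SIZE_BYTES,
                     target_seconds=TARGET_REDUCER_SECONDS):
    """Hypothetical helper applying the rule of thumb to job-level estimates."""
    # Number of reducers that gives each one roughly target_seconds of work.
    by_time = math.ceil(total_reduce_seconds / target_seconds)
    # Upper bound so that each reducer still writes at least one block.
    by_output = total_output_bytes // block_size
    # If the two rules conflict, favour the block-size bound; always use >= 1.
    return max(1, min(by_time, by_output))


print(suggest_reducers(10 * 1024**3, 3 * 3600))  # 10 GB output, 3 h of work -> 36
```

When the two constraints disagree, for example 1 GB of output but three hours of reduce work, this sketch caps the count at the block-size bound (8 reducers). Each reducer then runs longer than five minutes instead of writing undersized files. That trade-off is a design choice made here, not something the rule itself prescribes.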
HDFS clusters do not benefit from using RAID.
To work seamlessly, SSH needs to be set up to allow passwordless login for the hdfs and yarn users from machines in the cluster.
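One way to confirm the setup is a small check script. The minimal sketch below assumes an OpenSSH client is installed. It uses `-o BatchMode=yes`, which makes `ssh` fail instead of prompting for a password, so a zero exit status indicates that key-based login worked. The function names and hostnames are hypothetical. Run the script once as the hdfs user and once as the yarn user (for example with `sudo -u hdfs`), because each user needs its own working key.

```python
import getpass
import subprocess


def ssh_check_command(user, host, timeout_s=5):
    """Build an ssh command that fails rather than prompting for a password."""
    # BatchMode=yes disables password (and host-key) prompts; "true" is a no-op
    # remote command, so only the success of authentication matters.
    return ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout_s}",
            f"{user}@{host}", "true"]


def can_login_without_password(user, host):
    """Return True if `user` can SSH to `host` without any interactive prompt."""
    try:
        result = subprocess.run(ssh_check_command(user, host),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:  # no ssh client on this machine
        return False
    return result.returncode == 0


def check_hosts(hosts):
    """Report, for the current local user, which hosts accept passwordless login."""
    user = getpass.getuser()
    for host in hosts:
        status = "ok" if can_login_without_password(user, host) else "FAILED"
        print(f"{user}@{host}: {status}")
```

For example, `check_hosts(["worker1.example.com", "worker2.example.com"])` run as hdfs would list any worker that still asks for a password. Because of `BatchMode=yes`, a host whose key is not yet in `known_hosts` is also reported as a failure.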