Collecting all activity related to a celebrity (e.g., replies to something they posted) in a single reducer can lead to significant skew (also known as hot spots) — that is, one reducer that must process significantly more records than the others.
I've been wondering about this, and also about how you handle a single reducer being assigned more data than one node can accommodate.
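
To make the skew concrete, here is a minimal sketch, not taken from the quoted text, that simulates how records are spread across reducers and shows one commonly used mitigation: appending a random "salt" to a known hot key so that its records land on several reducers instead of one. The names `partition`, `salted_key`, `reducer_loads`, `NUM_REDUCERS` and `SALT_BUCKETS` are all hypothetical, the hash is a toy one, and the sketch assumes the hot keys are already known (for example from a sampling pass).

```python
import random
from collections import Counter

NUM_REDUCERS = 4   # hypothetical cluster size
SALT_BUCKETS = 4   # hypothetical: number of sub-keys a hot key is split into


def partition(key, num_reducers=NUM_REDUCERS):
    """Assign a key to a reducer using a toy deterministic hash."""
    return sum(key.encode("utf-8")) % num_reducers


def salted_key(key, hot_keys, rng):
    """Append a random salt to hot keys; leave all other keys unchanged."""
    if key in hot_keys:
        return f"{key}#{rng.randrange(SALT_BUCKETS)}"
    return key


def reducer_loads(records, hot_keys=frozenset(), seed=0):
    """Count how many records each reducer would receive."""
    rng = random.Random(seed)
    loads = Counter()
    for key in records:
        loads[partition(salted_key(key, hot_keys, rng))] += 1
    return loads


# 9,000 records for one celebrity, 1,000 records for ordinary users.
records = ["celebrity"] * 9000 + [f"user{i}" for i in range(1000)]

plain = reducer_loads(records)                            # no mitigation
salted = reducer_loads(records, hot_keys={"celebrity"})   # hot key is salted

print("max load without salting:", max(plain.values()))
print("max load with salting:   ", max(salted.values()))
```

Salting only spreads the work. Because the celebrity's records now sit on several reducers, each one produces a partial result, and a second, much smaller aggregation step is needed to combine those partials into the final answer for that key. Whether this is enough for the "more data than one node can hold" case depends on whether the per-key computation can be split this way at all.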

