First, we implement a per-request retry budget of up to three attempts. If a request has already failed three times, we let the failure bubble up to the caller. The rationale is that if a request has already landed on overloaded tasks three times, it’s relatively unlikely that attempting it again will help because the whole datacenter is likely overloaded.