An emerging idea is to treat GC pauses like brief planned outages of a node, and to let other nodes handle requests from clients while one node is collecting its garbage. If the runtime can warn the application that a node soon requires a GC pause, the application can stop sending new requests to that node, wait for it to finish processing outstanding requests, and then perform the GC while no requests are in progress. This trick hides GC pauses from clients and reduces the high percentiles of response time [70, 71]. Some latency-sensitive financial trading systems [72] use this approach.
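To make the sequence in the passage concrete (stop routing to the node, wait for outstanding requests, then collect), here is a minimal single-process sketch. `Node`, `Balancer`, and every method on them are hypothetical names invented for illustration; only `gc.collect()` is a real Python call. It assumes something has already decided that the node needs a collection, which is exactly the part the passage leaves to the runtime.

```python
import gc


class Node:
    """Hypothetical application node that can be drained before a collection."""

    def __init__(self, name):
        self.name = name
        self.accepting = True  # the balancer only routes to accepting nodes
        self.in_flight = 0     # requests started but not yet finished

    def start_request(self):
        if not self.accepting:
            raise RuntimeError(f"{self.name} is draining")
        self.in_flight += 1

    def finish_request(self):
        self.in_flight -= 1

    def begin_drain(self):
        # Step 1: stop taking new requests; outstanding ones keep running.
        self.accepting = False

    def collect_if_idle(self):
        """Steps 2 and 3: collect only once no requests are outstanding."""
        if self.accepting or self.in_flight > 0:
            return False
        gc.collect()           # the pause happens while no client is waiting
        self.accepting = True  # rejoin the pool
        return True


class Balancer:
    """Hypothetical round-robin balancer that skips draining nodes."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next = 0

    def pick(self):
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node.accepting:
                return node
        raise RuntimeError("no node is accepting requests")
```

With two nodes `a` and `b`, calling `a.begin_drain()` makes `Balancer.pick()` return only `b`; `a.collect_if_idle()` returns `False` until `a`'s last request finishes, then runs the collection and puts `a` back in rotation. A real deployment would also need to make sure not every node drains at the same time.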
I've tried this. In principle it's a cool idea, but it's really hard to put into practice. Even GC pauses that last hundreds of milliseconds are hard to detect because there isn't a clear "hook" you can get for a GC event. You can ping the node to death with health checks on a really tight interval (sub-10 ms), but that introduces new problems.
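As a rough illustration of the health-check approach, the sketch below infers pauses from gaps between consecutive health-check responses. `find_pauses` and its parameters are hypothetical, timestamps are assumed to be in milliseconds, and the result is only an estimate: a gap can just as well come from network jitter, and any pause shorter than the check interval plus the tolerance goes unnoticed, which is why the interval has to be so tight.

```python
def find_pauses(response_times_ms, interval_ms, tolerance_ms=0):
    """Return (last_response_before_gap, estimated_stall) pairs.

    A gap counts as a suspected pause when two consecutive responses are
    further apart than interval_ms + tolerance_ms. This only detects a
    pause after it has happened; it cannot predict one.
    """
    pauses = []
    for prev, cur in zip(response_times_ms, response_times_ms[1:]):
        gap = cur - prev
        if gap > interval_ms + tolerance_ms:
            # The stall is whatever the gap adds on top of the normal interval.
            pauses.append((prev, gap - interval_ms))
    return pauses
```

For example, with checks every 10 ms and responses at 0, 10, 20, 230, and 240 ms, `find_pauses([0, 10, 20, 230, 240], 10, 5)` reports one suspected stall of about 200 ms starting after the response at 20 ms.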