If request payload exceeds certain size the response latency goes from network RTT to double that, or triple.
Definitely something wrong with either TCP or HTTP/2 windowing as it doesn't send the full request without getting ACK from server first. But none of the gRPC windowing config options nor linux tcp_wmem/rmem settings work. Sending one byte request every few hundred milliseconds fixes it by keeping the gRPC channel / TCP connection active. Nagle / slow start is disabled.
At least try something else besides gRPC when building systems so you have a baseline performance understanding. gRPC is OFTEN introducing performance breakdowns that goes unnoticed.
It's head-of-line blocking. When requests are serialized, the queue will grow as long as the time to service a request is longer than the interval between arriving requests. Queue growth is bad if sufficient capacity exists to service requests in parallel.