Thanks for sharing. I wasn’t aware that was a thing; it seems to be a way of manipulating the appearance of performance rather than actually improving it. We log all slow calls so we can find out what they’re running up against - knowing only that a call took more than 5ms is a pretty poor internal signal at a p99 of 5ms, but being able to trace which calls took 15s or 75s (versus those that took less but would have been killed anyway) is extremely helpful.
Perhaps probabilistically terminating calls would work better? I assume the decision has to be made ahead of time with timeout contexts, if there’s anything like cancellation tokens, so even if you give just 5% of your inbound requests a deadline 10000x as long, you’ll still get some useful info to work with.
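The sampling idea could look something like this - a minimal Python sketch, assuming the deadline really does have to be fixed before the call starts. All the names here (`SAMPLE_RATE`, `pick_deadline`, the 5ms base timeout) are illustrative, not from any particular framework:

```python
import random

SAMPLE_RATE = 0.05          # 5% of requests get the extended deadline
EXTENSION_FACTOR = 10_000   # 10000x the normal timeout

def pick_deadline(base_timeout: float) -> float:
    """Decide the request's deadline up front, before the call is made.

    Most requests keep the tight production timeout; a small random
    sample gets a vastly longer one so slow calls can actually finish
    and be traced instead of just showing up as 'killed at deadline'.
    """
    if random.random() < SAMPLE_RATE:
        return base_timeout * EXTENSION_FACTOR
    return base_timeout

# e.g. with a 5ms base timeout, ~5% of calls get a 50s deadline
deadline = pick_deadline(0.005)
```

The extended-deadline calls then give you real durations (15s, 75s, whatever) in the slow-call log, while the other 95% of traffic stays protected by the normal timeout.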
As a user, I would absolutely hate it. I somehow frequently run into pockets of badly written or badly architected code that cause some of my requests to take a minute or more to be fulfilled on an otherwise responsive server - if I had to retry “just” twenty times for one to go through, I’d lose my mind.