This is an interesting approach, but a simpler one is to add a level of indirection: have a supervisor meter all requests, including retries, and apply back-pressure when needed.
Routing everything through a supervisor also lets it make decisions beyond rate limiting, such as prioritizing requests, deferring invocations, and memoizing results.
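To make that concrete, here is a minimal sketch of what such a supervisor could look like. Everything in it is an assumption for illustration: the `Supervisor` and `Backpressure` names, the token-bucket metering, the bounded priority queue, and the per-key cache are my own choices, not something taken from the approach discussed above.

```python
import heapq
import itertools
import time


class Backpressure(Exception):
    """Raised when the pending queue is full and the caller should slow down."""


class Supervisor:
    """Hypothetical supervisor: meters calls with a token bucket, queues by
    priority, memoizes results by key, and pushes back when overloaded."""

    def __init__(self, rate_per_sec, burst, max_pending, clock=time.monotonic):
        self.rate = rate_per_sec        # tokens added per second
        self.capacity = burst           # maximum tokens that can accumulate
        self.tokens = float(burst)
        self.max_pending = max_pending  # queue bound that triggers back-pressure
        self.clock = clock
        self.last = clock()
        self.pending = []               # heap of (priority, seq, key, fn)
        self.seq = itertools.count()    # tie-breaker so fn is never compared
        self.cache = {}                 # memoized results by key

    def submit(self, key, fn, priority=0):
        """Queue fn under key. Returns False if the result is already memoized."""
        if key in self.cache:
            return False
        if len(self.pending) >= self.max_pending:
            raise Backpressure("queue full, retry later")
        heapq.heappush(self.pending, (priority, next(self.seq), key, fn))
        return True

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def run_ready(self):
        """Run as many pending calls as the token budget allows (lowest
        priority number first) and leave the rest deferred in the queue."""
        self._refill()
        done = []
        while self.pending and self.tokens >= 1:
            priority, _, key, fn = heapq.heappop(self.pending)
            if key in self.cache:
                continue                # a duplicate was already satisfied
            self.tokens -= 1            # every attempt costs a token, retries included
            try:
                self.cache[key] = fn()
                done.append(key)
            except Exception:
                # Re-queue the retry; it competes for tokens like any other call.
                # A real supervisor would likely also cap the number of attempts.
                heapq.heappush(self.pending, (priority, next(self.seq), key, fn))
        return done
```

Callers would `submit` work and treat `Backpressure` as a signal to slow down, while some loop or timer calls `run_ready` periodically. Because retries pass through the same bucket as first attempts, a burst of failures cannot exceed the configured rate.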