See, for instance, https://lpc.events/event/11/contributions/901/attachments/78... slide 5 (though more has happened since then). io_uring will first see if it has everything needed to do the operation immediately. If not, it'll queue the request for later completion where the operation supports that (e.g. direct I/O, or buffered I/O in some cases). The thread pool is the last fallback, which always works if nothing else does.
https://lwn.net/Articles/821274/ talks about making async buffered reads work, for instance.
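To make that fallback order concrete, here's a toy model in Python. It is not kernel code and not the io_uring API. The names `Request`, `ready_now`, `supports_async` and `dispatch` are all made up for illustration, and the real decision depends on kernel version, file type and flags. It only shows the order in which the three paths are tried.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Path(Enum):
    INLINE = auto()        # completed right away at submission time
    ASYNC_QUEUED = auto()  # queued and completed later, no worker thread needed
    WORKER_POOL = auto()   # handed to the thread pool as the last resort


@dataclass
class Request:
    # Hypothetical flags, standing in for checks the kernel makes internally.
    ready_now: bool       # everything needed is available without blocking
    supports_async: bool  # the operation has a non-blocking completion path


def dispatch(req: Request) -> Path:
    """Pick a path in the order described above: inline, then queued, then pool."""
    if req.ready_now:
        return Path.INLINE
    if req.supports_async:
        return Path.ASYNC_QUEUED
    # Nothing else applies, so fall back to a worker thread, which always works.
    return Path.WORKER_POOL


if __name__ == "__main__":
    print(dispatch(Request(ready_now=True, supports_async=False)))
    print(dispatch(Request(ready_now=False, supports_async=True)))
    print(dispatch(Request(ready_now=False, supports_async=False)))
```

The point of the ordering is that the thread pool is only paid for when neither of the cheaper paths is available.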
In other words, can you count on the kernel to use its own threads internally whenever an I/O task might actually need to use a lot of CPU?