(This is assuming you are already switching to communicating using channels or similar abstraction.)
Any software that does a lot of fine-grained concurrent I/O has this issue. Database engines have been fighting this for many years, since they can pervasively block both on I/O and locking for data model concurrency control.
"Async" in native code is cargo cult, unless you're trying to run on bare metal without OS support.
There is a reason why so many developers have chosen to do application level scheduling: No operating system has exposed viable async primitives to build this on the OS level. OS threads suck so everyone reinvents the wheel. See Java's "virtual threads", Go's goroutines, Erlang's processes, NodeJS async.
You don't seem to be aware what a context switch on an application level is. It is often as simple as a function call. There is no way that returning to the OS, running a generic scheduler that is supposed to deal with any possible application workload that needs to store all the registers and possibly flush the TLB if the OS makes the mistake of executing a different process first and then restore all the registers can be faster than simply calling the next function in the same address space.
Developers of these systems brag about how you can have millions of tasks active at the same time without breaking any sweat.
Most often when doing async you have a small number of tasks repeated many times, then you spin up one thread per CPU, and "randomly" assign each task as it comes in to a thread.
When doing GUI style programming you have a lot of different tasks and each task is done in exactly one thread.