> I don't want tens of thousands of kernel threads blocked on results
If that is because of the memory consumption of kernel threads, be aware that you are buying lower memory consumption at the cost of increased cognitive complexity (if you agree that asynchronous code is harder to reason about than synchronous code).
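To make the trade-off concrete, here is a minimal sketch in Python, assuming the "results" come from some blocking call (simulated here with a sleep). All function names are hypothetical and only for illustration. The first version parks one kernel thread per pending result but reads top to bottom; the second keeps everything on one thread, at the price of `async`/`await` spreading through every caller.

```python
import asyncio
import threading
import time


def fetch_blocking(i, results):
    # Hypothetical stand-in for a blocking call; the kernel thread sleeps here.
    time.sleep(0.01)
    results[i] = i * 2


def run_with_threads(n):
    # One kernel thread per pending result: simple control flow,
    # but each thread carries its own stack.
    results = [None] * n
    threads = [
        threading.Thread(target=fetch_blocking, args=(i, results))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


async def fetch_async(i):
    # Same hypothetical work, but waiting yields to the event loop
    # instead of blocking a thread.
    await asyncio.sleep(0.01)
    return i * 2


async def _gather_all(n):
    # All n waits are multiplexed onto a single thread.
    return await asyncio.gather(*(fetch_async(i) for i in range(n)))


def run_with_tasks(n):
    return asyncio.run(_gather_all(n))
```

Both versions compute the same thing; what differs is where the cost lands. The threaded one pays in per-thread memory, the async one pays in every function on the call path having to be written (and understood) as a coroutine.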