"Fibers under the magnifying glass" [1] might be a relevant paper here. Its conclusion, after surveying many different implementations, is that lightweight threads (fibers) are slower than stackless coroutines.
[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p136...