For example, the cost of the context switch itself (saving and restoring all registers) is more significant with native threads, because every switch also goes through the kernel and its scheduler.
Just try spinning up 100K threads that each print a line and then sleep for 10ms, and see how high your CPU usage gets.
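A rough version of that experiment, sketched in Python. CPython's threading module maps each Thread onto a native OS thread, although the GIL adds overhead of its own, so treat the numbers as illustrative rather than a clean benchmark. The names worker and run_experiment are hypothetical, and the sketch assumes each thread repeats the print-then-sleep step a few times so that CPU usage can be observed over a period.

```python
import threading
import time


def worker(thread_id: int, iterations: int, interval: float) -> None:
    # Each native OS thread prints a line, then sleeps for `interval` seconds.
    for _ in range(iterations):
        print(f"thread {thread_id} awake")
        time.sleep(interval)


def run_experiment(n_threads: int = 100_000, iterations: int = 10,
                   interval: float = 0.010) -> tuple[float, float]:
    """Start n_threads native threads; return (wall seconds, CPU seconds)."""
    threads = [
        threading.Thread(target=worker, args=(i, iterations, interval))
        for i in range(n_threads)
    ]
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of the whole process
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start
```

Calling run_experiment() with the defaults may require a higher thread limit (for example via ulimit -u on Linux) or a smaller threading.stack_size() before all 100K threads can start. Dividing the CPU seconds by the wall seconds gives a rough figure for how many cores were kept busy just by waking and rescheduling threads.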
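Also, doing IO does not necessarily mean context switching; it means calling into the kernel (a system call), which switches into kernel mode but does not by itself hand the CPU to another thread. If you issue an async IO operation (a read or write on a socket) and then move on to the next green thread, by the time you have run all the ready threads, some sockets are likely to have data ready again, so you might not context switch at all. Kernel developers are going further and reducing the number of syscalls themselves with io_uring: requests and completions pass through ring buffers shared with the kernel, so many IO operations can be batched under a single system call, and in its kernel-side polling mode (SQPOLL) submitting IO can avoid system calls entirely.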
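A toy model of that scheduling pattern in Python: generators stand in for green threads, the sockets are non-blocking, and a selector (epoll on Linux) is consulted only once every task is parked on IO. This is readiness-based IO, not io_uring-style completion-based IO, and reader, run and demo are hypothetical names for this sketch. The scheduler counts how often it actually had to block, to show that one waiting call can service several sockets at once.

```python
import selectors
import socket
import threading
from collections import deque
from typing import Dict, Generator, List

# A toy "green thread": a generator that yields the socket it is waiting on.
GreenThread = Generator[socket.socket, None, None]


def reader(name: int, sock: socket.socket, results: Dict[int, bytes]) -> GreenThread:
    """Hypothetical task: read `sock` to EOF without ever blocking."""
    chunks: List[bytes] = []
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            yield sock  # no data yet: park until the scheduler sees it readable
            continue
        if not data:  # peer closed the connection
            break
        chunks.append(data)
    results[name] = b"".join(chunks)


def run(tasks: List[GreenThread]) -> int:
    """Round-robin scheduler; returns how many times it blocked in select()."""
    sel = selectors.DefaultSelector()  # epoll on Linux
    ready = deque(tasks)
    parked = 0
    blocking_waits = 0
    while True:
        # Run every ready task until it finishes or parks on a socket.
        while ready:
            task = ready.popleft()
            try:
                sock = next(task)
            except StopIteration:
                continue
            sel.register(sock, selectors.EVENT_READ, task)
            parked += 1
        if parked == 0:
            break
        # Every task is waiting on IO: one call reports all readable sockets.
        blocking_waits += 1
        for key, _ in sel.select():
            sel.unregister(key.fileobj)
            parked -= 1
            ready.append(key.data)
    sel.close()
    return blocking_waits


def demo(n: int = 3):
    """Feed n socket pairs from a background thread standing in for remote peers."""
    pairs = [socket.socketpair() for _ in range(n)]
    results: Dict[int, bytes] = {}
    tasks = []
    for i, (r, _) in enumerate(pairs):
        r.setblocking(False)
        tasks.append(reader(i, r, results))

    def peers() -> None:
        for i, (_, w) in enumerate(pairs):
            w.sendall(f"msg{i}".encode())
            w.close()

    timer = threading.Timer(0.05, peers)
    timer.start()
    waits = run(tasks)
    timer.join()
    for r, _ in pairs:
        r.close()
    return results, waits
```

Each recv is still a system call, but the scheduler only blocks when no task can make progress, and each select() call can return several readable sockets at once, so the number of blocking waits can be smaller than the number of reads.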