> interactions with native libraries are really slow because of that
Can you explain a bit? What's the connection between the concurrency implementation (I'm assuming you mean multiplexing multiple coroutines over the same OS thread) and, say, slowness in cgo? Having to save the stack? I don't get it.
FFI is slow because the main Go compiler uses a different calling convention than everything else. I couldn't tell you if or how that's related to its concurrency features.
It's an important optimization, since otherwise each goroutine would use a lot of memory, but it's not required. The stack allocation strategy has changed a couple of times in the main compiler, gccgo originally didn't support it, and cgo functions behave like normal Go modulo the stack.