Creating and destroying kernel threads is significantly more expensive than creating and destroying goroutines.
A kernel thread has a fixed-size stack, and if you go beyond it, you crash. That means you have to create kernel threads with worst-case-scenario stack sizes (and pray that you got it right).
A goroutine has a growable stack and starts with a very small one, which is partly why creating it is faster: setting up kernel page mappings to provide a contiguous space for a large stack is not free.
Finally, goroutine scheduling is different from kernel thread scheduling: a blocked goroutine consumes no CPU cycles.
On a 4-core CPU there is no point in running more than 4 busy kernel threads, but the kernel scheduler has to give each runnable thread a chance to run. The more threads you have, the more time the kernel wastes on the pointless work of ping-ponging between them. That hurts throughput, especially when we're talking about high-load servers (serving thousands or even millions of concurrent connections).
The Go runtime multiplexes goroutines onto roughly as many OS threads as there are CPU cores (GOMAXPROCS) and avoids this waste.
That's why high-performance servers (like nginx) don't just use a kernel thread per connection and instead go through the considerable complexity of writing event-driven code.
Go gives you the straightforward thread-per-connection programming model with scalability and performance much closer to the event-driven model.
You work on Rust and are well informed about this topic, so I'm sure you know all of that.
Which is why it amazes me to see the lengths you go to to denigrate Go in that respect and to minimize what is, among mainstream languages, a great and unique programming model.