Creating and destroying kernel threads is significantly more expensive than creating and destroying goroutines.
A kernel thread has a fixed-size stack, and if you go beyond it, you crash. That means you have to create kernel threads with worst-case-scenario stack sizes (and pray that you got it right).
A goroutine has a growable stack and starts with a very small one, which is partly why creating it is faster: setting up kernel page mappings to provide a contiguous space for a large stack is not free.
Finally, goroutine scheduling is different from kernel thread scheduling: a blocked goroutine consumes no CPU cycles.
On a 4-core CPU there is no point in running more than 4 busy kernel threads, but the kernel scheduler has to give each runnable thread a chance to run. The more threads you have, the more time the kernel wastes on the pointless work of ping-ponging between them. That hurts throughput, especially when we're talking about high-load servers (serving thousands or even millions of concurrent connections).
The Go runtime multiplexes goroutines onto roughly as many OS threads as there are CPU cores (GOMAXPROCS) and avoids this waste.
That's why high-performance servers (like nginx) don't just use a kernel thread per connection and instead go through the considerable complexity of writing event-driven code.
Go gives you the straightforward thread-per-connection programming model with scalability and performance much closer to the event-driven model.
You work on Rust and are well informed about this topic, so I'm sure you know all of that.
Which is why it amazes me to see the lengths you go to to denigrate Go in that respect and to minimize what is, among mainstream languages, a great and unique programming model.