Better HN

orisho · 6y ago

Native-thread context switching has costs that goroutines and greenlets (the green threads that gevent uses) avoid. I don't remember the details, but their designers paid specific attention to making context switches, and other resource consumption, cheaper than with native threads, so I suggest reading about it in the greenlet and Go docs :) You can also search for C10K, the term people used when discussing how to handle 10K concurrent connections, which is often associated with cooperative threading.

For example, the cost of the context switch itself (saving and restoring all registers) is higher with native threads.

Just try spinning up 100K threads that each print a line and then sleep for 10ms, and see how high your CPU usage gets.
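A rough Python sketch of that experiment, using standard-library threads. These are real OS threads, though the GIL serializes the Python code they run, so treat the numbers as indicative only. The helper `run_sleepers` and its parameters are hypothetical; printing is off by default so 100K lines don't flood the terminal.

```python
import threading
import time


def run_sleepers(n_threads: int, iterations: int = 10,
                 sleep_s: float = 0.010, verbose: bool = False) -> dict:
    """Start n_threads native threads; each wakes `iterations` times,
    optionally prints a line, then sleeps for sleep_s seconds."""
    wakeups = [0] * n_threads  # one slot per thread, so no lock is needed

    def worker(idx: int) -> None:
        for _ in range(iterations):
            if verbose:
                print(f"thread {idx} awake")
            wakeups[idx] += 1
            time.sleep(sleep_s)  # the OS timer makes this thread runnable again

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    wall0, cpu0 = time.perf_counter(), time.process_time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {
        "wakeups": sum(wakeups),
        "wall_s": time.perf_counter() - wall0,
        # process_time() is user + system CPU time summed over all threads
        "cpu_s": time.process_time() - cpu0,
    }


if __name__ == "__main__":
    # Start small: 100K threads may hit OS limits (threads-max, ulimit -u,
    # stack memory) before CPU becomes the bottleneck.
    for n in (100, 1_000):
        print(n, run_sleepers(n))
```

Comparing `cpu_s` against `wall_s` as `n` grows is the "how high does CPU usage get" part.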

Also, doing IO does not necessarily mean context switching: it means calling into the kernel (system calls). If you use an async IO operation (reading from or writing to a socket) and then move on to the next thread, by the time you're done with all the ready threads, some sockets are likely to be ready to read from again, so you might not context switch at all. Kernel developers are even working on reducing the need for syscalls with io_uring, which queues IO requests in ring buffers shared with the kernel, so many operations can be submitted and completed with few or no system calls.
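A rough sketch of the readiness model using Python's standard `selectors` module: one thread registers several sockets and calls `recv()` only on those the kernel reports as readable. Each `recv()` is still a system call; io_uring isn't exposed by the standard library, so that part isn't shown. `read_ready` and the socket-pair setup are hypothetical scaffolding for the example.

```python
import selectors
import socket


def read_ready(pairs, messages, timeout: float = 1.0) -> list:
    """One thread, many sockets: register the server end of each socket pair
    and call recv() only on the sockets the kernel reports as readable."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    for idx, (server_end, _client_end) in enumerate(pairs):
        server_end.setblocking(False)
        sel.register(server_end, selectors.EVENT_READ, data=idx)

    # The peers write first, so those sockets are already readable.
    for (_server_end, client_end), msg in zip(pairs, messages):
        client_end.sendall(msg)

    received = [b""] * len(pairs)
    pending = len(pairs)
    try:
        while pending:
            events = sel.select(timeout)  # one call can report many ready sockets
            if not events:
                raise TimeoutError("no socket became readable")
            for key, _mask in events:
                # Still one syscall per read, but it never blocks or switches threads.
                received[key.data] = key.fileobj.recv(4096)
                sel.unregister(key.fileobj)
                pending -= 1
    finally:
        sel.close()
    return received


if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(3)]
    print(read_ready(pairs, [b"a", b"bb", b"ccc"]))
    for a, b in pairs:
        a.close()
        b.close()
```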

jashmatthews · 6y ago

Green threads are much cheaper to switch than pthreads, yes. In real applications the difference is far smaller than it was 20 years ago when C10k was challenging. In 2020 you can just open 10k threads and forget about it.
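A minimal sketch of the "just open the threads" style: a thread-per-connection echo server using Python's `socketserver.ThreadingTCPServer`, which starts one native thread per connection, so each handler can simply block. The names `EchoHandler`, `start_echo_server` and `echo_once` are illustrative, not from the comment.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Runs in its own native thread, one per connection."""

    def handle(self) -> None:
        while True:
            data = self.request.recv(4096)  # blocks only this connection's thread
            if not data:
                break  # client closed its side
            self.request.sendall(data)


def start_echo_server() -> socketserver.ThreadingTCPServer:
    # Port 0 asks the OS for any free port.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
    server.daemon_threads = True  # don't keep the process alive for handlers
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def echo_once(address, payload: bytes) -> bytes:
    with socket.create_connection(address) as conn:
        conn.sendall(payload)
        conn.shutdown(socket.SHUT_WR)  # tell the server we're done sending
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    server = start_echo_server()
    print(echo_once(server.server_address, b"hello"))
    server.shutdown()
    server.server_close()
```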

With 100k threads versus 100k goroutines, each doing nothing but waiting on a mutex, pthreads in C take ~20 microseconds per thread, while Go takes ~5 microseconds per goroutine.
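A rough Python analogue of that benchmark, not the original C/Go code: native threads blocking on a `threading.Lock` versus `asyncio` tasks blocking on an `asyncio.Lock`, with asyncio standing in for green threads (an assumption; gevent isn't in the standard library). Timings include creation and interpreter overhead, so don't expect the ~20/~5 µs figures. The function names are hypothetical.

```python
import asyncio
import threading
import time


def threads_on_lock(n: int) -> float:
    """Microseconds per native thread: n threads all block on one held lock,
    then the lock is released and they acquire it one by one."""
    lock = threading.Lock()
    lock.acquire()  # held by the main thread, so every worker blocks

    def worker() -> None:
        with lock:
            pass

    t0 = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    lock.release()
    for t in threads:
        t.join()
    return (time.perf_counter() - t0) / n * 1e6


async def _tasks_on_lock(n: int) -> float:
    lock = asyncio.Lock()
    await lock.acquire()  # held by this coroutine, so every task waits

    async def worker() -> None:
        async with lock:
            pass

    t0 = time.perf_counter()
    tasks = [asyncio.create_task(worker()) for _ in range(n)]
    await asyncio.sleep(0)  # let each task run until it parks on the lock
    lock.release()
    await asyncio.gather(*tasks)
    return (time.perf_counter() - t0) / n * 1e6


def tasks_on_lock(n: int) -> float:
    """Microseconds per asyncio task (a stand-in for green threads)."""
    return asyncio.run(_tasks_on_lock(n))


if __name__ == "__main__":
    n = 2_000  # raise toward 100_000 if your OS limits allow
    print(f"threads: {threads_on_lock(n):.1f} us each")
    print(f"asyncio tasks: {tasks_on_lock(n):.1f} us each")
```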

This difference disappears really easily. Parse some JSON and it’ll be gone.

Entering kernel code is the expensive part of context switching, so syscalls are very nearly as expensive. Reading from a socket still needs a syscall, even with green threads or asynchronous IO.

The more different bits of IO you do, like in a real web app, the less advantage there is to green threads. This is one reason Rust dropped its M:N threading implementation.

fulafel · 6y ago

There shouldn't be any hard 100k limit on threads, at least in Linux, though of course you need enough memory for 100k stacks. You do need to raise some default limits for it (https://stackoverflow.com/a/26190804).
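A small Python sketch that reads the limits usually involved on Linux, so you can see which one you'll hit first. This set of limits is my assumption and not necessarily exactly what the linked answer covers; `thread_limits` is a hypothetical helper, and the `resource` module is Unix-only.

```python
import resource  # Unix-only

PROC_LIMITS = {
    "threads-max": "/proc/sys/kernel/threads-max",  # system-wide thread cap
    "pid_max": "/proc/sys/kernel/pid_max",          # every thread needs an ID
    "max_map_count": "/proc/sys/vm/max_map_count",  # thread stacks are memory mappings
}


def thread_limits() -> dict:
    limits = {}
    for name, path in PROC_LIMITS.items():
        try:
            with open(path) as f:
                limits[name] = int(f.read().split()[0])
        except (OSError, ValueError, IndexError):
            limits[name] = None  # not Linux, or not readable
    # ulimit -u: per-user process/thread count (soft, hard)
    limits["RLIMIT_NPROC"] = resource.getrlimit(resource.RLIMIT_NPROC)
    # ulimit -s: glibc typically derives the default thread stack size from it
    limits["RLIMIT_STACK"] = resource.getrlimit(resource.RLIMIT_STACK)
    return limits


if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(f"{name}: {value}")
```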

Assuming a generous(?) 20 kB per thread for the stack and the corresponding OS bookkeeping information, you could have 1k threads in 20 MB, or 1M threads in 20 GB.
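The arithmetic spelled out, in decimal units. The 20 kB figure is the comment's assumption, not a measurement; on Linux with glibc the default stack reservation is usually much larger (commonly 8 MB of virtual address space), but only the pages a thread actually touches use physical memory. `thread_memory_bytes` is a hypothetical helper.

```python
def thread_memory_bytes(n_threads: int, per_thread_bytes: int = 20_000) -> int:
    """Total memory if every thread costs per_thread_bytes (default: the
    assumed 20 kB of stack plus OS bookkeeping)."""
    return n_threads * per_thread_bytes


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        mb = thread_memory_bytes(n) / 1e6
        print(f"{n:>9,} threads -> {mb:,.0f} MB")  # 20 MB, then 20,000 MB (20 GB)
```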

Doing 100 Hz timer wakeups and IOs concurrently in 100k threads means 10M wakeups per second, which takes a chunk of CPU regardless of whether you choose green or native threads. Performance versus kernel threads will depend on the green threads implementation.
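The same arithmetic, plus one step left implicit: at 10M wakeups per second, a single core has on average only 100 ns of CPU time per wakeup (800 ns if eight cores share the load), before doing any actual IO or timer work. The function names are hypothetical and the core counts are just examples.

```python
def wakeups_per_second(n_threads: int, hz: float) -> float:
    """Total wakeups if each of n_threads wakes hz times a second (100 Hz = every 10 ms)."""
    return n_threads * hz


def ns_per_wakeup(wakeups_per_s: float, cores: int = 1) -> float:
    """Average CPU time available per wakeup when `cores` cores share the load."""
    return cores * 1e9 / wakeups_per_s


if __name__ == "__main__":
    rate = wakeups_per_second(100_000, 100)  # 10,000,000 wakeups/s
    for cores in (1, 8):
        print(f"{cores} core(s): {ns_per_wakeup(rate, cores):.0f} ns per wakeup")
```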

jashmatthews · 6y ago

Yup. The Linux scheduler wakes threads based on IO events. You don’t end up just cycling through 100k threads all waking and sleeping again.
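A small Python sketch of that point: threads parked in a blocking `recv()` use essentially no CPU while idle, because the kernel only makes a thread runnable when its socket gets data. `idle_cpu_seconds` is a hypothetical helper, and the exact figure will vary by OS.

```python
import socket
import threading
import time


def idle_cpu_seconds(n_threads: int, idle_s: float = 0.5) -> float:
    """Park n_threads native threads in a blocking recv(), measure the
    process's CPU time while nothing is readable, then wake each thread."""
    pairs = [socket.socketpair() for _ in range(n_threads)]
    threads = [threading.Thread(target=server_end.recv, args=(1,))
               for server_end, _client_end in pairs]
    for t in threads:
        t.start()
    time.sleep(0.1)  # give every thread time to block inside recv()

    cpu0 = time.process_time()
    time.sleep(idle_s)  # no IO events, so the scheduler has nothing to wake
    idle_cpu = time.process_time() - cpu0

    for _server_end, client_end in pairs:
        client_end.sendall(b"x")  # an IO event: wakes only the thread reading this socket
    for t in threads:
        t.join()
    for server_end, client_end in pairs:
        server_end.close()
        client_end.close()
    return idle_cpu


if __name__ == "__main__":
    print(f"CPU used by 200 idle threads: {idle_cpu_seconds(200):.4f} s")
```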

fulafel · 6y ago

It's worth noting that the c10k writeup came out 20+ years ago, and those bottlenecks have been addressed both by fixing software bottlenecks and 20 years of semiconductor improvements.