In general, if you're using any other programming paradigm (i.e., 90+% of all software), you don't need tons of cheap threads; your application probably needs at most one per core. In that case, you aren't constantly context switching (except to legitimately multitask with other programs), and you aren't using much kernel memory to hold thread state (because you only have a few threads, not a ton). So you should just use the kernel as it was intended.
My assumption in reading the article is that the author was very much talking about the 90+% case (he didn't really say what he was talking about in the article).
There actually is another case though, which I think is what the author is really getting at. What if you have a small number of threads that need to be extremely high performance, and they have extremely short critical sections (which is not necessarily the common case across most applications)? Then you would not want your threads to suspend in the kernel every time there is a tiny bit of contention. You'd rather have them spin for a few cycles until they can actually get the lock... or use hardware features (like the compare-and-exchange instructions) to sidestep the lock entirely.
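That kind of spin-wait can be sketched with C11 atomics (the type and function names here are mine, purely for illustration; real implementations add backoff, and spinning is only sensible when critical sections are tiny):

```c
#include <stdatomic.h>

/* A minimal spinlock sketch: threads busy-wait in user space
 * instead of suspending in the kernel on contention. */
typedef struct { atomic_flag held; } spinlock_t;

static void spin_lock(spinlock_t *l) {
    /* test-and-set returns the previous value; loop until we saw "clear" */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* busy-wait: burns CPU, but never enters the kernel */
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire/release ordering is what makes the critical section actually act as one: writes made while holding the lock are visible to the next thread that acquires it.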
To do either of those, you pretty much need to code it yourself or (better) use a third-party library. That said, pthreads does actually have a spinlock option that never suspends: pthread_spin_lock (unlike a futex-backed mutex, which can put the thread to sleep), though not every platform implements it.
In fact, formally speaking, there are lock-free algorithms, which have no lock but can require unbounded retries before a given thread gets to access the data (the guarantee is only that some thread is always making progress). And then you have wait-free algorithms, which guarantee every thread gets access within a bounded number of steps.
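The distinction shows up even in something as trivial as incrementing a counter (a toy example, sketched with C11 atomics): a compare-and-exchange retry loop is lock-free, since a thread can lose the race arbitrarily many times, while a single fetch-and-add is wait-free on hardware that supports it, since each thread finishes in one step.

```c
#include <stdatomic.h>

/* Lock-free: a CAS retry loop. This thread may retry unboundedly
 * under contention, but every failed CAS means some other thread
 * succeeded, so the system as a whole makes progress. */
static void increment_lock_free(atomic_int *n) {
    int old = atomic_load(n);
    while (!atomic_compare_exchange_weak(n, &old, old + 1))
        ;  /* on failure, `old` is refreshed with the current value */
}

/* Wait-free: bounded steps for every thread. On common hardware
 * this compiles down to a single atomic add instruction. */
static void increment_wait_free(atomic_int *n) {
    atomic_fetch_add(n, 1);
}
```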