undefined | Better HN

0 pointssilon421mo ago0 comments

If you are spinning so long that it requires preemption, you're doing something wrong, no?

0 comments

It doesn't matter, it's a long tail thing: on average user spinlocks can work, and even appear to be beneficial on benchmarks (for many reasons, Andy alludes to some above). But if you have enough users, some of them will experience the apocalyptic long tail, no matter what you do: that's why user spinlocks are unacceptable. RSEQ is the first real answer for this, but it's still not a guarantee: it is not possible to disable SCHED_OTHER preemption in userspace.

If I make something 1% faster on average, but now a random 0.000001% of its users see a ten-second stall every day, I lose.

It is tempting to think about it as a latency/throughput tradeoff. But it isn't that simple, the unbounded thrashing can be more like a crash in terms of impact to the system.

silon42OP1mo ago

Yeah, the thrashing thing I'm very familiar on OOM scenario... the far most common Linux "crash" that I experience (at least monthly, sometimes daily, depending on what I'm doing)... I've waited overnight a few times but OOM killer still didn't activate.

mgaunard1mo ago

Well, you can always pin to a core and move other threads out of that core.

That's what you'd do if manually scheduling. Ideally the dynamic scheduler would do that on its own.

jcalvinowens1mo ago

Sure. But if you squint even that isn't good enough, you'll still take interrupts on that core in the critical section sometimes when somebody else wants the lock.

The other problem with spin-wait is that it overshoots, especially with an increasing backoff. Part of the overhead of sleeping is paid back by being woken up immediately.

When it's made to work, the backoff is often "overfit" in that very slight random differences in kernel scheduler behavior can cause huge apparent regressions.

j / k navigate · click thread line to collapse

0 comments

jcalvinowens1mo ago

If I make something 1% faster on average, but now a random 0.000001% of its users see a ten-second stall every day, I lose.

It is tempting to think about it as a latency/throughput tradeoff. But it isn't that simple, the unbounded thrashing can be more like a crash in terms of impact to the system.

silon42OP1mo ago

mgaunard1mo ago

Well, you can always pin to a core and move other threads out of that core.

That's what you'd do if manually scheduling. Ideally the dynamic scheduler would do that on its own.

jcalvinowens1mo ago

Sure. But if you squint even that isn't good enough, you'll still take interrupts on that core in the critical section sometimes when somebody else wants the lock.

The other problem with spin-wait is that it overshoots, especially with an increasing backoff. Part of the overhead of sleeping is paid back by being woken up immediately.

When it's made to work, the backoff is often "overfit" in that very slight random differences in kernel scheduler behavior can cause huge apparent regressions.

j / k navigate · click thread line to collapse