It can be pretty annoying, because it means that systems can perform better under higher load, and that you get drastically different latency depending on whether a request is scheduled on a core that just processed another request (so it's already at a high frequency) or on one that was idle.
And as if the frequency control weren't fun enough, the same behavior also exists with CPU idle states. Even at a high frequency, Linux can enter idle states...
I've debugged several cases where this set of issues has caused unintuitive behavior. E.g.
a) switching to more powerful servers drastically increased latency
b) optimized code resulting in higher latency / lower throughput, because the optimization provided enough idle cycles for a deeper idle state between requests
c) slightly increased IO latency leading to significantly worse overall performance, because the IO waits got long enough for the CPU to clock down
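For reference, both knobs are exposed through sysfs. A minimal sketch for inspecting and, for benchmarking purposes, overriding them, assuming the standard cpufreq/cpuidle drivers (writes require root; state names and counts vary by CPU and idle driver):

```shell
# Show the current frequency governor and the available idle states.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name

# Pin every core to the performance governor.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance > "$g"
done

# Disable the deeper idle states (state0 is typically the polling
# state and is left alone here).
for s in /sys/devices/system/cpu/cpu*/cpuidle/state[1-9]*/disable; do
  echo 1 > "$s"
done
```

Alternatively, a process that holds /dev/cpu_dma_latency open with a written value of 0 keeps all CPUs out of deep idle states for as long as it runs, without touching sysfs.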
Actually, thinking this through, even then it doesn't make much sense to me: if you have that many short requests coming in, the CPU would simply never scale back if it's reasonably constant. It would first need to see some gap, and why not scale the CPU back in that gap (at the cost of the first request of the next batch being a few milliseconds slower)? From there on, every subsequent request is fast again until there's another lull. Keeping the CPU always at high frequency should only be needed if you have a very tight deadline on that surprise request (high-frequency trading, perhaps?), or if your requests happen to be spaced by roughly the same interval the CPU's scaling heuristics measure over. I'm sure such cases exist, but "intermittent workload" describes 90% of all workloads, and most of those definitely aren't meaningfully impacted by CPU scaling.
Yea. Most of the cases I was looking at were with postgres, with fully cached simple queries, each taking << 10ms. The problem is more extreme if the client takes some time to actually process the result, or if there is network latency, but even without those it's rather noticeable.
> Turning it off would, to me, only make sense if you want a server to handle thousands of fast requests per second, but those requests don't come in for periods of, say, 50 ms at a time and so the CPU scales back.
I see regressions at periods well below 50ms, but yea, that's the shape of it.
E.g. a postgres client running 1000 QPS over a single TCP connection from a different server, connected via switched 10Gbit Ethernet (ping RTT 0.030 ms), sees the following client-side per-query latencies:
powersave, idle enabled: 0.392 ms
performance, idle enabled: 0.295 ms
performance, idle disabled: 0.163 ms
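A run like the rate-limited one above could be sketched with pgbench; hostname and database name are illustrative, and the database would need a prior `pgbench -i`. `-S` runs the built-in select-only (fully cached) script, `-R` throttles to a target rate:

```shell
# One connection, select-only queries, throttled to 1000 QPS,
# run from a separate client machine. 'dbhost' is a placeholder.
pgbench -h dbhost -c 1 -j 1 -S -R 1000 -T 60 --progress 5 bench
```

With `-R`, pgbench also reports schedule lag, which makes it easy to see whether the client actually sustained the target rate.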
If I make that same 1 client go full tilt, instead of limiting it to 1000 QPS:
powersave, idle enabled: 0.141 ms
performance, idle enabled: 0.107 ms
performance, idle disabled: 0.090 ms
I'd call that a significant performance change.

> if you have that many short requests coming in, the CPU would simply never scale back if it's reasonably constant.
Indeed, that's what makes the whole issue so pernicious. One of the ways I saw this was when folks moved postgres to more powerful servers and got worse performance due to frequency/idle handling: the faster hardware made it more likely that cores were idle long enough to clock down.
On the same setup as above, if I instead have 800 client connections going full tilt, there's no meaningful difference between powersave/performance and idle enabled/disabled.