Yea. Most of the cases I was looking at were postgres with fully cached simple queries, each taking << 10ms. The problem is more extreme if the client takes some time to actually process the result or there is network latency, but even without those it's rather noticeable.
> Turning it off would, to me, only make sense if you want a server to handle thousands of fast requests per second, but those requests don't come in for periods of, say, 50 ms at a time and so the CPU scales back.
I see regressions at periods well below 50ms, but yea, that's the shape of it.
E.g. a postgres client running 1000 QPS over a single TCP connection from a different server, connected via switched 10Gbit Ethernet (ping RTT 0.030ms), has the following client-side visible per-query latencies:
powersave, idle enabled: 0.392 ms
performance, idle enabled: 0.295 ms
performance, idle disabled: 0.163 ms
If I make that same 1 client go full tilt, instead of limiting it to 1000 QPS:

powersave, idle enabled: 0.141 ms
performance, idle enabled: 0.107 ms
performance, idle disabled: 0.090 ms
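To make the two load shapes concrete, here's a rough sketch of the client loop (not the actual benchmark code; `bench`, the stub query, and the rate-limiting scheme are my own illustration). The point is the sleep in the paced variant: that idle gap between sends is exactly what lets cores drop into deeper idle states and clock down.

```python
import time

def bench(query, n, qps=None):
    """Issue n queries, optionally rate-limited to qps, and return
    per-query latencies in milliseconds."""
    interval = 1.0 / qps if qps else 0.0
    latencies = []
    next_send = time.perf_counter()
    for _ in range(n):
        if interval:
            # Sleep until the next scheduled send time; this gap is
            # what gives cores a chance to enter deeper C-states.
            now = time.perf_counter()
            if next_send > now:
                time.sleep(next_send - now)
            next_send += interval
        start = time.perf_counter()
        query()  # in the real benchmark: a round trip to postgres
        latencies.append((time.perf_counter() - start) * 1e3)
    return latencies

# Stub in place of a real query so the sketch runs standalone.
paced = bench(lambda: None, n=200, qps=1000)  # ~1 ms between sends
full_tilt = bench(lambda: None, n=200)        # back-to-back, no gap
```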
I'd call that a significant performance change.

> if you have that many short requests coming in, the CPU would simply never scale back if it's reasonably constant.
Indeed, that's what makes the whole issue so pernicious. One of the ways I saw this was when folks moved postgres to more powerful servers and got worse performance due to frequency/idle handling: the beefier machines made it more likely that cores sat idle long enough to clock down.
On the same setup as above, if I instead have 800 client connections going full tilt, there's no meaningful difference between powersave/performance and idle enabled/disabled.
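For reference, on Linux the knobs being compared above (governor, per-C-state disable flags) live in sysfs; a minimal read-only probe might look like this (assuming a standard cpufreq/cpuidle layout, which may be absent in VMs or containers):

```python
from pathlib import Path

CPU0 = Path("/sys/devices/system/cpu/cpu0")

def cpu0_power_settings():
    """Best-effort read of cpu0's cpufreq governor and per-C-state
    disable flags from sysfs. Returns an empty dict where the paths
    don't exist (non-Linux, some VMs/containers)."""
    out = {}
    gov = CPU0 / "cpufreq" / "scaling_governor"
    if gov.exists():
        out["governor"] = gov.read_text().strip()  # e.g. "powersave"
    idle = CPU0 / "cpuidle"
    if idle.is_dir():
        for state in sorted(idle.glob("state[0-9]*")):
            name = (state / "name").read_text().strip()
            # the "disable" file reads "1" when the state is disabled
            out[name] = (state / "disable").read_text().strip()
    return out

print(cpu0_power_settings())
```

Actually changing the settings (e.g. `cpupower frequency-set -g performance`, or writing to the `disable` files) requires root; the above only inspects the current state.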