I advise anyone using pinning to carefully measure the supposed performance improvement, as there is a real risk of spending time on imaginary gains.
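A rough way to do that measurement on Linux, as a sketch only: time the same workload with the default affinity and then pinned to a single CPU, and compare the tail rather than the average. The helper names (`time_runs`, `summarize`, `compare`) and the toy workload are made up for illustration; swap in whatever you actually care about.

```python
import os
import statistics
import time


def time_runs(work, runs=200):
    """Return per-call wall-clock durations of work() in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        work()
        samples.append((time.perf_counter_ns() - start) / 1000)
    return samples


def summarize(samples):
    """Median, ~99th percentile and worst case of a list of durations."""
    ordered = sorted(samples)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return {"median": statistics.median(ordered), "p99": p99, "max": ordered[-1]}


def compare(work, cpu, runs=200):
    """Time work() with the current affinity, then pinned to one CPU (Linux only)."""
    original = os.sched_getaffinity(0)
    unpinned = summarize(time_runs(work, runs))
    os.sched_setaffinity(0, {cpu})
    try:
        pinned = summarize(time_runs(work, runs))
    finally:
        os.sched_setaffinity(0, original)  # always restore the original mask
    return unpinned, pinned


if __name__ == "__main__":
    # Toy CPU-bound workload, purely a placeholder.
    busy = lambda: sum(i * i for i in range(10_000))
    cpu = min(os.sched_getaffinity(0))
    for label, stats in zip(("unpinned", "pinned"), compare(busy, cpu)):
        print(label, stats)
```

A single run like this is noisy; repeat it under realistic host load before believing any difference.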
With no pinning they'd randomly go into the milliseconds -- with pinning it would stay in the microsecond range!
The result of this is games (and likely audio) performing much more favorably.
How much of this is cache coherency/in-fighting, scheduling, or simply host usage, I couldn't tell you. I was just happy to have my VM 'feel' native.
There will always be a benefit with pinning vCPUs on the same NUMA nodes as their devices (VFIO or even SR-IOV). This is becoming increasingly important on hypervisors
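For anyone wondering where the "same NUMA node as the device" information comes from on Linux: sysfs exposes a `local_cpulist` file per PCI device. A minimal sketch, assuming a plain process rather than a VM (for a VM you would normally express the pinning in the hypervisor's own config); the function names and the example PCI address are hypothetical.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-3,8-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def device_local_cpus(pci_addr):
    """CPUs the kernel reports as local to a PCI device, e.g. "0000:01:00.0"."""
    path = Path("/sys/bus/pci/devices") / pci_addr / "local_cpulist"
    return parse_cpulist(path.read_text())


def pin_near_device(pid, pci_addr):
    """Restrict pid to the CPUs local to the device, if any are reported."""
    cpus = device_local_cpus(pci_addr)
    if cpus:
        os.sched_setaffinity(pid, cpus)
    return cpus
```

Something like `pin_near_device(0, "0000:01:00.0")` would pin the calling process; the address is a placeholder, use the one `lspci -D` shows for your device.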
PMC data at scale is pretty clear: very often, CFS won't do the right thing and will leave bad HT neighbors on the same core, leading to L1 thrashing, or keep a high level of imbalance between NUMA sockets, leading to a degraded LLC hit rate.
I correct my statement with "_did_ a good job", and appreciate rigorous testing.
I worked on a project where we collected detailed production runtime characteristics and evaluated scheduler algorithms against it. Tiny improvements made for massive savings.
We are running lots of Erlang on k8s and CPU pinning improves performance of Erlang schedulers tremendously.
On most of the systems I ran, we didn't tend to have much of anything running on BEAM's dirty schedulers or other OS processes. If you have more of a mix of things, leaving things unpinned may work better.
I like to keep up with several cryptocurrency prices on Coinbase, but the Coinbase Pro pages consume a pretty significant amount of CPU time. I'd love to be able to just shove all of those processes to a single CPU thread to reduce the impact on overall system performance.
I suppose it wouldn't be too hard to write a Python script that does this automatically... scan window titles to look for "Coinbase Pro", find the owning PID, then call SetAffinity...
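A rough sketch of that script using only `ctypes`, assuming Windows and assuming "SetAffinity" means the Win32 `SetProcessAffinityMask` call. The function names are made up. One caveat: browsers are multi-process, and the PID that owns the window is usually the main browser process rather than the process doing the page's work, so this may not confine the tab itself.

```python
import ctypes
import sys

PROCESS_SET_INFORMATION = 0x0200  # access right required by SetProcessAffinityMask


def affinity_mask(cpus):
    """Turn CPU indices into the bitmask Windows expects, e.g. [0, 2] -> 0b101."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def pids_with_title(needle):
    """PIDs owning a visible top-level window whose title contains needle."""
    from ctypes import wintypes
    user32 = ctypes.windll.user32
    proto = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.HWND, wintypes.LPARAM)
    user32.EnumWindows.argtypes = [proto, wintypes.LPARAM]
    user32.IsWindowVisible.argtypes = [wintypes.HWND]
    user32.GetWindowTextLengthW.argtypes = [wintypes.HWND]
    user32.GetWindowTextW.argtypes = [wintypes.HWND, wintypes.LPWSTR, ctypes.c_int]
    user32.GetWindowThreadProcessId.argtypes = [
        wintypes.HWND, ctypes.POINTER(wintypes.DWORD)]
    found = set()

    @proto
    def visit(hwnd, _lparam):
        length = user32.GetWindowTextLengthW(hwnd)
        if length and user32.IsWindowVisible(hwnd):
            buf = ctypes.create_unicode_buffer(length + 1)
            user32.GetWindowTextW(hwnd, buf, length + 1)
            if needle in buf.value:
                pid = wintypes.DWORD()
                user32.GetWindowThreadProcessId(hwnd, ctypes.byref(pid))
                found.add(pid.value)
        return True  # keep enumerating

    user32.EnumWindows(visit, 0)
    return found


def pin_pid(pid, cpus):
    """Pin one process to the given CPU indices; returns True on success."""
    from ctypes import wintypes
    kernel32 = ctypes.windll.kernel32
    kernel32.OpenProcess.restype = wintypes.HANDLE
    kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    handle = kernel32.OpenProcess(PROCESS_SET_INFORMATION, False, pid)
    if not handle:
        return False
    try:
        return bool(kernel32.SetProcessAffinityMask(handle, affinity_mask(cpus)))
    finally:
        kernel32.CloseHandle(handle)


if __name__ == "__main__" and sys.platform == "win32":
    for pid in pids_with_title("Coinbase Pro"):
        print(pid, pin_pid(pid, [0]))  # shove everything onto logical CPU 0
```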
>"This class overlaps significantly with CS392 ``Systems Programming'' -- if you have taken this class, please talk to me in person before trying to register for CS631."[1]
Does anyone know if the videos for CS392 might also be online? I tried some basic URL substitutions, but I came up empty.
https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#CP...
What are other real-world uses of CPU pinning?
The software will be pinned to CPU cores close to the RAM or PCIe device they are using.
Only really seen it be an issue in crazy large scale systems, or where you have 4 CPUs, but I haven't spent a huge amount of time on microsecond critical workloads.
(I'm looking at 'lstopo' from package 'hwloc', Linux on my Haswell Xeon: 10MB shared L3, 256KB L2, 32KB L1{d,i} per core)
Given my (educated) guess, I've told irqbalance to put interrupts only on 'thread 0' and then I schedule CPU-intensive tasks to 'thread 1' and schedule them very-not-nicely. Linux seems pretty good about keeping everything else on 'thread 0' when I have 'thread 1' busy so I don't do any further management.
I can have 4 cores 'thread 1' pegged at 100% with no impact on interactive or I/O performance.
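A sketch of how the 'thread 0' / 'thread 1' split could be derived from sysfs and applied to a process on Linux. It assumes 'thread 0' means the lowest-numbered logical CPU on each physical core, which is my reading and not something the setup above spells out; the function names are hypothetical, and the irqbalance side is not covered.

```python
import os
from collections import defaultdict
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_core_ids():
    """Map each CPU this process may run on to its (package id, core id)."""
    ids = {}
    for cpu in sorted(os.sched_getaffinity(0)):
        topo = CPU_ROOT / f"cpu{cpu}" / "topology"
        package = int((topo / "physical_package_id").read_text())
        core = int((topo / "core_id").read_text())
        ids[cpu] = (package, core)
    return ids


def split_threads(core_ids):
    """Return (first, rest): the lowest-numbered thread of each core, and all others."""
    by_core = defaultdict(list)
    for cpu, key in core_ids.items():
        by_core[key].append(cpu)
    first, rest = set(), set()
    for cpus in by_core.values():
        cpus.sort()
        first.add(cpus[0])
        rest.update(cpus[1:])
    return first, rest


def pin_to_second_threads(pid=0):
    """Pin pid to the non-first hardware threads; no-op if there are none."""
    _, rest = split_threads(read_core_ids())
    if rest:  # empty when SMT is off or unavailable
        os.sched_setaffinity(pid, rest)
    return rest
```

On a 4-core, 8-thread box where CPUs 0-3 and 4-7 are siblings, `split_threads` would give `{0, 1, 2, 3}` for interrupts and everything else, and `{4, 5, 6, 7}` for the heavy tasks; check `lstopo` first, since sibling numbering differs between machines.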