Predictive CPU isolation of containers at Netflix (2019)

(netflixtechblog.com)

80 points | Cwizard | 2y ago | 21 comments

high_na_euv 2y ago

It seems like we are moving further and further away from OSes managing our resources.

Runtimes/VMs implement memory management, various threading techniques, and things like what we see here.

Maybe in the future we will skip the OS's overhead entirely and run apps directly on hardware, where they (or rather their runtimes/VMs, like the JVM or CLR) will manage themselves more efficiently.

dist1ll 2y ago

That's definitely a trend we're moving towards for extremely high-performance software. General-purpose operating systems often don't sit at the right level of abstraction, and lack the flexibility for certain demanding workloads. Kernel-bypass networking is the gold standard for low-latency, high-throughput networking. Serverless platforms often rely on userspace schedulers and userspace page fault handlers.

That's one of the reasons unikernels seem to be a promising way forward. They open up a bunch of opportunities, including language-based safety and compile-time optimizations, and they seem to mirror more closely how we wish to run & deploy modern applications (declarative, immutable and ideally with a bare minimum of dependencies).

cj 2y ago

Is it Node that still limits all processes to 2 GB or something by default? (I think their rationale was “it’s a V8 flag so we don’t touch it”)

piyh 2y ago

TIL

https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-...

actionfromafar 2y ago

Upside is that it makes sure your stuff can be deployed on 32-bit.

lexicality 2y ago

Does that come in handy often?

hinkley 2y ago

And yet pointer compression is turned off by default.

yencabulator 2y ago

More like "kernel programming is hard, let's put fancier logic and RPC in userspace". Which sounds perfectly sane.

mochomocha 2y ago

(I'm the author of the blog post)

Beyond "kernel programming is hard", there are a few other reasons why it made sense for us:

- observability & maintenance: it is much easier to implement and ship this type of change in userspace than to roll out a kernel fork. We also built custom A/B testing infra to be able to evaluate these optimizations.

- the kernel is really good at making reasonable decisions at high frequency based on a limited amount of data and heuristics. But these decisions are far from optimal in all scenarios. In contrast, in userspace we can make better decisions based on more data (or ML predictions), but do so less frequently.
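To make the "more data, less often" trade-off concrete, here is a minimal sketch of a userspace predictor that looks at a window of recent per-container CPU samples and returns a high percentile as the forecast. It is a naive stand-in for illustration only, not the model from the blog post; the function name `predict_p95` and the sample format (cores used per sampling interval) are assumptions.

```python
def predict_p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of recent CPU usage samples (in cores).

    Hypothetical helper: a real system would likely use a trained model
    with many more features than the raw usage history.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank method with integer ceiling division (rank is 1-based).
    rank = -(-95 * len(ordered) // 100)
    return ordered[rank - 1]


# Twenty samples from a mostly idle container with a couple of bursts.
history = [0.5] * 18 + [3.0, 3.5]
forecast = predict_p95(history)
```

A loop that recomputes such forecasts every few minutes can afford to look at far more history than a kernel scheduler making decisions on every tick.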

tptacek 2y ago

Meh, not really? This seems more analogous to memory allocator optimization, where your libc malloc() is "optimized" to give adequate performance to all sorts of different allocation patterns, but you can do much better if you know a priori what your application's actual pattern will be. Just swap out "malloc()" with "the process scheduler" here.

Rohansi 2y ago

Worth watching The Birth and Death of JavaScript for more information on this hypothetical future. In theory you could get rid of system call and virtual memory overhead to make something like JavaScript run at "fully native" speed, because removing the OS-related overhead offsets the cost of being JavaScript. This is only really a viable future for managed languages, because the runtime would need to ensure memory safety, isolation between "processes", etc., which they mostly do already anyway.

https://www.destroyallsoftware.com/talks/the-birth-and-death...

exe34 2y ago

Could build them on top of unikernels.

gpderetta 2y ago

wash, rinse, repeat.

shadowpho 2y ago

This is amazing, they use ML to predict utilization on the fly

dang 2y ago

Predictive CPU isolation of containers at Netflix using a MIP solver - https://news.ycombinator.com/item?id=21116565 - Sept 2019 (21 comments)

Predictive CPU Isolation of Containers at Netflix - https://news.ycombinator.com/item?id=20096699 - June 2019 (1 comment)

yencabulator 2y ago

See also ghOSt by Google (2021):

https://storage.googleapis.com/pub-tools-public-publication-...

https://github.com/google/ghost-userspace

https://github.com/google/ghost-kernel

vient 2y ago

Also sched-ext, which seems close to being mainlined and is already a default scheduler in CachyOS:

https://github.com/sched-ext/scx

Sparkyte 2y ago

Kind of an old article. It is a pretty straightforward thing to do. If you spend enough time accurately load testing your environments, you can dial in the container resources and shave thousands of dollars. Lots of places are too scared of under-allocating. Limit and request exist for a reason. Limit is for surge and request is what is always guaranteed. It is okay to exceed your request as long as you add a scaling policy to balance out the surge. And be cautious with request and limit on memory; not all applications benefit from this.
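The request/limit distinction above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Kubernetes code: the `Container` type and the `fits` and `limit_overcommit` helpers are hypothetical names, and real schedulers consider much more than CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str
    cpu_request: float  # cores that are always guaranteed
    cpu_limit: float    # cores the container may burst to during a surge


def fits(node_cores: float, placed: list[Container], new: Container) -> bool:
    """Placement is decided on requests only: guaranteed shares must fit."""
    return sum(c.cpu_request for c in placed) + new.cpu_request <= node_cores


def limit_overcommit(node_cores: float, placed: list[Container]) -> float:
    """Summed limits over capacity; above 1.0 means surges can collide."""
    return sum(c.cpu_limit for c in placed) / node_cores


placed = [Container("api", 2.0, 4.0), Container("worker", 1.5, 3.0)]
can_add_batch = fits(8.0, placed, Container("batch", 4.0, 6.0))
ratio = limit_overcommit(8.0, placed)
```

Load testing is what lets you shrink `cpu_request` toward real steady-state usage while leaving `cpu_limit` as headroom for surges.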

burutthrow1234 2y ago

They're automatically predicting the limit _and_ figuring out binpacking into hyperthreaded CPUs and NUMA nodes. K8s just pushes your supplied values down to the kernel, which is exactly what they're saying is inefficient.
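As a rough illustration of topology-aware binpacking, the sketch below uses a simple greedy heuristic: largest predicted demand first, whole physical cores only (so two containers never share a core's hyperthreads), and a single socket per container when one has room. The linked posts describe a MIP solver; this greedy version is a swapped-in simplification, and `build_topology`, `place`, and the CPU numbering are assumptions made for the example.

```python
def build_topology(sockets: int, cores_per_socket: int, threads_per_core: int = 2):
    """Return {socket: [[logical cpu ids of one physical core], ...]}."""
    topo, cpu = {}, 0
    for s in range(sockets):
        topo[s] = []
        for _ in range(cores_per_socket):
            topo[s].append(list(range(cpu, cpu + threads_per_core)))
            cpu += threads_per_core
    return topo


def place(topo, demands):
    """Greedy placement of {container: predicted threads} onto whole cores."""
    free = {s: list(cores) for s, cores in topo.items()}  # free physical cores
    threads = len(next(iter(topo.values()))[0])
    assignment = {}
    for name, want in sorted(demands.items(), key=lambda kv: -kv[1]):
        cores_needed = -(-want // threads)  # ceiling division
        fitting = [s for s in free if len(free[s]) >= cores_needed]
        if fitting:
            # Best fit: the socket with the fewest free cores that still fits.
            s = min(fitting, key=lambda sock: len(free[sock]))
            taken = [free[s].pop(0) for _ in range(cores_needed)]
        else:
            # No single socket has room, so span sockets as a fallback.
            pool = [(s, core) for s in free for core in free[s]]
            if len(pool) < cores_needed:
                raise ValueError(f"not enough free cores for {name}")
            taken = []
            for s, core in pool[:cores_needed]:
                free[s].remove(core)
                taken.append(core)
        assignment[name] = [cpu for core in taken for cpu in core]
    return assignment


topo = build_topology(sockets=2, cores_per_socket=4)
assignment = place(topo, {"a": 4, "b": 3, "c": 6})
```

The resulting CPU lists could then be applied through a mechanism such as a cgroup cpuset or `os.sched_setaffinity`; which mechanism fits is an assumption that depends on the container runtime.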

Sparkyte 2y ago

It is indeed inefficient, so this is more like a Process Lasso approach to resource management?

hinkley 2y ago

If the number of servers needed for service A is proportional to the number of servers needed for service B-Z, then your whole cluster scales up and down together and you have a situation where the max cluster size is hit regularly instead of almost never. For private servers that’s a big problem. But if you’re a large enough customer for a cloud provider it can still be a problem.

You save money still, but you don’t solve your capacity problems by doing so.
