Coding Agent VMs on NixOS with Microvm.nix

(michael.stapelberg.ch)

109 points by secure · 1mo ago · 48 comments

48 comments

rootnod3 · 1mo ago

That is quite an involved setup to get a costly autocomplete going.

Is that really where we are at? Just outsource convenience to a few big players that can afford the hardware? Just to save on typing and god forbid…thinking?

“Sorry boss, I can’t write code because Cloudflare is down.”

Cyph0n · 1mo ago

Keep in mind that this setup is a one-time cost. Also, a lot of the code is related to configuring it the way the author wants it (via Home Manager).

Generally speaking, once you have a working NixOS config, incremental changes become extremely trivial, safe, and easy to roll back.

aquariusDue · 1mo ago

To provide another data point: I use NixOS too, and oh boy, that one-time cost is really high. And while we're sharing Nix stuff for LLMs, there's this piece of kit too: https://github.com/YPares/rigup.nix

groby_b · 1mo ago

If you believe "costly autocomplete" is all you get, you absolutely shouldn't bother.

You're opting for "sorry boss, it's going to take me 10 times as long, but it's going to be loving craftsmanship, not industrial production" instead. You want different tools, for a different job.

0xcb0 · 1mo ago

I was looking for a way to isolate my agents in a more convenient way, and I really love your idea. I'm going to give this a try over the weekend and will report back.

But the one-time setup seems like a fair investment for a more secure development environment. Of course, this won't help with the problem of malicious code reaching production. But with a little overhead, I think it really will make local development much more secure.

And you can automate a lot of it. And it will finally be my chance to get more into NixOS :D

NJL3000 · 1mo ago

A pair of containers felt a bit cheaper than a VM:

https://github.com/5L-Labs/amp_in_a_box

I was going to add Gemini / OpenCode Kilo next.

There is some upfront cost in defining which endpoints to map inside, but it definitely adds a veneer of preventing the crazy…
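Deciding "which endpoints to map inside" amounts to an egress allowlist. A minimal Python sketch of the matching logic, assuming a hostname-based allowlist; the host names are hypothetical placeholders and this is not how amp_in_a_box implements it. In practice such a check would sit in a proxy or firewall in front of the container, not inside the agent's own process.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real endpoints an agent needs depend on the tool.
ALLOWED_HOSTS = {
    "api.example-llm.com",  # placeholder for a model API host
    ".pypi.org",            # leading dot: the domain itself plus any subdomain
}


def host_allowed(url: str, allowed: set = ALLOWED_HOSTS) -> bool:
    """Return True if the URL's hostname matches an entry in the allowlist."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    for entry in allowed:
        if entry.startswith("."):
            # ".pypi.org" matches "pypi.org" and "files.pypi.org", not "evil-pypi.org"
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


print(host_allowed("https://files.pypi.org/packages/"))  # True
print(host_allowed("http://169.254.169.254/latest/"))    # False
```

Default-deny matters here: anything not listed, including cloud metadata addresses, is refused.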

phrotoma · 1mo ago

One problem with using containers as an isolation environment for a coding assistant is that it becomes challenging to have the agent work on a containerized project. You often need some janky "docker-in-docker" nonsense that hampers efforts.

indigodaddy · 1mo ago

I like using LXC containers, e.g. a full persistent OS where you can run Docker if you want, etc. I started this project, and it works well for me on a server or VPS:

https://github.com/jgbrwn/vibebin
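For reference, a persistent LXC system container that can also run Docker is typically launched with nesting enabled. A small Python sketch that builds such a command, assuming LXD's `lxc` client (Incus uses `incus launch` and different image remotes); the container name is hypothetical and this is not necessarily how vibebin does it.

```python
def lxc_launch_argv(name: str, image: str = "ubuntu:24.04", allow_docker: bool = True) -> list:
    """Build an `lxc launch` command for a persistent system container."""
    argv = ["lxc", "launch", image, name]
    if allow_docker:
        # security.nesting lets the container run its own containers (e.g. Docker)
        argv += ["-c", "security.nesting=true"]
    return argv


print(" ".join(lxc_launch_argv("agent-box")))
# lxc launch ubuntu:24.04 agent-box -c security.nesting=true
```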

NJL3000 · 1mo ago

I was planning to bind-mount worktrees systematically, but I agree it’s not super clean at scale (yet).
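Systematically bind-mounting worktrees could look roughly like this Python sketch, which only builds a `docker run` argv; the image name, the `/work` mount point, and the helper name are hypothetical.

```python
from pathlib import Path


def docker_run_argv(image: str, worktrees: list, workdir: str = "/work") -> list:
    """Build a `docker run` argv that bind-mounts each git worktree under `workdir`.

    A worktree at /some/host/path/<name> shows up as <workdir>/<name> in the container.
    """
    argv = ["docker", "run", "--rm", "-it"]
    for wt in worktrees:
        host_path = Path(wt).resolve()  # -v bind mounts need an absolute host path
        argv += ["-v", f"{host_path}:{workdir}/{host_path.name}"]
    argv += ["-w", workdir, image]
    return argv


# Worktrees would be created beforehand, e.g. `git worktree add ../proj-feature feature`.
print(" ".join(docker_run_argv("agent-image", ["../proj-feature"])))
```

One likely source of the mess: a linked worktree's `.git` is a small file that, by default, points back by absolute path to the main repository's `.git/worktrees/<name>` directory. Git inside the container therefore only works if that path is mounted as well.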

giancarlostoro · 1mo ago

This brings me back to my college days. We had Windows with Deep Freeze: students could do anything on the computer, then we'd restart it and everything was wiped and fresh. How long before Deep Freeze realizes they could sell their tool to vibe coders? Funnily enough, they have Deep Freeze for Mac but not for Linux.
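The reset-on-restart idea can be approximated for a single agent workspace without special tooling. A minimal Python sketch, assuming only a project directory (not a whole OS) needs to be disposable; `frozen_workspace` is a hypothetical helper name, and unlike Deep Freeze or a VM this does nothing to stop the agent from touching files outside the copy.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def frozen_workspace(golden: Path) -> Iterator[Path]:
    """Yield a throwaway copy of `golden`; every change is discarded on exit."""
    with tempfile.TemporaryDirectory(prefix="agent-") as tmp:
        work = Path(tmp) / golden.name
        shutil.copytree(golden, work)  # the pristine "golden" tree is never modified
        yield work
    # Leaving the TemporaryDirectory block deletes the copy and everything written to it.
```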

mxs_ · 1mo ago

Is there a way to make this work with macOS hosts, preferably without having to install a Linux toolchain inside the VM for the language the agent will be writing code in?

mtlynch · 1mo ago

This is a similar macOS solution:

https://github.com/lynaghk/vibe/

ghxst · 1mo ago

I'm working on a shared remote box for AI-assisted development and will definitely look at this for some inspiration.

messh · 1mo ago

I use shellbox.dev to create sandboxes over SSH, without ever leaving the terminal.

heliumtera · 1mo ago

Couldn't you replicate all of your setup with QEMU microvm?

Without Nix, I mean.

rictic · 1mo ago

Yep. What nix adds is a declarative and reproducible way to build customized OS images to boot into.
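For comparison, this is roughly the command line you would assemble by hand with QEMU's `microvm` machine type. The Python sketch below follows the shape of the example in QEMU's microvm documentation; the kernel and root-filesystem paths are placeholders, and building those images (the part Nix makes declarative and reproducible) is left out entirely.

```python
def qemu_microvm_argv(kernel: str, rootfs: str, mem: str = "512m", cpus: int = 2) -> list:
    """Build a qemu-system-x86_64 command line using the `microvm` machine type."""
    return [
        "qemu-system-x86_64",
        "-M", "microvm",
        "-enable-kvm", "-cpu", "host",
        "-m", mem, "-smp", str(cpus),
        "-kernel", kernel,
        "-append", "console=ttyS0 root=/dev/vda",
        "-nodefaults", "-no-user-config", "-nographic",
        "-serial", "stdio",
        # microvm has no PCI bus by default, so disks use the virtio-mmio "-device" variant
        "-drive", f"id=root,file={rootfs},format=raw,if=none",
        "-device", "virtio-blk-device,drive=root",
    ]


print(" ".join(qemu_microvm_argv("vmlinux", "rootfs.img")))
```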

CuriouslyC · 1mo ago

Nix is the best answer to "works on my machine," which is a problem I've seen at pretty much every place I've ever worked.

schmuhblaster · 1mo ago

Or try this: https://github.com/deepclause/agentvm, it's based on container2wasm, so the VM is fully defined by a Dockerfile.

clawsyndicate · 1mo ago

we run ~10k agent pods on k3s and went with gvisor over microvms purely for density. the memory overhead of a dedicated kernel per tenant just doesn't scale when you're trying to pack thousands of instances onto a few nodes. strict network policies and pid limits cover most of the isolation gaps anyway.
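The density argument is simple division: a fixed per-sandbox overhead eats into every slot on a node. A Python sketch with hypothetical numbers (not measurements of gVisor or any microVM), chosen only to show the shape of the tradeoff.

```python
def instances_per_node(node_mem_gib: float, app_mem_mib: float,
                       overhead_mib: float, reserved_gib: float = 0.0) -> int:
    """How many sandboxes fit on one node if memory is the binding constraint.

    overhead_mib is the fixed per-sandbox cost: a guest kernel plus VMM for a
    microVM, or gVisor's user-space kernel for a gVisor sandbox.
    """
    usable_mib = (node_mem_gib - reserved_gib) * 1024
    return int(usable_mib // (app_mem_mib + overhead_mib))


# Hypothetical inputs: 256 GiB node, 16 GiB reserved, 200 MiB per agent.
print(instances_per_node(256, 200, 128, reserved_gib=16))  # 749  (assumed microVM-like overhead)
print(instances_per_node(256, 200, 30, reserved_gib=16))   # 1068 (assumed gVisor-like overhead)
```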

secure (OP) · 1mo ago

Yeah, when you run ≈10k agents instead of ≈10, you need a different solution :)

I’m curious what gVisor is getting you in your setup — of course gVisor is good for running untrusted code, but would you say that gVisor prevents issues that would otherwise make the agent break out of the kubernetes pod? Like, do you have examples you’ve observed where gVisor has saved the day?

zeroxfe · 1mo ago

I've used both gVisor and microvms for this (at very large scales), and there are various tradeoffs between the two.

The huge gVisor drawback is that it *drastically* slows down applications (despite its faster startup time).

For agents, the startup time latency is less of an issue than the runtime cost, so microvms perform a lot better. If you're doing this in kube, then there's a bunch of other challenges to deal with if you want standard k8s features, but if you're just looking for isolated sandboxes for agents, microvms work really well.
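The startup-versus-runtime tradeoff can be written as wall time = startup + work × slowdown, and the crossover point solved directly. A Python sketch; the startup times and slowdown factors below are hypothetical, not measured values for gVisor or microVMs.

```python
def total_time(startup_s: float, work_s: float, slowdown: float) -> float:
    """Wall time for one task: fixed sandbox startup plus native work time
    multiplied by the sandbox's runtime slowdown factor (1.0 = native speed)."""
    return startup_s + work_s * slowdown


def break_even_work_s(a_startup: float, a_slowdown: float,
                      b_startup: float, b_slowdown: float) -> float:
    """Native work time at which sandboxes A and B take the same wall time.

    Solves a_startup + w * a_slowdown == b_startup + w * b_slowdown for w.
    """
    if a_slowdown == b_slowdown:
        raise ValueError("equal slowdowns: startup time alone decides")
    return (b_startup - a_startup) / (a_slowdown - b_slowdown)


# Hypothetical: A starts fast but runs slower (gVisor-like), B the reverse (microVM-like).
w = break_even_work_s(0.2, 1.5, 1.0, 1.05)
print(f"B wins for tasks longer than ~{w:.1f}s of native work")
```

Agent sessions typically run for minutes, far past any plausible crossover, which is why the runtime factor dominates.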

clawsyndicate · 1mo ago

since we allow agents to execute arbitrary python, we treat every container as hostile. we've definitely seen logs of agents trying to crawl /proc or hit the k8s metadata api. gvisor intercepts those syscalls so they never actually reach the host kernel.

alexzenla · 1mo ago

This is a big reason for our strategy at Edera (https://edera.dev) of building hypervisor technology that eliminates the standard x86/ARM kernel overhead in favor of deep para-virtualization.

The performance of gVisor is often a big limiting factor in deployment.

souvik1997 · 1mo ago

Edera looks very cool! Awesome team too.

I read the thesis on arXiv. Do you see any limitations from using Xen instead of KVM? That was the biggest surprise for me, as I have very rarely seen teams build on Xen.

yearolinuxdsktp · 1mo ago

How do you compete with Nitro-based VMs on AWS with 0.5% overhead?

souvik1997 · 1mo ago

Hey @clawsyndicate I'd love to learn more about your use case. We are working on a product that would potentially get you the best of both worlds (microVM security and containers/gVisor scalability). My email is in my profile.

alexzenla · 1mo ago

This is the thesis of our research paper: a good middle ground is necessary. https://arxiv.org/abs/2501.04580

dist-epoch · 1mo ago

LXC containers inside a VM scale. Bonus point: LXC containers feel like a VM.

indigodaddy · 1mo ago

I started this with the same idea:

https://github.com/jgbrwn/vibebin
