We've found that BuildKit has several inefficiencies preventing it from being as fast as it could be in the cloud, especially when dealing with simultaneous builds (common in CI). That led us to create our own optimized fork of BuildKit.
The number of fine-tuning knobs you can turn when running a self-hosted BuildKit instance is practically limitless, and I'd also encourage everyone to try it as a fantastic learning exercise.
Of course, CI SaaSes implement a lot of caching on their end, but they also try to put people on the most anemic machines possible to try and capture those juicy margins.
This unfortunately does not work for orgs that have, say, more than 20 engineers. The core issue is that once you have a test suite large enough to have ~30 shards, you only need one engineer `git push`ing once to saturate those 1-2 expensive machines you've got sitting in the office.
The CI workload is quite amenable to "serverless" when you get to a large enough org size, where most of the time you actually want to pay nothing (i.e. outside your business hours) but when your engineers are pushing code, you want 1500 vCPUs on-demand to run 4 or 5 test suites concurrently.
Seriously though, of course there's a lot of detail here, but I think people tend to not really internalize how much testing is about confidence, and things like incremental CI can really chew away at how big/small your test suite needs to be. There are some things that are just inherently slow, but I've seen a lot of test suites that spend most of their runtime rerunning tests that only exercise unchanged code.
My glib assertion is that there is likely to be no test suite generated by 20 engineers that requires 30 shards that is impossible to chop up with incremental CI. And downstream of that, getting incremental CI would improve DX a lot, cuz I bet those 30 shards take a long time
Obviously the dedicated machines are not rentable per hour, but the cloud is so much more expensive.
When you build alpine packages, you literally have to call abuild on your APKBUILD files. It's the same for Arch Linux. The files are called PKGBUILD. So even if you decide to package your applications (uh, using docker run? that changes nothing!) before docker build and then install them with the OS package manager, you will run into exactly the same thing.
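For reference, an APKBUILD is just a shell-syntax recipe that abuild consumes; here's a minimal sketch for a hypothetical `myapp` package (the field names and `build`/`package` functions are standard abuild conventions, but the package itself and its contents are made up):

```shell
# Hypothetical APKBUILD for an imaginary "myapp"; built with `abuild -r`
pkgname=myapp
pkgver=1.0.0
pkgrel=0
pkgdesc="Example application"
url="https://example.com/myapp"
arch="all"
license="MIT"
source="myapp-$pkgver.tar.gz"

build() {
    make
}

package() {
    # $pkgdir is the staging root abuild provides
    install -Dm755 myapp "$pkgdir"/usr/bin/myapp
}
```

The point stands: whether the recipe is an APKBUILD, a PKGBUILD, or a Dockerfile, the build step has to run somewhere, and you hit the same caching questions.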
We also currently have some jobs that build OCI images via the Docker/Podman CLI amd build using traditional Dockerfile/Containerfile scripts. For now those are centralized and run on just one host, on bare metal. I'd like to get those working via rootless Docker-in-Docker/Podman-in-Podman, but one thing that will be a little annoying with that is that we won't have any persistent caching at the Docker/Podman layer anymore. I suppose we'll end up using something like what's in the article to get that cache persistence back.
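One way to get that cache persistence back without local state is BuildKit's registry cache export, which works fine from ephemeral rootless DinD runners. A sketch, assuming a registry you control (the registry host and tags are placeholders; `--cache-to`/`--cache-from type=registry` are documented buildx options):

```shell
# Push the build cache to a registry so ephemeral runners can reuse it.
docker buildx build \
  --cache-to type=registry,ref=registry.example.com/myapp:buildcache,mode=max \
  --cache-from type=registry,ref=registry.example.com/myapp:buildcache \
  -t registry.example.com/myapp:latest \
  --push .
```

`mode=max` exports cache for all layers, including intermediate stages, at the cost of a larger cache artifact.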
That's a neat idea, was the primary motivation for building this out the perf gains on the table?
But as we started to mature our own CI 'infrastructure' (the automation we use to set up our self-hosted runners), I wanted to containerize the Nix builds. Using 'shell executors' in GitLab just feels icky to me, like a step backwards into Jenkins hell. Those jobs do leave a little bit more behind on disk. More importantly, though, while all of my team's Nix jobs use Nix in an ephemeral way, it is possible to run `nix profile install ...` in one of these bare metal jobs. That could potentially affect other such jobs, plus it creates a 'garbage collector root' that slightly reduces how much `nix-collect-garbage` can clean up. Our jobs are ones we'd like other teams across the company to run, and so we also want to provide some really low-effort ways for them to do so, namely: via shared infrastructure we host, via any Docker-capable runners they might already have, and by leveraging the same IaC we use to stand up our own runners.
To that end, we really want to have just one type of job that requires just one type of execution environment, and we definitely want opt-in persistence instead of a mess where jobs can very easily influence one another by accident or malice. But we don't want to lose the speedup! The real action in these jobs is small, so by sharing a persistent Nix store between runs, they go down from 2-10 minutes to 2-10 seconds, which is the kind of UX we want for our internal customers.
The new Nix image is more suitable for all three target scenarios: it's less risky on runner hosts shared by multiple teams, it still works normally (downloading deps via Nix on every run) on 'naive' Docker/Podman setups, and our runner initialization script actually uses Nix to provide Docker and Podman (both rootless), so any team can use it on top of whatever VM images they're already using for their CI runners regardless of distro or version once they're ready to opt into that performance optimization.
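The opt-in persistence described above can be as simple as mounting a named volume over the Nix store; a sketch using the stock `nixos/nix` image (the volume name and command are placeholders, not our actual setup):

```shell
# Opt-in persistent Nix store via a named volume. Teams that skip the
# -v flag just fall back to downloading dependencies on every run.
docker volume create nix-store   # one-time, per runner host
docker run --rm \
  -v nix-store:/nix \
  nixos/nix \
  nix-shell -p hello --run hello
```

Since the whole store lives under `/nix`, a single mount is enough to carry both the package cache and the database between runs.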
This should be protected with mTLS (https://docs.docker.com/build/drivers/remote/) or SSH (`endpoint: ssh://user@host`) to avoid cryptomining attacks and the like.
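Setting up the SSH variant with buildx might look like this (the host name is a placeholder; `docker buildx create` accepts `ssh://` endpoints, authenticating with your existing SSH keys):

```shell
# Point buildx at a remote build host over SSH instead of plain TCP.
docker buildx create \
  --name remote-builder \
  ssh://user@build-host
docker buildx build --builder remote-builder -t myapp:latest .
```

With SSH, you get encryption and authentication for free from your existing key infrastructure, instead of managing a separate CA for mTLS.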
I know it is out of style for some, but in my microservice architecture, which has a dozen services, each service takes about 90 seconds to build, maybe 2 minutes at most (if there is a slow Next.js build in there and a few thousand npm packages), and that is just on a 4-core GitHub Actions worker.
My microservices all build and deploy in parallel so this system doesn't get slower as you expand to more services.
(Open source template which shows how it works: https://github.com/bhouston/template-typescript-monorepo/act... )
If you're deploying all your "microservices" in parallel, then what you might have built is a distributed monolith.
A microservice can be tested and deployed independently.
Spinning up a build worker outright when a change is pushed is the fastest way, but it can be expensive if the build process is prolonged.
OTOH I've seen much faster image build times with smart reuse of layers, so that you don't have to re-run that huge npm install if your `package-lock.json` did not change.
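That layer reuse mostly comes down to copying the lockfile before the rest of the source, so the install layer is only invalidated when dependencies actually change. A sketch (base image and commands are illustrative):

```dockerfile
FROM node:20-alpine
WORKDIR /app
# Copy only the manifests first: this layer (and the npm ci below)
# stays cached until package-lock.json changes.
COPY package.json package-lock.json ./
RUN npm ci
# Source changes invalidate only the layers from here down.
COPY . .
RUN npm run build
```

The same pattern applies to any package manager with a lockfile: copy the lockfile, install, then copy the source.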
At Blacksmith we do see this pretty often! Rust services in particular are the most common offender.
As a side note: In my time running a CI infra co, we see that a majority of the workflow time for large teams comes from tests - which can have over 200 shards in some cases.
What’s the most common cause of builds taking this long in the first place…
Worst I have ever had was 5 minutes, but subsequent builds were reduced to under a minute thanks to the build cache, multi-stage builds, thin layers, and an optimized .dockerignore.
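An optimized .dockerignore alone can shrink the build context dramatically, which speeds up every build regardless of caching; a typical sketch (the entries are illustrative, tune them to your project):

```
# .dockerignore: keep the build context small
node_modules
.git
dist
*.log
.env
```

Excluding `node_modules` and `.git` matters most, since those are usually the largest directories and are never needed inside the build (dependencies get installed fresh in the image).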