Either use Depot or switch to self-hosted runners with large disks.
GitLab CI is leaps and bounds ahead.
Zoom is adding in email.
Years ago I worked for a bank. You know what happens if you set up bill pay with a bank? You're unlikely to end that relationship. Because who the fuck wants to do all that work to move.
Your labor, your suffering (cause setting up bill pay sucks) is an egress fee.
If you have GitHub acting as anything other than your public facing code repo you're locking yourself into the platform. Bug tracking, code review, CI pipelines, GitHub features that are going to keep you from moving quickly if you need to change providers.
The runner code is on GitHub, and it's not pretty. In fact, last time I ran it, it had trouble generating stable exit codes.
Now I don't even want to imagine what Github Actions are like…
If there's one thing I've learned over the years, it's that we really seldom have advanced needs. Mostly we just want things to work a certain way, and will fight systems to make them behave so. It's easier to just leave it be. Like Maven vs Gradle; yes, Gradle can do everything, but if you need that, it's worth taking a step back and assessing why the normal Maven flow won't work. What's so special about our app compared to the millions working just fine out of the box?
We tried caching at several companies. Outside Node builds, it was never worth it. Hooray, our .NET builds took 15 seconds instead of 4 minutes. Eventually we realized no one cared: we averaged a deployment every 4 days outside of outages, and the time being burned just wasn't there.
It's not even that! Coming from GitLab I was quite surprised at how poor the "getting started" experience was. Rather than a simple "on push, run command X" you first have to do a deep dive into actions/events/workflows/jobs/runs, and then figure out what kind of weird tooling is used for trivial things like checking out your code, or storing artifacts.
And then you try to unify your pipeline across several projects because that's what Github is heavily promoting with the whole "uses: actions/checkout" reuse thing - but it turns out to be a huge hassle to get it working because nothing works the way you'd expect it to work.
In the end I did get GHA to do what I was already doing in GitLab, but it took me ten times as long as setting it up originally did. I believe GHA is flexible and powerful enough to be well-suited to medium-sized companies, but it's neither easy enough for small companies nor powerful enough for large companies. It's one of the few GitHub features I genuinely dislike using.
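A rough sketch of the contrast described above (the project layout and `make test` command are illustrative stand-ins):

```yaml
# .gitlab-ci.yml -- GitLab clones the repo for you; this is the whole pipeline
test:
  script: make test
```

```yaml
# .github/workflows/test.yml -- the GHA equivalent needs an event, a job,
# a runner label, and an explicit checkout action
name: test
on: push
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
```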
There's also an action out there "GitHub cache dance" that will stick your whole buildkit state dir into the gha cache.
It's just a problem that requires big, local disks to solve.
docker/nerdctl only transfers the context; everything else is cached on the builder. It's very useful for monorepos (where you usually want to build and tag images for every tested commit), and the builder directly pushes the images/tags/layers to the registry (which can be just a new tag for an already existing layer).
a noop job is about 2 sec on GitLab CI this way.
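A sketch of that setup with buildx's remote driver (the builder address and registry host are illustrative):

```shell
# Point buildx at a long-lived remote BuildKit builder; only the build
# context is uploaded, layers stay cached on the builder.
docker buildx create --name ci --driver remote tcp://buildkit.internal:1234

# --push uploads straight from the builder to the registry; if nothing
# changed, that amounts to a new tag on already existing layers.
docker buildx build --builder ci \
  --push -t registry.example.com/app:$CI_COMMIT_SHA .
```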
Important note if you're taking advice: cache-from and cache-to both accept multiple values. cache-to just outputs the cache data to all the destinations specified; cache-from looks for cache hits in the sources in order. You can do some clever stuff to maximize cache hits with the least amount of downloading using the right combination.
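For example, with docker/build-push-action (registry and ref names are illustrative), you can consult a per-branch cache first and fall back to a main-branch cache, while only writing back to the branch cache:

```yaml
- uses: docker/build-push-action@v5
  with:
    push: true
    tags: registry.example.com/app:${{ github.sha }}
    # sources are tried in order: branch cache first, then main as fallback
    cache-from: |
      type=registry,ref=registry.example.com/app:cache-${{ github.ref_name }}
      type=registry,ref=registry.example.com/app:cache-main
    # cache data is written to every destination listed here
    cache-to: type=registry,ref=registry.example.com/app:cache-${{ github.ref_name }},mode=max
```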
Yes, nix is complex. But its caching story is soooo much better than docker's, and all the other docker issues just disappear.
https://nix.dev/tutorials/nixos/building-and-running-docker-...
Hugo @ Namespace (https://namespace.so)
There I also explain that IF you use a registry cache import/export, you should use the same registry to which you are also pushing your actual image, and use the "image-manifest=true" option (especially if you are targeting GHCR - on DockerHub "image-manifest=true" would not be necessary).
"image-manifest=true" was the magic parameter that I needed to make this work with a non-DockerHub registry (Artifactory). I spent a lot of time fighting this, and non-obvious error messages. Thank you!!
We use a multi-stage build for a DevContainer environment, and the final image is quite large (for various reasons), so a better caching strategy really helps in our use case (smaller incremental image updates, smaller downloads for developers, less storage in the repository, etc)
What most people need but don't use is base layers that are upstream of their code repo and released regularly, not at each commit.
Containerisation has made reproducible environments so easy that people want to reproduce it at each CI run, a bit too much.
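A sketch of that pattern (base image name and release cadence are hypothetical): the base is built and pushed on its own schedule by a separate pipeline, and per-commit builds only add cheap application layers on top.

```dockerfile
# Released weekly (or when system deps change) by its own pipeline,
# NOT rebuilt on every app commit:
FROM registry.example.com/platform/app-base:2024-w23

# The per-commit CI build only produces these cheap layers:
COPY . /app
RUN make -C /app install
```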
Use tools like Bazel + rules_oci or Gradle + jib and never spend time thinking about image builds taking time at all.
We had “the Bazel guy” in our mid-sized company that Bazelified so many build processes, then left.
It has been an absolute nightmare to maintain because no normal person has any experience with this tooling. It’s very esoteric. People in our company have reluctantly had to pick up Bazel tech debt tasks, like how the rules_docker package got randomly deprecated and replaced with rules_oci with a different API, which meant we could no longer update our Golang services to new versions of Go.
In the process we’ve broken CI, builds on Mac, had production outages, and all kinds of peculiarities and rollbacks needed that have been introduced because of an over-engineered esoteric build system that no one really cares about or wanted.
Also, just because you don't have experience with something doesn't make it a bad choice. I would recommend understanding it first: why your coworker chose it, and how other tools would actually do in the same role. The grass is often greener on the other side until you get there.
Personally I went through a bit of an adventure with Bazel. My first exposure to it was similar to yours, was used in a place I didn't understand for reasons I didn't understand, broke in ways I didn't understand and (regretfully) didn't want to spend time understanding.
The reality was once I sat down to use it properly and understood the concepts a lot of things made sense and a whole bunch of very very difficult things became tractable.
That last bit is super important. Bazel raises the baseline effort to do something with the build system, which annoys people that don't want to invest time in understanding a build system. However it drastically reduces the complexity of extremely difficult things like fully byte for byte reproducible builds, extremely fast incremental builds and massive build step parallelization through remote build execution.
I only need to install utils once and all build time goes to building my software. It even integrates nicely with Github. Result: 50% faster feedback.
However, it needs a bit of initial housekeeping and discipline to use correctly. For example, using Jenkinsfiles is a must, and using containers as agents is desirable.
Jenkins is the most flexible automation platform, and it's easy to do things in suboptimal ways (e.g. configuring jobs via the GUI).
There's also a way to configure Jenkins the IaC way, and I am hoping to dig into that at some point. The old way requires manual work that instinctively feels wrong when you're automating everything else.
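A minimal sketch of the first two points (the image and commands are illustrative): a declarative Jenkinsfile checked into the repo, using a container as the build agent. The IaC-style server configuration mentioned above is the Jenkins Configuration as Code plugin, which lives in a separate YAML file.

```groovy
// Jenkinsfile -- pipeline-as-code, with a container as the build agent
pipeline {
    agent {
        docker { image 'node:20-bookworm' } // illustrative image
    }
    stages {
        stage('Build') {
            steps {
                sh 'npm ci'
                sh 'npm test'
            }
        }
    }
}
```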
my previous experience was that in nearly all situations the time spent sending and retrieving cache layers over the network wound up making a shorter build step moot. ultimately we said “fuck it” and focused on making builds faster without (docker layer) caching.
The registry cache is a neat idea, but in practice it suffers the same problem.
it's unfortunate how much expertise/tinkering is required to get "incrementalism" in docker builds in github actions. we're hoping to solve this with some of the stuff we have in the pipeline in the near future.
With this approach, you'd use buildx remotely, and they manage and maintain the cache, amongst other benefits.
It does require a credit card at signup (it charges $0; it's there to mitigate fraud). Full transparency: I'm a Docker Captain and helped test this whilst it was called Hydrobuild.
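A sketch of pointing builds at Docker Build Cloud (the org/builder name is illustrative); the layer cache lives with the remote builder rather than in your CI runner:

```shell
# creates a local builder entry named cloud-myorg-default
docker buildx create --driver cloud myorg/default

# builds run on the remote builder, which keeps the cache between runs
docker buildx build --builder cloud-myorg-default \
  --push -t myorg/app:latest .
```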
They rethink Dockerfiles with really good caching support.
A possible exception is the "auto skip" feature for Earthly Cloud, since I do not know how that is implemented.
That can extend beyond just producing Docker images as well. Under the covers, the CACHE keyword is how lib/rust in Earthly makes building Rust artifacts in CI faster.
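A sketch of the CACHE keyword in an Earthfile (target name and paths are illustrative): the cached directory persists between builds of the target, so cargo's incremental state survives across CI runs.

```dockerfile
# Earthfile
VERSION 0.8

build:
    FROM rust:1.78
    WORKDIR /src
    COPY . .
    # CACHE persists this directory between builds of this target
    CACHE target
    RUN cargo build --release
```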
CircleCI has an implementation that used to use a detachable disk, but that had issues with concurrency.
It's since been replaced with an approach that uses a Docker plugin under the hood to store layers in object storage.
There's quite a bit of cruft that can be pruned.
But in the long run, as annoying as it is, our build pipelines were reduced by quite a few minutes per build.
Glad it worked really well for you.
What made you switch to self-hosted runners?
Maybe building from scratch all the time is a good correctness decision? Maybe stale values in disks is a tricky enough issue to want to avoid entirely?
If you keep a stack of disks around and grab a free one when the job starts you'd end up with good speedup a lot of the time. If cost is an issue you can expire them quickly. I regularly see CI jobs spending >50% of their time downloading the same things, or compiling the same things, over and over. How many times have I triggered an action that compiled the exact same sqlite source code? Tens of thousands?
Maybe this is fine, I dunno.
I remember working on a project where the first clean build would always fail, and only incremental builds could succeed. I was a junior at the time, so this was 15-20 years ago. I remember spending some time trying to get it to succeed from a clean build and my lead pulling me aside: he said it was an easy fix, but if we fixed it, the ops guys would insist on building from scratch for every build. So please, stop.
Personally, unless you have an exotic build env, it’s usually faster and easier to simply build in the runner. If you need a docker image at the end, build a dockerfile that simply copies the artifacts from disk.
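A sketch of that final step (paths and base image are illustrative): the binary was already built on the runner, so the image build is just a copy, which is fast and cache-friendly.

```dockerfile
# Build already happened on the runner (e.g. `go build -o dist/server`);
# the Dockerfile only packages the artifact.
FROM gcr.io/distroless/static-debian12
COPY dist/server /usr/local/bin/server
ENTRYPOINT ["/usr/local/bin/server"]
```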
We came to the same conclusion and built Depot around this exact workflow for the Docker image build problem. We're now bringing that same tech into GitHub Actions workflows.
- uses: DeterminateSystems/nix-installer-action@main
- uses: DeterminateSystems/magic-nix-cache-action@main

So much time spent debunking such broken "caching" solutions.
Computers are very fast now. Use proper package/versioning systems (part of the problem here is that those are often also broken/badly designed).