- Configured a block-level in-memory disk accelerator / cache (fs operations at the speed of RAM!)
- Benchmarked EC2 instance types (m7a is the best x86 today, m8g is the best arm64)
- "Warming" the root EBS volume by accessing a set of priority blocks before the job starts to give the job full disk performance [0]
- Launching each runner instance in a public subnet with a public IP - the runner gets full throughput from AWS to the public internet, and IP-based rate limits rarely apply (Docker Hub)
- Configuring Docker with containerd/estargz support
- Just generally turning kernel options and unit files off that aren't needed
[0] https://docs.aws.amazon.com/ebs/latest/userguide/ebs-initial...
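The volume-warming step can be approximated with a plain sequential read. A minimal sketch, using a regular scratch file as a stand-in for the EBS block device (on a real runner you'd read the device itself, e.g. /dev/nvme0n1, and fio gives finer control over block size and queue depth):

```shell
# Stand-in for the EBS block device: a 4 MiB scratch file.
stand_in=$(mktemp)
dd if=/dev/zero of="$stand_in" bs=1M count=4 status=none

# "Warm" it: sequentially read every block. Against a real EBS
# volume this forces lazily-loaded blocks to be fetched from S3,
# so the job sees full disk performance later.
dd if="$stand_in" of=/dev/null bs=1M status=none && warmed=yes

rm -f "$stand_in"
echo "warmed=$warmed"
```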
Are you not using a caching registry mirror, instead pulling the same image from Hub for each runner...? If so that seems like it would be an easy win to add, unless you specifically do mostly hot/unique pulls.

The more efficient answer to those rate limits is almost always to pull fewer times for the same work rather than scaling in a way that circumvents them.
From a performance / efficiency perspective, we generally recommend using ECR Public images[0], since AWS hosts mirrors of all the "Docker official" images, and throughput to ECR Public is great from inside AWS.
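On the mirror suggestion upthread: pointing dockerd at a pull-through registry mirror is a one-key change in daemon.json. A sketch (writing to a scratch path here; the real file lives at /etc/docker/daemon.json and needs a dockerd restart, and mirror.gcr.io is just one public example):

```shell
# Write a daemon.json fragment enabling a registry mirror for
# Docker Hub pulls. Real path: /etc/docker/daemon.json.
conf=$(mktemp)
cat > "$conf" <<'EOF'
{
  "registry-mirrors": ["https://mirror.gcr.io"]
}
EOF
cat "$conf"
```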
I'm slightly old; is that the same thing as a ramdisk? https://en.wikipedia.org/wiki/RAM_drive
Every Linux kernel does that already. I currently have 20 GB of disk cached in RAM on this laptop.
If you corrupt a CI node, whatever. Just rerun the step.
edit: Or, even easier, just use the pre-built fail_function infrastructure (with retval = 0 instead of an error): https://docs.kernel.org/fault-injection/fault-injection.html
Actually, in my experience pulling very large images to run with Docker, it turns out that Docker doesn't really do any fsync-ing itself. The sync happens when it creates an overlayfs mount while creating a container, because the overlayfs driver in the kernel does it.
A volatile flag to the kernel driver was added a while back, but I don't think Docker uses it yet https://www.redhat.com/en/blog/container-volatile-overlay-mo...
Other options are to use an overlay mount with volatile or ext4 with nobarrier and writeback.
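For reference, the volatile overlay mount from the linked article looks like this. A dry-run sketch: the script only assembles and prints the mount command, since an actual mount needs root and a kernel with overlayfs volatile support (5.10+); the lower/upper/work paths are placeholders:

```shell
# Assemble an overlayfs mount command with the volatile option,
# which tells the overlay driver to skip syncing the upper layer.
lower=/var/lib/docker/overlay2/lower
upper=/var/lib/docker/overlay2/upper
work=/var/lib/docker/overlay2/work

cmd="mount -t overlay overlay -o lowerdir=$lower,upperdir=$upper,workdir=$work,volatile /mnt/merged"
echo "$cmd"   # dry run; execute as root to actually mount
```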
Why in the world does it do that ????
Ok I googled (kagi). Same reason anyone ever does: pure voodoo.
If you can't trust the kernel to close() then you can't trust it to fsync() or anything else either.
Kernel-level crashes, the only kind of crash that risks half-written files, are no more likely during dpkg than any other time. A bad update is the same bad update regardless, no better, no worse.
> If you can't trust the kernel to close() then you can't trust it to fsync() or anything else either.
https://man7.org/linux/man-pages/man2/close.2.html
A successful close does not guarantee that the data has been
successfully saved to disk, as the kernel uses the buffer cache to
defer writes. Typically, filesystems do not flush buffers when a
file is closed. If you need to be sure that the data is
physically stored on the underlying disk, use fsync(2). (It will
depend on the disk hardware at this point.)
So if you want to wait until it's been saved to disk, you have to do an fsync first. If you even just want to know whether it succeeded or failed, you have to do an fsync first.

Of course, none of this matters much on an ephemeral Github Actions VM. There's no "on next boot or whatever". So this is one environment where it makes sense to bypass all this careful durability work that I'd normally be totally behind. It seems reasonable enough to say the data has reached the page cache, it should continue being visible in the current boot, and tomorrow will never come.
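The man-page point above is visible from the shell, too: GNU coreutils `sync --data FILE` is roughly fdatasync(2) on that file, and that call is the moment a write error would actually surface. A sketch (assumes coreutils >= 8.24):

```shell
f=$(mktemp)
printf 'important bytes' > "$f"   # data may still be only in the page cache
sync --data "$f"                  # push it to disk; write errors surface here
echo "durable: $(cat "$f")"
```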
You can get half-written files in many other circumstances, eg on power outages, storage failures, hw caused crashes, dirty shutdowns, and filesystem corruption/bugs.
(Nitpick: trusting the kernel to close() doesn't have anything to do with this, like a sibling comment says)
and about kernel-level crashes: yes, but you see, dpkg creates a new file on the disk, makes sure it is written correctly with fsync(), and then calls rename() (or something like that) to atomically replace the old file with the new one.
So there is never a possibility of a given file being corrupt during an update.
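The write/fsync/rename dance described above, sketched in shell (dpkg does this in C; `sync --data` from GNU coreutils stands in for fsync(2), and the `.dpkg-new` suffix mirrors dpkg's convention):

```shell
target=$(mktemp)               # stands in for e.g. /usr/bin/foo
printf 'old contents' > "$target"

# 1. Write the new version to a temp file next to the target.
printf 'new contents' > "$target.dpkg-new"
# 2. fsync it so the bytes are on disk before the swap.
sync --data "$target.dpkg-new"
# 3. Atomic rename: readers see either old or new, never half-written.
mv "$target.dpkg-new" "$target"

cat "$target"
```

Because `mv` within one filesystem is rename(2), a crash at any point leaves either the intact old file or the intact, fully-synced new one.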
Maybe they know something you don't ?????
Imagine this scenario; you're writing a CI pipeline:
1. You write some script to `apt-get install` blah blah
2. As soon as the script is done, your CI job finishes.
3. Your job is finished, so the VM is powered off.
4. The hypervisor hits the power button but, oops, the VM still had dirty disk cache/pending writes.
The hypervisor may immediately pull the power (chaos monkey style; developers don't have patience), in which case those writes are now corrupted. Or it may use an ACPI shutdown, which should also have an ultimate timeout before pulling power (otherwise stalled IO might prevent resources from ever being cleaned up).
If you rely on the sync happening at step 4, while the kernel gracefully exits, how long does the kernel wait before it decides a shutdown timeout has occurred? How long does the hypervisor wait, and is that longer than the kernel would wait? Are you even sure that the VM shutdown command you're sending is the graceful one?
How would you fsync at step 3?
For step 2, perhaps you might have an exit script that calls `fsync`.
For step 1, perhaps you might call `fsync` after `apt-get install` is done.
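Both variants above reduce to flushing dirty pages before the VM can be powered off. A minimal sketch of the step-2 version (a bare `sync` flushes everything, which is enough here even though the comments say fsync; the apt-get line is illustrative and commented out):

```shell
#!/bin/sh
set -e
# ... CI work that writes to disk, e.g.:
# apt-get install -y some-package
echo 'build artifact' > artifact.txt

# Flush all dirty pages before the job ends and the
# hypervisor powers off the VM.
sync
echo done
```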
(to be clear: my comment is sarcasm, and "web scale" is a reference to a joke about reliability [0])
If you want to truly speed up builds by optimizing disk performance, there are no shortcuts to physically attaching NVMe storage with high throughput and high IOPS to your compute directly.
That's what we do at WarpBuild[0], and we outperform Depot runners handily. This is because we do not use network-attached disks, which come with relatively higher latency. Our runners are also paired with faster processors.
I love the Depot content team though, it does a lot of heavy lifting.
Trading Strategy looks super cool, by the way.
[1]: https://runs-on.com/benchmarks/github-actions-disk-performan...
Are there any reasonable alternatives for a really tiny FOSS project?
you can check us out at https://yeet.cx
we also have an anonymous guest sandbox you can play with