S3 Files

(allthingsdistributed.com)

379 pointswerner1mo ago119 comments

https://aws.amazon.com/blogs/aws/launching-s3-files-making-s...

119 comments

MontyCarloHall1mo ago

This is essentially S3FS using EFS (AWS's managed NFS service) as a cache layer for active data and small random accesses. Unfortunately, this also means that it comes with some of EFS's eye-watering pricing:

— All writes cost $0.06/GB, since everything is first written to the EFS cache. For write-heavy applications, this could be a dealbreaker.

— Reads hitting the cache get billed at $0.03/GB. Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.

— Cache is charged at $0.30/GB/month. Even though everything is written to the cache (for consistency purposes), it seems like it's only used for persistent storage of small files (<128kB), so this shouldn't cost too much.
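
To see how those three rates combine for a given workload, here is a rough back-of-envelope sketch. The rates are simply the figures quoted above, not verified AWS pricing; the function name and the example workload are made up for illustration, and S3 request and storage charges are ignored.

```python
# Rates as quoted in the comment above (assumptions, not verified AWS pricing).
WRITE_PER_GB = 0.06          # every write goes through the EFS cache
CACHED_READ_PER_GB = 0.03    # reads served from the cache
CACHE_PER_GB_MONTH = 0.30    # data resident in the cache


def estimate_monthly_cost(written_gb, cached_read_gb, cache_resident_gb):
    """Estimate the monthly cache-layer charge in dollars for one workload."""
    return (
        written_gb * WRITE_PER_GB
        + cached_read_gb * CACHED_READ_PER_GB
        + cache_resident_gb * CACHE_PER_GB_MONTH
    )


# Hypothetical workload: 1 TB written, 500 GB of cached reads, 10 GB resident.
cost = estimate_monthly_cost(1000, 500, 10)
print(f"${cost:.2f}")
```

With these numbers the write charge is by far the largest term, which matches the point about write-heavy applications.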

thomas_fa1mo ago

Thanks for the analysis. Interestingly, when we first released our low-latency S3-compatible storage (1M IOPS, p99 ~5ms)[1], a lot of people asked the same question: why were we trying to bring file system semantics (atomic object/folder rename) to S3? We also got feedback from people who really need FS semantics, so we then added POSIX FS support.

AWS S3FS uses the normal FUSE interface, which is heavy because of the inherent overhead of copying data back and forth between user space and kernel space. That was our initial concern when we tried to add POSIX support to the original object storage design. Fortunately, we have found and open-sourced a perfect solution [2]: using FUSE_OVER_IO_URING + FUSE_PASSTHROUGH, we can keep the same high-performance architecture as our original object storage. We'd like to publish a new blog post explaining more details and revealing our performance numbers if anyone is interested.

[1] https://fractalbits.com/blog/why-we-built-another-object-sto...

[2] https://crates.io/crates/fractal-fuse

ktimespi1mo ago

This was my concern too. The whole point of using S3 as a file system instead of EBS / EFS (for me at least) is to minimize cost and I don't really see why I would use this instead of s3fs.

avereveard1mo ago

Probably some tradeoff at high client count or if you seek into files to read partial data

ktimespi1mo ago

s3fs can do partial reads too with range queries, I'm leaning more towards the tradeoff.
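
For reference, a partial read against S3 is an ordinary HTTP range request. A minimal sketch of building the header value, assuming a hypothetical helper name; with boto3 the resulting string would be passed as the `Range` argument of `get_object`:

```python
def range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the "- 1".
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


# First 128 KiB of an object.
print(range_header(0, 128 * 1024))
```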

the84721mo ago

> Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.

Always uncached? S3 has pretty bad latency.

MontyCarloHall1mo ago

The threshold at which the cache gets used is configurable, with 128kB the default. The assumption is that any read larger than the threshold will be a long sustained read, for which latency doesn't matter too much. My question is, do reads <128kB (or whatever the threshold is) from files >128kB get saved to the cache, or is it only used for files whose overall size is under the threshold? Frequent random access to large files is a textbook use case for a caching layer like this, but its cost will be substantial in this system.

the84721mo ago

NVMe read latency is in the 10-100µs range for 128kB blocks. S3 is about 100ms. That's 3-4 OOMs. The threshold where the total read duration starts to dominate latency would be somewhere in the dozens to hundreds of megabytes, not kilobytes.
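
A quick way to sanity-check that estimate: the read size at which transfer time equals first-byte latency is just latency times throughput. The 100 ms figure is from the comment above; the two throughput values are assumptions picked for illustration.

```python
def break_even_bytes(first_byte_latency_s, throughput_bytes_per_s):
    """Read size at which transfer time equals the first-byte latency."""
    return first_byte_latency_s * throughput_bytes_per_s


S3_LATENCY_S = 0.100  # ~100 ms, as stated in the comment

# Assumed per-stream throughputs, not measured values.
for label, throughput in [("100 MB/s", 100e6), ("1 GB/s", 1e9)]:
    size_mb = break_even_bytes(S3_LATENCY_S, throughput) / 1e6
    print(f"{label}: latency equals transfer time at about {size_mb:.0f} MB")
```

Below the break-even size the request is latency-bound, so a 128kB threshold sits far inside the latency-dominated region under these assumptions.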

huntaub1mo ago

I imagine (hope) that they are doing some kind of intelligent read-ahead in the frontend servers to optimize for sequential reads that would avoid this looking terrible for applications.

objectivefs1mo ago

One advantage over S3FS would be that multiple filesystem mounts would see a consistent view of the filesystem, but it looks like this advantage disappears when mixing direct bucket access with filesystem mounts. Given the famously slow small file performance of EFS it might have been better (and cheaper) to send all files to S3 and only use EFS for the metadata layer. Not having atomic rename is also going to be a problem for any use that expects a regular filesystem.

deepsun1mo ago

> directly streamed from the underlying S3 bucket, which is free.

No reads from S3 are free. All outgoing traffic from AWS is charged no matter what.

simtel201mo ago

Reads from S3 via an S3 endpoint inside a VPC to an interface inside that VPC are not billed.

deepsun1mo ago

S3 GET operations are billed anyway.

Traffic may be free, but not the operations.

jamesblonde1mo ago

S3 Files was launched today without support for atomic rename. This is not something you can bolt on. Can you imagine running Claude Code on your S3 Files and it just wants to do a little house cleaning, renaming a directory and suddenly a full copy is needed for every file in that directory?

The hardest part in building a distributed filesystem is atomic rename. It's always rename. Scalable metadata file systems, like Colossus/Tectonic/ADLSv2/HopsFS, are either designed around how to make rename work at scale* or how to work around it at higher levels in the stack.

* https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...

thomas_fa1mo ago

Indeed, this is not an easy problem. Our S3-compatible system does support atomic rename gracefully, via an extended protocol; see the demo with our tool [1].

[1] https://github.com/fractalbits-labs/fractalbits-main/tree/ma...

jamesblonde1mo ago

We have advanced to building S3 stores with claude code - impressive:

https://github.com/fractalbits-labs/fractalbits-main/graphs/...

sb-gcs1mo ago

Hierarchical Namespace buckets in Google Cloud Storage support folder operations, including atomic folder renames.

https://docs.cloud.google.com/storage/docs/hns-overview#feat...

wbl1mo ago

"NFS provides the semantics your applications expect" is one of the funniest things I have ever read.

danudey1mo ago

Do your applications not expect any network hiccup to cause them to block indefinitely in a system call making them effectively unkillable and making the filesystem unmountable?

wbl1mo ago

Don't forget the locking semantics. That was fun and caused Sage to fail.

boulos1mo ago

Compared to roll-your-own with S3 or GCS it does :)

mememememememo1mo ago

Semanticness as a measurement.

themafia1mo ago

> we locked a bunch of our most senior engineers in a room and said we weren’t going to let them out till they had a plan that they all liked.

That's one way to do it.

> When you create or modify files, changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT. Sync runs in both directions, so when other applications modify objects in the bucket, S3 Files automatically spots those modifications and reflects them in the filesystem view automatically.

That sounds about right given the above. I have trouble seeing this as something other than a giant "hack." I already don't enjoy projecting costs for new types of S3 access patterns and I feel like this has the potential to double the complication I already experience here.

Maybe I'm too frugal, but I've been in the cloud for a decade now, and I've worked very hard to prevent any "surprise" bills from showing up. This seems like a great feature; if you don't care what your AWS bill is each month.

avereveard1mo ago

There is a staggering number of users doing this with extra steps using FSx for Lustre; their lives got greatly simplified today (unless they use GPU Direct Storage, I guess).

themafia1mo ago

Good point. There's a wide gulf between being able to design your workflow for S3 and trying to map an existing workflow to it.

everfrustrated1mo ago

The best way to think of the architecture of this is it's EFS with a bidirectional sync to S3.

You can write into one and read out from the other and vice versa. Consistency guarantees kept within each but not between.

rdtsc1mo ago

Synchronization bits is what I was wondering about: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-fil...

> For example, suppose you edit /mnt/s3files/report.csv through the file system. Before S3 Files synchronizes your changes back to the S3 bucket, another application uploads a new version of report.csv directly to the S3 bucket. When S3 Files detects the conflict, it moves your version of report.csv to the lost and found directory and replaces it with the version from the S3 bucket.

> The lost and found directory is located in your file system's root directory under the name .s3files-lost+found-file-system-id.

WilcoKruijer1mo ago

Mounting S3 buckets seemed like a great way to make stateless applications stateful for a while, which sounds appealing, especially for agent-like workloads. Handling conflicts like this means you really have to approach the mounted bucket as a separate stateful thing. Seems like a mismatch to me.

abidlabs1mo ago

Hugging Face Buckets also recently added support for mounting Buckets as a filesystem: https://huggingface.co/changelog/hf-mount

jitl1mo ago

I wish they offered some managed bridging to local NVMe storage. AWS NVMe is super fast compared to EBS, and EBS (node-exclusive access as block device) is faster than EFS (multi-node access). I imagine this can go fast if you put some kind of further-cache-to-NVMe FS on top, but a completely vertically integrated option would be much better.

MontyCarloHall1mo ago

Since EFS is just an NFS mount, I wonder if you could do this yourself by attaching an NVMe volume to your instance and setting up something like cachefilesd on the NFS mount, pointed to the NVMe.

Would

   mkfs.ext4 /dev/nvme0n1 && \
   mount /dev/nvme0n1 /var/cache/fscache && \
   mount -t s3files -o fsc fs-0aa860d05df9afdfe:/ /home/ec2-user/s3files

work out of the box? It does for EFS. It hardly seems worth it to offer a managed service that's effectively three shell commands, but this is AWS we're talking about.

jitl1mo ago

AWS's [docs on EFS performance](https://docs.aws.amazon.com/efs/latest/ug/performance-tips.h...) say:

> Don't use the following mount options:

> - fsc – This option enables local file caching, but does not change NFS cache coherency, and does not reduce latencies.

If the S3 Files sync logic ran client-side, we could almost entirely avoid file access latency for cached files and paying for new expensive EFS disks. I already pay for a lot of NVMe disks, let me just use those!

MontyCarloHall1mo ago

>This option enables local file caching, but does not change NFS cache coherency, and does not reduce latencies.

That's true for any NFS setup, not just EFS. The benefit of local NFS caching is to speed up reads of large, immutable files, where latency is relatively negligible. I'm not sure why AWS specifically dissuades users from enabling caching, since it's not like bandwidth to an EFS volume is even in the ballpark of EBS/NVMe bandwidth.

dabinat1mo ago

The problem with using S3 as a filesystem is that it’s immutable, and that hasn’t changed with S3 Files. So if I have a large file and change 1 byte of it, or even just rename it, it needs to upload the entire file all over again. This seems most useful for read-heavy workflows of files that are small enough to fit in the cache.

wolttam1mo ago

That’s not that different than CoW filesystems - there is no rule that files must map 1:1 to objects; you can (transparently) divide a file into smaller chunks to enable more fine grained edits.

vbezhenar1mo ago

The most obvious approach seems to implement device blocks as S3 objects and use any existing file system on top of it.

yencabulator1mo ago

S3 is notoriously miserable with small objects.

otterley1mo ago

The unit of granularity for a CoW filesystem is a block, which is typically 4kB or smaller. The unit of granularity for S3 is the entire object or 5MB (minimum multipart upload size), whichever is smaller. The difference can be immense.

direwolf201mo ago

But this doesn't

jamesblonde1mo ago

Files can be immutable if you have mutable metadata - but S3 does not have mutable metadata, so you can't rename a directory without a full copy of all its contents.

Immutable files can be solved by chunking them, allowing files to be opened and appended to - we do this in HopsFS. However, random writes are typically not supported in scaleout metadata file systems - but rarely used by POSIX clients, thankfully.

aforwardslash1mo ago

Depends how you implement the fs layer on top of s3; as a quick example, I've done a couple of implementations of exactly that, where a file is chunked into multiple s3 objects; this allows for CoW semantics if required, and parallel upload/downloads; in the end it heavily depends on your use case
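
A minimal sketch of the chunk-mapping idea, assuming fixed-size chunks stored as separate objects; the 4 MiB chunk size and the function name are hypothetical, not taken from any of the implementations mentioned:

```python
def chunks_touched(offset, length, chunk_size):
    """Return the indices of the fixed-size chunks a byte-range write overlaps.

    Only these chunk objects need to be rewritten; the rest stay as they are.
    """
    if length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))


CHUNK = 4 * 1024 * 1024  # hypothetical 4 MiB chunk objects

# A one-byte edit deep inside a large file rewrites a single chunk.
print(chunks_touched(10 * CHUNK + 5, 1, CHUNK))
# A two-byte write straddling a chunk boundary rewrites two chunks.
print(chunks_touched(CHUNK - 1, 2, CHUNK))
```

The tradeoff is that the bucket no longer holds one object per file, so other S3 clients cannot read the files directly.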

koolba1mo ago

If you thought locking semantics over NFS were wonky, just wait till we throw a remote S3 backend in the mix!

tracerbits1mo ago

The interesting part isn't the file abstraction itself, it's that this pushes the dividing line between "object store" and "filesystem" another notch toward filesystem. The absence of in-place updates was always the load-bearing wall keeping S3 cheap and durable in the way it is — if Files preserves that and only makes the read API friendlier, fine.

If they ever ship in-place writes I'd want to see what happens to the consistency model and pricing first. That's where the actual simplicity lived, not in the API surface. Half the appeal of S3 over a real filesystem was that you couldn't shoot yourself in the foot with partial overwrites.

nyc_pizzadev1mo ago

This is very close to its first official release: https://fiberfs.io/

Built in cache, CDN compatible, JSON metadata, concurrency safe and it targets all S3 compatible storage systems.

mikestorrent1mo ago

How would you compare this to Amazon's own FUSE implementation? I think it's on its 3rd major reincarnation now

harshaw1mo ago

this post will probably never be read but.. I was on the team that was trying to make the marriage of S3 and EFS work a year ago. it's a pretty hard problem. At one point we proposed this solution (which seems like a caching layer) but it got shot down for a more complex system that would have attempted to rebuild EFS on faster S3 blob storage. I left before this engineering monstrosity made significant progress, and it clearly died at some point.

Looks like they went back to a simpler solution they could deliver but with some obvious warts. good to see something get launched but the sausage making here was brutal.

The reality is that if you read https://www.allthingsdistributed.com/2026/04/s3-files-and-th..., it sounds like the great minds at S3 figured out that a caching layer was the way to go. We (EFS) fucking proposed that years ago. But we had to deal with Seattle and the S3 braintrust who didn't want to do that. I know we wrote a PRFAQ that was close to this concept probably four years ago. The political story is that EFS was taken over by S3 and the EFS folks didn't have the agency or political backing to build a more workable solution. So we wasted a shit ton of time tackling something that was never going to work and many of the tenured EFS engineers left.

pvtmert1mo ago

100% agree with this sentiment. Although Amazon/AWS seems innovative overall, the amount of ideas and passion killed in the same meeting rooms the article describes as "innovating driving, heated conversations" is immense.

Obviously not the same, but at home I am running a Raspberry Pi with s3fs mounting my personal S3 bucket. I am exposing the same directory with /etc/exports (NFS). Which also allows me to use filesystem-caching as a bonus on the client side.

On the other hand, I should probably move out from S3 and use R2 or something...

rufo1mo ago

I don’t have much actively constructive to say, but having worked in a large engineering organization before - boy, do I feel this.

jamesblonde1mo ago

How could they release something that doesn't support atomic rename and has no prospect of supporting atomic rename? Lots of workloads will crash and burn on this layer.

gonzalohm1mo ago

I cannot 100% confirm this, but I believe AWS insisted a lot on NOT using S3 as a file system. Why the change now?

yandie1mo ago

It appears that they put an actual file system in front of S3 (AWS EFS basically) and then perform transparent syncing. The blog post discusses a lot of caveats, such as consistency and object naming (inconsistencies are emitted as events to customers).

Having been a fan of S3 for such a long time, I'm really a fan of the design. It's a good compromise and kudos to whoever managed to push through the design.

PunchyHamster1mo ago

Because people will use it as a filesystem regardless of the original intent, because it is a very convenient abstraction. So might as well do it in an optimal and supported way, I guess?

LazyMans1mo ago

They found a way to make money on it by putting a cache in front of it. Less load for them, better performance for you. Maybe you save money, maybe you don't.

PedroBatista1mo ago

People and by people I mean architects and lead devs at big account orgs ( $$$ ) have been using S3 as a filesystem as one of the backbones of their usually wacky mega complex projects.

So there has always been pressure on AWS to make it work like that. I suspect the number of support tickets AWS receives related to "My S3 backed project is slow/fails sometimes/runs into AWS limits (like the max number of buckets per account)", plus the "Why don't.." questions in the design phase, where AWS people are often in the room, add up to enough long-applied pressure to overcome the technical limitations of S3.

I'm not a fan of this type of "let's put a fresh coat on top of it and pretend it's something that fundamentally is not" abstractions. But I suspect here is a case of social pressure turbo charged by $$$.

munk-a1mo ago

I think it opens them up to a huge customer base of less technically apt people who just downloaded some random "S3asYourFS.exe" program but also opens them up to needing to support that functionality and field support calls from less technically apt people. I don't know if that business decision makes sense (since AWS already lacks the CS infrastructure to even deal with professional clients) but the idea that you could get everyone and their brother paying monthly fees to AWS is likely too tempting of a fruit to pass up.

jitl1mo ago

Because without significant engineering effort (see the blog post), the mismatch between object store semantics and file semantics mean you will probably Have A Bad Time. In much earlier eras of S3, there were also some implementation specifics like throughput limits based on key prefixes (that one vanished circa 2016) that made it even worse to use for hierarchical directory shapes.

jollyllama1mo ago

So that vibe-coders who don't understand s3 but have a user-level understanding of files can build stuff and pay them.

karmasimida1mo ago

This is how tech people think, but customers still want this, so it will be built eventually.

huksley1mo ago

I was prototyping with S3 mounted as a filesystem for Docker volumes and evaluating solutions for that. The GeeseFS CLI is the fastest one; here is a script I made to mount a folder with it from S3-compatible storage:

https://gist.github.com/huksley/44341276d7c269f092e10784959e...

You might want to play with memory params for GeeseFS for better results

znpy1mo ago

As usual, everything except pricing is very well explained.

nvartolomei1mo ago

> changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT

Single PUT per file I assume?

LazyMans1mo ago

Based on docs, correct.

curt151mo ago

How does this compare with ZFS's object storage backend? https://news.ycombinator.com/item?id=46620673

huntaub1mo ago

Notably, this is going to manage your data in its native format (i.e. you can actually read-write the files out of the S3 bucket as if they were actual objects, mapping 1:1 to each file). The ZFS backend is (almost certainly) a block-based format that is persisted to S3 (meaning that you cannot use it for existing data in S3, and you cannot access data written through ZFS via S3).

miguel_martin1mo ago

Dumb Q: what would happen if you used this to store a SQLite database? Would it just... work?

My guess is this would only enable a read-replica and not backups as Litestream currently does?

laurencerowe1mo ago

SQLite’s locking is not NFS safe so this would not work.

jlokier1mo ago

Technically, SQLite's locking is NFS safe, provided NFS's implementation of fcntl() locking is working correctly.

I don't know if S3 Files implements fcntl() locking or does it correctly. But if it does, I believe SQLite should work on it correctly as well.

There have been many buggy NFS locking or caching implementations historically, which is why SQLite recommends against using it on NFS concurrently on multiple machines: https://sqlite.org/faq.html#:~:text=But%20use,time%2E

This SO reply suggests NFSv4 is better at this: https://unix.stackexchange.com/a/432519. But caveat it with this older reply: https://unix.stackexchange.com/a/1887

To the best of my knowledge (I worked a little on this long ago), on Linux even NFSv2 has done correct fcntl() locking for decades, if all the correct services are running and the options are set appropriately and it's Linux on both the client and server. But if something is not configured as it should be, then locking or caching may not work correctly.
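
A minimal sketch of probing whether a mount grants fcntl() locks at all, using Python's standard `fcntl` module on a Unix host. The function name is made up for illustration, and passing this probe on one machine says nothing about whether locks are enforced correctly across hosts, which is the part that matters for SQLite.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try a non-blocking exclusive fcntl() lock on `path`.

    Returns True if the lock was granted, False if it was refused
    (held by another process, or locking is unavailable on this mount).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)  # release before returning
        return True
    finally:
        os.close(fd)


# Demo on a local temp file; point it at a file on the mount to probe that.
with tempfile.NamedTemporaryFile() as tmp:
    print(try_exclusive_lock(tmp.name))
```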

laurencerowe1mo ago

Thanks for the clarification. It is completely impossible for WAL mode since that uses shared memory. I must have conflated that with non-WAL mode in my mind.

From https://sqlite.org/wal.html

> All processes using a database must be on the same host computer; WAL does not work over a network filesystem. This is because WAL requires all processes to share a small amount of memory and processes on separate host machines obviously cannot share memory with each other.

miguel_martin1mo ago

thanks

Eikon1mo ago

SQLite works great with ZeroFS: https://github.com/Barre/ZeroFS

otterley1mo ago

Via which front end? It can’t be the NFS one.

mgaunard1mo ago

Zero mention of s3fs which already did this for decades.

huntaub1mo ago

This is pretty different than s3fs. s3fs is a FUSE file system that is backed by S3.

This means that all of the non-atomic operations that you might want to do on S3 (including edits to the middle of files, renames, etc) are run on the machine running s3fs. As a result, if your machine crashes, it's not clear what's going to show up in your S3 bucket or if it would corrupt things.

As a result, s3fs is also slow because it means that the next stop after your machine is S3, which isn't suitable for many file-based applications.

What AWS has built here is different: using EFS as the middle layer means that there's a safe, durable place for your file system operations to go while they're being assembled into object operations. It also means that the performance should be much better than s3fs (it's talking to SSDs where data is 1ms away instead of HDDs where data is 30ms away).

mgaunard1mo ago

It also means that you need to pay for EFS, which is outrageously expensive, to use S3, whose whole purpose is to be cheap.

huntaub1mo ago

Of course, you don't need to, this is just a way to opt-in to getting file semantics on top of S3.

The purpose of S3 isn't to be cheap, it's to be simple.

ChocolateGod1mo ago

You can also use something like JuiceFS to make using S3 as a shared filesystem more sane, but you're moving all the metadata to a shared database.

Eikon1mo ago

Or ZeroFS which doesn’t require a 3rd party database, just a s3 bucket!

https://github.com/Barre/ZeroFS

luke54411mo ago

A more solid (especially when it comes to caching) solution would be appreciated.

I thought that would be their https://github.com/awslabs/mountpoint-s3 . But no mention about this one either.

S3 files does have the advantage of having a "shared" cache via EFS, but then that would probably also make the cache slower.

PunchyHamster1mo ago

I'd assume you can still have local cache in addition to that.

rowanG0771mo ago

I was thinking: "No way this has existed for decades". But the earliest I can find it existing is 2008. Strictly speaking not decades but much closer to it than I expected.

bmurphy19761mo ago

There's also https://github.com/kahing/goofys, a Go equivalent. A bit of a dead project these days.

moralestapia1mo ago

Yeah, that blog post was written as if sliced bread has been invented again.

Reading through it, I was only thinking "is this distinguished engineer TOC 2M aware that people have been doing this since forever?".

borplk1mo ago

Does anyone have solutions or suggestions for mounting a S3 bucket as a read-only filesystem? I don't need any writes.

Previously I have done a periodic script that would simply re-sync the directory which works well enough. But curious if there's anything else out there.

valyala1mo ago

This is a good alternative for those who want to store petabytes of historical logs, metrics or traces in VictoriaLogs, VictoriaMetrics and VictoriaTraces, and want to save 2x-4x on persistent storage pricing (compare EBS pricing to S3 pricing).

PunchyHamster1mo ago

Eagerly awaiting the first blog post where developers didn't read the eventually consistent part, lost the data, and made some "genius" workaround with the help of the LLM that got them in that spot in the first place.

Maxious1mo ago

> Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent.

https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea...

hk13371mo ago

This could be useful. We use EFS, I like the benefits but I think it’s overkill for what we need. I’ve been thinking of switching to s3 but not looking forward to completely changing how we upload and download.

up2isomorphism1mo ago

This is why today’s sales pitches are often disguised as tech blogs.

dang1mo ago

Since this is the thread that got attention, I've added the announcement link to the toptext and made the title work for both.

mbana1mo ago

Werner Vogels is awesome. I first discovered his writing when I learnt about DynamoDB.

goekjclo1mo ago

the "under the hood uses EFS" part is the most interesting bit here

thelastgallon1mo ago

So, NFS in the cloud?

gervwyk1mo ago

any recommendations for a lambda based sftp server setup?

Centigonal1mo ago

Terrible day for people who sloppily use filesystem vocabulary when referring to S3 objects and prefixes.

mockbolt1mo ago

One of the best

tao_oat1mo ago

ovaistariq1mo ago

TLDR: EFS as an eventually consistent cache in front of S3.

mritchie7121mo ago

tldr: this caches your S3 data in EFS.

we run datalakes using DuckLake and this sounds really useful. GCP should follow suit quickly.

hiyer1mo ago

I was thinking of using it with Duckdb as well but seems it would be of limited benefit. Parquet objects are in MBs, so they would be streamed directly from S3. With raw parquet objects, it might help with S3 listing if you have a lot of them (shave off a couple of seconds from the query). If you are already on Ducklake, Duckdb will use that for getting the list of relevant objects anyway.

wenc1mo ago

Maybe the OP is thinking of reading/writing to DuckDB native format files. Those require filesystem semantics for writing. Unfortunately, even NFS or SMB are not sufficiently FS-like for DuckDB.

Parquet is static append only, so DuckDB has no problems with those living on S3.

huntaub1mo ago

What does DuckDB need that NFS/SMB do not provide?

anentropic1mo ago

I am curious about this use case

How do you see it helping with DuckLake?

arpinum1mo ago

Latency, predicate pushdown.

Pre-compaction the recent data can be in small files, and the delete markers will also be in small files. This will bring down fetch times, while ducklake may have many of the larger blocks in memory or disk cache already.

Reading block headers for filtering is lots of small ranges, this could speed it up by 10x.

prpl1mo ago

For files up to 100kB of size, this should effectively be really close to the same price as S3 when writing (didn't check reading so much, but the writes/PUT is always much more expensive than read/GET)

Would be really useful pre-compaction and to deal with small files issue without latency penalties

DenisM1mo ago

TLDR: Eventually consistent file system view on top of s3 with read/write cache.

CrzyLngPwd1mo ago

If there is ever a post that needs a TLDR or an AI summary it is that one.

Sell the benefits.

I have around 9 TB in 21m files on S3. How does this change benefit me?

dijksterhuis1mo ago

not everything should or needs to be some article geared towards the audience's convenience, or selling something to the audience. pretty much all allthingsdistributed articles are long form articles covering highly technical systems and contain a decent whack of detail/context. in my mind, they veer closer to "computer scientist does blog posts" compared to "5 ways React can boost your page visits" listicles.

edited slightly ... i really need to turn 10 minute post delay back on.

jz-amz1mo ago

Check out the "what's new": https://aws.amazon.com/about-aws/whats-new/2026/04/amazon-s3...

j / k navigate · click thread line to collapse

119 comments

MontyCarloHall1mo ago

— All writes cost $0.06/GB, since everything is first written to the EFS cache. For write-heavy applications, this could be a dealbreaker.

— Reads hitting the cache get billed at $0.03/GB. Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.

thomas_fa1mo ago

[1] https://fractalbits.com/blog/why-we-built-another-object-sto...

[2] https://crates.io/crates/fractal-fuse

ktimespi1mo ago

This was my concern too. The whole point of using S3 as a file system instead of EBS / EFS (for me at least) is to minimize cost and I don't really see why I would use this instead of s3fs.

avereveard1mo ago

Probably some tradeoff at high client count or if you seek into files to read partial data

ktimespi1mo ago

s3fs can do partial reads too with range queries, I'm leaning more towards the tradeoff.

the84721mo ago

> Large reads (>128kB) get directly streamed from the underlying S3 bucket, which is free.

Always uncached? S3 has pretty bad latency.

MontyCarloHall1mo ago

the84721mo ago

3 more replies

huntaub1mo ago

I imagine (hope) that they are doing some kind of intelligent read-ahead in the frontend servers to optimize for sequential reads that would avoid this looking terrible for applications.

objectivefs1mo ago

deepsun1mo ago

> directly streamed from the underlying S3 bucket, which is free.

No reads from S3 are free. All outgoing traffic from AWS is charged no matter what.

simtel201mo ago

Reads from s3 via an s3 endpoint inside a vpc to an interface inside of that vpc is not billed.

deepsun1mo ago

S3 GET operations are billed anyway.

Traffic may be free, but not the operations.

2 more replies

jamesblonde1mo ago

* https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...

thomas_fa1mo ago

Indeed this is not an easy problem. And our s3-compatible system do support the atomic rename with extended protocol in a graceful way, see the demo with our tool [1].

[1] https://github.com/fractalbits-labs/fractalbits-main/tree/ma...

jamesblonde1mo ago

We have advanced to building S3 stores with claude code - impressive:

https://github.com/fractalbits-labs/fractalbits-main/graphs/...

sb-gcs1mo ago

Hierarchical Namespace buckets in Google Cloud Storage support folder operations, including atomic folder renames.

https://docs.cloud.google.com/storage/docs/hns-overview#feat...

wbl1mo ago

"NFS provides the semantics your applications expect" is one of the funniest things I have ever read.

danudey1mo ago

Do your applications not expect any network hiccup to cause them to block indefinitely in a system call making them effectively unkillable and making the filesystem unmountable?

wbl1mo ago

Don't forget the locking semantics. That was fun and caused Sage to fail.

boulos1mo ago

Compared to roll-your-own with S3 or GCS it does :)

mememememememo1mo ago

Semanticness as a measurement.

themafia1mo ago

> we locked a bunch of our most senior engineers in a room and said we weren’t going to let them out till they had a plan that they all liked.

That's one way to do it.

avereveard1mo ago

There is a staggering number of users doing this with extra steps using FSx for Lustre; their lives are greatly simplified today (unless they use GPUDirect Storage, I guess).

themafia1mo ago

Good point. There's a wide gulf between being able to design your workflow for S3 and trying to map an existing workflow to it.

everfrustrated1mo ago

The best way to think of the architecture of this is it's EFS with a bidirectional sync to S3.

You can write into one and read out from the other and vice versa. Consistency guarantees hold within each but not between them.

rdtsc1mo ago

The synchronization bits are what I was wondering about: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-fil...

> The lost and found directory is located in your file system's root directory under the name .s3files-lost+found-file-system-id.

WilcoKruijer1mo ago

abidlabs1mo ago

Hugging Face Buckets also recently added support for mounting Buckets as a filesystem: https://huggingface.co/changelog/hf-mount

jitl1mo ago

MontyCarloHall1mo ago

Since EFS is just an NFS mount, I wonder if you could do this yourself by attaching an NVMe volume to your instance and setting up something like cachefilesd on the NFS mount, pointed to the NVMe.

Would

   mkfs.ext4 /dev/nvme0n1 && \
   mount /dev/nvme0n1 /var/cache/fscache && \
   mount -t s3files -o fsc fs-0aa860d05df9afdfe:/ /home/ec2-user/s3files

work out of the box? It does for EFS. It hardly seems worth it to offer a managed service that's effectively three shell commands, but this is AWS we're talking about.

jitl1mo ago

AWS's [docs on EFS performance](https://docs.aws.amazon.com/efs/latest/ug/performance-tips.h...) say:

> Don't use the following mount options:

> - fsc – This option enables local file caching, but does not change NFS cache coherency, and does not reduce latencies.

MontyCarloHall1mo ago

>This option enables local file caching, but does not change NFS cache coherency, and does not reduce latencies.

dabinat1mo ago

wolttam1mo ago

That’s not that different than CoW filesystems - there is no rule that files must map 1:1 to objects; you can (transparently) divide a file into smaller chunks to enable more fine grained edits.
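A rough sketch of what that mapping could look like, assuming fixed-size chunks and a made-up key scheme (`<file>/chunk-<index>`); this is not how S3 Files or any particular filesystem lays out data:

```python
from typing import List, Tuple


def chunks_for_range(key: str, offset: int, length: int,
                     chunk_size: int = 4 * 1024 * 1024) -> List[Tuple[str, int, int]]:
    """Map a byte range of a logical file onto fixed-size chunk objects.

    Returns (chunk_key, start, end) tuples, where start/end are byte offsets
    inside that chunk (end exclusive).
    """
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("invalid range or chunk size")
    result = []
    pos, stop = offset, offset + length
    while pos < stop:
        index = pos // chunk_size          # which chunk object holds this byte
        start = pos % chunk_size           # where the range begins inside it
        end = min(chunk_size, start + (stop - pos))
        result.append((f"{key}/chunk-{index:08d}", start, end))
        pos += end - start
    return result


# A 10-byte edit at offset 6 with 8-byte chunks only touches two chunk objects.
for chunk in chunks_for_range("video.mov", 6, 10, chunk_size=8):
    print(chunk)
```

Only the chunks returned here would need to be rewritten, instead of the whole file.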

vbezhenar1mo ago

The most obvious approach seems to implement device blocks as S3 objects and use any existing file system on top of it.

yencabulator1mo ago

S3 is notoriously miserable with small objects.

otterley1mo ago

direwolf201mo ago

But this doesn't

jamesblonde1mo ago

Files can be immutable if you have mutable metadata - but S3 does not have mutable metadata, so you can't rename a directory without a full copy of all its contents.
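To make the cost concrete, here is a toy model where a plain dict stands in for a flat bucket (an assumption for illustration, not the S3 API). Renaming a "directory" means copying and deleting every object under the prefix:

```python
from typing import Dict


def rename_prefix(store: Dict[str, bytes], old: str, new: str) -> int:
    """Rename a 'directory' in a flat key/value store; returns objects copied."""
    moved = [k for k in store if k.startswith(old)]
    for k in moved:
        store[new + k[len(old):]] = store[k]  # copy the object to its new key
        del store[k]                          # then delete the old key
    return len(moved)


bucket = {
    "logs/2024/a.txt": b"a",
    "logs/2024/b.txt": b"b",
    "other/c.txt": b"c",
}
copied = rename_prefix(bucket, "logs/", "archive/")
print(copied, sorted(bucket))
```

The work grows with the number of objects under the prefix, and nothing makes the sequence of copies atomic; a reader can observe the rename half done.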

aforwardslash1mo ago

koolba1mo ago

If you thought locking semantics over NFS were wonky, just wait till we throw a remote S3 backend in the mix!

tracerbits1mo ago

nyc_pizzadev1mo ago

This is very close to its first official release: https://fiberfs.io/

Built in cache, CDN compatible, JSON metadata, concurrency safe and it targets all S3 compatible storage systems.

mikestorrent1mo ago

How would you compare this to Amazon's own FUSE implementation? I think it's on its 3rd major reincarnation now

harshaw1mo ago

Looks like they went back to a simpler solution they could deliver, but with some obvious warts. Good to see something get launched, but the sausage making here was brutal.

pvtmert1mo ago

On the other hand, I should probably move out from S3 and use R2 or something...

rufo1mo ago

I don’t have much actively constructive to say, but having worked in a large engineering organization before - boy, do I feel this.

jamesblonde1mo ago

How could they release something that doesn't support atomic rename and has no prospect of supporting atomic rename? Lots of workloads will crash and burn on this layer.

gonzalohm1mo ago

I cannot 100% confirm this, but I believe AWS insisted a lot on NOT using S3 as a file system. Why the change now?

yandie1mo ago

Having been a fan of S3 for such a long time, I'm really a fan of the design. It's a good compromise and kudos to whoever managed to push through the design.

PunchyHamster1mo ago

Because people will use it as a filesystem regardless of the original intent, because it is a very convenient abstraction. So they might as well do it in an optimal and supported way, I guess?

LazyMans1mo ago

They found a way to make money on it by putting a cache in front of it. Less load for them, better performance for you. Maybe you save money, maybe you don't.

PedroBatista1mo ago

People and by people I mean architects and lead devs at big account orgs ( $$$ ) have been using S3 as a filesystem as one of the backbones of their usually wacky mega complex projects.

munk-a1mo ago

jitl1mo ago

jollyllama1mo ago

So that vibe-coders who don't understand s3 but have a user-level understanding of files can build stuff and pay them.

karmasimida1mo ago

This is how tech people think, but customers still want this, so it will be built eventually.

huksley1mo ago

https://gist.github.com/huksley/44341276d7c269f092e10784959e...

You might want to play with memory params for GeeseFS for better results

znpy1mo ago

As usual, everything except pricing is very well explained.

nvartolomei1mo ago

> changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT

Single PUT per file I assume?

LazyMans1mo ago

Based on docs, correct.
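A toy sketch of that kind of write coalescing, assuming an in-memory buffer per file and a flush that something else calls on a timer. The class and its behaviour are illustrative guesses, not the actual S3 Files implementation:

```python
from typing import Dict, List, Set, Tuple


class WriteCoalescer:
    """Buffer writes per file and emit one whole-object upload per dirty file."""

    def __init__(self) -> None:
        self.files: Dict[str, bytearray] = {}
        self.dirty: Set[str] = set()

    def write(self, path: str, offset: int, data: bytes) -> None:
        buf = self.files.setdefault(path, bytearray())
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data
        self.dirty.add(path)

    def flush(self) -> List[Tuple[str, bytes]]:
        """Return one (path, contents) upload per file changed since last flush."""
        puts = [(p, bytes(self.files[p])) for p in sorted(self.dirty)]
        self.dirty.clear()
        return puts


c = WriteCoalescer()
c.write("a.txt", 0, b"hello")
c.write("a.txt", 5, b" world")   # second write to the same file
c.write("b.txt", 0, b"x")
puts = c.flush()                  # three writes collapse into two uploads
print(len(puts))
```

Three writes across two files become two uploads, one per file, which matches the "single PUT per file" reading.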

curt151mo ago

How does this compare with ZFS's object storage backend? https://news.ycombinator.com/item?id=46620673

huntaub1mo ago

miguel_martin1mo ago

Dumb Q: what would happen if you used this to store a SQLite database? Would it just... work?

My guess is this would only enable a read-replica and not backups as Litestream currently does?

laurencerowe1mo ago

SQLite’s locking is not NFS safe so this would not work.

jlokier1mo ago

Technically, SQLite's locking is NFS safe, provided NFS's implementation of fcntl() locking is working correctly.

I don't know if S3 Files implements fcntl() locking or does it correctly. But if it does, I believe SQLite should work on it correctly as well.

This SO reply suggests NFSv4 is better at this: https://unix.stackexchange.com/a/432519. But caveat it with this older reply: https://unix.stackexchange.com/a/1887
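For reference, this is roughly what taking an fcntl()-style lock looks like from Python on a Unix system. It is only a sketch of the primitive; SQLite's real locking protocol uses several byte ranges and is more involved, and whether the lock is honoured across clients depends entirely on the filesystem underneath:

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path: str):
    """Hold an exclusive POSIX advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # lockf() wraps fcntl() byte-range locking; LOCK_NB raises OSError
        # instead of blocking if another process already holds the lock.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield fd
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)  # closing the descriptor also drops the lock


# Try it on a scratch file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "db.lock")
    with exclusive_lock(target) as fd:
        os.write(fd, b"locked section ran")
    with open(target, "rb") as f:
        contents = f.read()
print(contents)
```

On a local filesystem this works as expected; the open question in this thread is whether the same call behaves correctly when the path lives on an NFS-style mount.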

laurencerowe1mo ago

Thanks for the clarification. It is completely impossible for WAL mode since that uses shared memory. I must have conflated that with non-WAL mode in my mind.

From https://sqlite.org/wal.html

miguel_martin1mo ago

thanks

Eikon1mo ago

SQLite works great with ZeroFS: https://github.com/Barre/ZeroFS

otterley1mo ago

Via which front end? It can’t be the NFS one.

mgaunard1mo ago

Zero mention of s3fs which already did this for decades.

huntaub1mo ago

This is pretty different than s3fs. s3fs is a FUSE file system that is backed by S3.

As a result, S3fs is also slow because it means that the next stop after your machine is S3, which isn't suitable for many file-based applications.

mgaunard1mo ago

It also means that you need to pay for EFS, which is outrageously expensive, to use S3, whose whole purpose is to be cheap.

huntaub1mo ago

Of course, you don't need to, this is just a way to opt-in to getting file semantics on top of S3.

The purpose of S3 isn't to be cheap, it's to be simple.

ChocolateGod1mo ago

You can also use something like JuiceFS to make using S3 as a shared filesystem more sane, but you're moving all the metadata to a shared database.

Eikon1mo ago

Or ZeroFS which doesn’t require a 3rd party database, just a s3 bucket!

https://github.com/Barre/ZeroFS

luke54411mo ago

A more solid (especially when it comes to caching) solution would be appreciated.

I thought that would be their https://github.com/awslabs/mountpoint-s3 . But no mention about this one either.

S3 files does have the advantage of having a "shared" cache via EFS, but then that would probably also make the cache slower.

PunchyHamster1mo ago

I'd assume you can still have local cache in addition to that.

rowanG0771mo ago

I was thinking: "No way this has existed for decades". But the earliest I can find it existing is 2008. Strictly speaking not decades but much closer to it than I expected.

bmurphy19761mo ago

There's also https://github.com/kahing/goofys, a Go equivalent. A bit of a dead project these days.

moralestapia1mo ago

Yeah, that blog post was written as if sliced bread has been invented again.

Reading through it, I was only thinking "is this distinguished engineer TOC 2M aware that people have been doing this since forever?".

borplk1mo ago

Does anyone have solutions or suggestions for mounting a S3 bucket as a read-only filesystem? I don't need any writes.

Previously I have done a periodic script that would simply re-sync the directory which works well enough. But curious if there's anything else out there.
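The core of such a re-sync script is just a diff between two listings. A minimal sketch, assuming each side is summarised as a key-to-version map (the version could be an ETag or a timestamp; that choice and the function name are my assumptions):

```python
from typing import Dict, List, Tuple


def plan_sync(remote: Dict[str, str], local: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare key -> version maps; return (keys to download, keys to delete locally)."""
    # Download anything missing locally or whose version differs.
    download = sorted(k for k, v in remote.items() if local.get(k) != v)
    # Delete anything that no longer exists remotely.
    delete = sorted(k for k in local if k not in remote)
    return download, delete


remote = {"a.csv": "v2", "b.csv": "v1", "c.csv": "v1"}
local = {"a.csv": "v1", "b.csv": "v1", "old.csv": "v1"}
download, delete = plan_sync(remote, local)
print(download, delete)
```

Running this on a timer gives a read-only mirror whose staleness is bounded by the interval, which is the tradeoff against a real mount.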

valyala1mo ago

PunchyHamster1mo ago

Maxious1mo ago

> Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent.

https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea...

hk13371mo ago

up2isomorphism1mo ago

This is why today’s sales pitches are often disguised as tech blogs.

dang1mo ago

Since this is the thread that got attention, I've added the announcement link to the toptext and made the title work for both.

mbana1mo ago

Werner Vogels is awesome. I first discovered his writing when I learnt about DynamoDB.

goekjclo1mo ago

the "under the hood uses EFS" part is the most interesting bit here

thelastgallon1mo ago

So, NFS in the cloud?

gervwyk1mo ago

any recommendations for a lambda based sftp server setup?

Centigonal1mo ago

Terrible day for people who sloppily use filesystem vocabulary when referring to S3 objects and prefixes.

mockbolt1mo ago

One of the best

tao_oat1mo ago

ovaistariq1mo ago

TLDR: EFS as an eventually consistent cache in front of S3.

mritchie7121mo ago

tldr: this caches your S3 data in EFS.

we run datalakes using DuckLake and this sounds really useful. GCP should follow suit quickly.

hiyer1mo ago

wenc1mo ago

Maybe the OP is thinking of reading/writing to DuckDB native format files. Those require filesystem semantics for writing. Unfortunately, even NFS or SMB are not sufficiently FS-like for DuckDB.

Parquet is static append only, so DuckDB has no problems with those living on S3.

huntaub1mo ago

What does DuckDB need that NFS/SMB do not provide?

anentropic1mo ago

I am curious about this use case

How do you see it helping with DuckLake?

arpinum1mo ago

Latency, predicate pushdown.

Reading block headers for filtering is lots of small ranges, this could speed it up by 10x.

prpl1mo ago

Would be really useful pre-compaction and to deal with the small-files issue without latency penalties.

DenisM1mo ago

TLDR: Eventually consistent file system view on top of s3 with read/write cache.

CrzyLngPwd1mo ago

If there is ever a post that needs a TLDR or an AI summary it is that one.

Sell the benefits.

I have around 9 TB in 21m files on S3. How does this change benefit me?

dijksterhuis1mo ago

edited slightly ... i really need to turn 10 minute post delay back on.

jz-amz1mo ago

Check out the "what's new": https://aws.amazon.com/about-aws/whats-new/2026/04/amazon-s3...
