Even with the cache, cold-query latency to S3 has a lower bound of roughly one ~50ms roundtrip [0]. To build a performant system, you have to tightly control roundtrips. S3 Express changes that equation dramatically: it approaches HDD random-read speeds (single-digit ms), so we can build production systems that need no SSD cache at all, just the zero-copy, deserialized in-memory cache.
Many systems will probably keep an SSD cache (~100 us random reads), but MVPs can now be built without one, and cold-query latency drops dramatically. That's a big deal.
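A quick napkin calculation of what that latency gap means for roundtrip-bound queries; the latency figures are rough ballparks in the spirit of the napkin-math repo, not measurements:

```python
# How many dependent (sequential) object reads fit in a fixed query
# budget at each tier's typical random-read latency?
# All figures are rough, illustrative ballparks.
LATENCY_US = {
    "s3_standard": 50_000,  # ~50 ms per roundtrip
    "s3_express": 5_000,    # single-digit ms
    "local_ssd": 100,       # ~100 us random read
}

BUDGET_US = 200_000  # a 200 ms query budget

budget = {tier: BUDGET_US // lat for tier, lat in LATENCY_US.items()}
print(budget)  # {'s3_standard': 4, 's3_express': 40, 'local_ssd': 2000}
```

Four sequential roundtrips per query is brutal for anything index-shaped; forty is workable, which is why the single-digit-ms number matters so much.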
We're currently building a vector database on top of object storage, so this is extremely timely for us... I hope GCS ships this ASAP. [1]
[0]: https://github.com/sirupsen/napkin-math [1]: https://turbopuffer.com/
NVMe is what's changing the equation, not SSDs in general. NVMe disks now do up to 8 GB/s, although what the cloud providers offer barely reaches 2 GB/s, and only on expensive instances. So instead of 40x better throughput than S3, we can get maybe 10x. Right now, these workloads are much better served on-premises on the cheapest M.2 NVMe disks ($200 for 4 TB with 4 GB/s read/write) backed by an S3-compatible object store like Scality.
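For scale, the same napkin math on the throughput side: how long a full scan of a 4 TB dataset takes at each of those speeds. The single-stream S3 figure is an assumed ballpark, not a measured number:

```python
# Full-scan time for a 4 TB dataset at each sequential throughput.
TB_MB = 1_000_000  # MB per TB (decimal units)

throughput_mb_s = {
    "local_nvme": 8_000,   # 8 GB/s M.2 NVMe
    "cloud_nvme": 2_000,   # ~2 GB/s, and only on expensive instances
    "s3_one_stream": 200,  # assumed ballpark for a single S3 stream
}

scan_seconds = {name: 4 * TB_MB / mbps for name, mbps in throughput_mb_s.items()}
for name, s in scan_seconds.items():
    print(f"{name}: full 4 TB scan in {s / 60:.0f} min")
```

Minutes versus hours for a full scan is exactly the 10x-vs-40x gap the comment is describing.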
The comment you're replying to is mostly about latency: it reports S3 object GET latencies (time to open the object and return its first bytes) in the single-digit milliseconds, where S3 was ~50ms before.
BTW EBS can do 4GB/sec per volume. But you will pay for it.
This instantly makes a number of applications able to run directly on S3, sans any caching system.
Take it a step further: GDAL has supported S3 raster data sources out of the box for a while now. Any GDAL-powered system may be able to operate on S3 files as if they were local.
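Concretely, GDAL addresses S3 objects through its /vsis3/ virtual filesystem: an object at s3://bucket/key is opened as /vsis3/bucket/key, with credentials picked up from the usual AWS environment. A trivial helper for the translation (the bucket and path below are made up):

```python
# GDAL's /vsis3/ virtual filesystem maps s3:// URLs to virtual paths.
def to_vsis3(s3_url: str) -> str:
    """Convert an s3:// URL to GDAL's /vsis3/ virtual path."""
    assert s3_url.startswith("s3://"), "expected an s3:// URL"
    return "/vsis3/" + s3_url[len("s3://"):]

print(to_vsis3("s3://my-bucket/rasters/dem.tif"))
# /vsis3/my-bucket/rasters/dem.tif
```

With GDAL installed, `gdal.Open()` accepts such a path directly, so the rest of the pipeline never knows the raster isn't local.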
This is the key insight. Storage cost becomes essentially negligible, and latency drops by an order of magnitude, if you use S3 Express as buffer storage and then move data down to standard S3. I can see a future where most data-intensive apps use S3 as their main storage layer.
While this was really satisfying on the performance side, we were a bit disappointed by the price, and I mostly agree with the article on this matter.
I can see some very specific use cases where the pricing should be OK, but currently I would say most of our users will just stay on classic S3 and add some local SSD caching if they have a lot of requests.
Unfortunately the pricing model puts it in a place where it is the right technology only for some fairly rare cases.
In a nutshell, the key things you need to know:

- Storage is 6.4x as expensive as classic S3.
- GET requests are 2x cheaper (with an additional cost for large requests).
- Your data lives within a single availability zone.
- Latency is single-digit ms.
From a pure cost point of view, the realm where it makes sense does exist, but it's small, and it often competes more with EBS than with S3.
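To put rough numbers on that realm, a quick break-even sketch. The prices below are ballpark launch list prices (us-east-1) and may be out of date; treat them as assumptions:

```python
# Express's extra storage cost has to be paid back by cheaper GETs.
STD_STORAGE = 0.023      # $/GB-month, S3 Standard (assumed list price)
EXP_STORAGE = 0.16       # $/GB-month, ~6.4x Standard
STD_GET = 0.0004 / 1000  # $ per GET request, Standard
EXP_GET = STD_GET / 2    # GETs ~2x cheaper on Express

extra_storage = EXP_STORAGE - STD_STORAGE  # $ per GB-month premium
saving_per_get = STD_GET - EXP_GET         # $ saved per request
breakeven = extra_storage / saving_per_get # GETs per GB per month
print(f"{breakeven:,.0f} GETs per GB-month to break even")
```

On these assumed prices that works out to roughly 685k GETs per stored GB per month before Express pays for itself, which is why it ends up competing with EBS-backed caches rather than with plain S3.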
IMO WarpStream is a really cool product, and this new S3 offering makes it even better.
> Engineering is about trade-offs, and we’ve made a significant one with WarpStream: latency. The current implementation has a P99 of ~400ms for Produce requests because we never acknowledge data until it has been durably persisted in S3 and committed to our cloud control plane. In addition, our current P99 latency of data end-to-end from producer-to-consumer is around 1s
via https://www.warpstream.com/blog/kafka-is-dead-long-live-kafk...
However, we now install to the local disk filesystem by default, since it's much faster to do a periodic S3 hot sync (as with restic or aws-cli) than to treat S3 as the primary backing store, or to just snapshot the EBS or instance volume. The other reason you might want S3 as primary is if you use a lot of disk, but our files are compressed and extremely small, even for a large installation with tens of thousands of users and instances.
0. https://userify.com (ssh key management + sudo for teams)
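A periodic hot sync like this can be as simple as a crontab entry driving `aws s3 sync`; the paths and bucket name below are placeholders:

```cron
# Push the local data directory to S3 every 15 minutes.
# Paths and bucket are placeholders; aws s3 sync only uploads
# files that changed since the last run.
*/15 * * * * /usr/local/bin/aws s3 sync /var/lib/myapp/data s3://my-backup-bucket/myapp --only-show-errors
```

Because `sync` is incremental, the steady-state cost is a handful of LIST/PUT requests per run rather than a full re-upload.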
I ask because I've always been taught to not store files in a database. This use case sounds like an interesting exception.
There were some benchmarks (I couldn't find them just now) where SQLite was faster than the native filesystem at retrieving, searching, and adding files in a large directory.
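SQLite's own write-up on sqlite.org ("35% Faster Than The Filesystem") measured a similar effect for many small blobs. A minimal sketch of the pattern, using the stdlib sqlite3 module:

```python
# Many small "files" stored as blob rows in SQLite: a point lookup is
# one indexed read instead of an open()/read() syscall pair per file.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (name TEXT PRIMARY KEY, data BLOB)")

# Insert a batch of 1000 small blobs in one transaction.
blobs = [(f"file{i}.bin", bytes(100)) for i in range(1000)]
con.executemany("INSERT INTO files VALUES (?, ?)", blobs)
con.commit()

# Point lookup by name, served from the primary-key index.
(data,) = con.execute(
    "SELECT data FROM files WHERE name = ?", ("file42.bin",)
).fetchone()
print(len(data))  # 100
```

The win comes from batching and from skipping per-file metadata overhead; for large files streamed sequentially, the filesystem usually wins again.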
Before doing that it was unacceptably slow. After doing that it was unacceptably expensive.
Treating storage as an application-controlled thing that doesn't need systems management is a good thing. I want "put this file in this spot" logic in my application code, not "put this file in this spot on the filesystem, and hope that location is backed by the correct storage layer".
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountp...
For example, a container or EC2 instance might only need a tiny fraction of your storage, and with S3 it can just download what it needs when it needs it.
As opposed to EFS, where the container or instance needs to load in the entire datastore on startup, which can add minutes to startup time if the EFS volume is large.
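This works because S3 supports standard HTTP byte-range GETs, so a node can fetch just the slice it needs. A small sketch that plans the Range header values for a chunked fetch; the `plan_ranges` helper is made up for illustration, while the header format itself is standard HTTP:

```python
# Split an object into HTTP Range header values for partial GETs.
def plan_ranges(size: int, chunk: int) -> list[str]:
    """Return Range header values covering `size` bytes in `chunk` pieces."""
    return [
        f"bytes={start}-{min(start + chunk, size) - 1}"
        for start in range(0, size, chunk)
    ]

print(plan_ranges(1000, 400))
# ['bytes=0-399', 'bytes=400-799', 'bytes=800-999']
```

Each value can then be passed to an S3 GET (e.g. boto3's `get_object(Bucket=..., Key=..., Range=...)`), pulling only the needed bytes instead of the whole object.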
The new storage class is SSD-backed, presumably doesn't use Java anywhere, and doesn't stripe your data across as many hosts. It's more expensive because SSDs are more expensive than HDDs and narrow erasure codes are more costly than wide erasure codes.
Source: Used to work on S3.
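The erasure-coding point can be made concrete with the standard overhead formula; the (k, m) parameters below are illustrative, not S3's actual codes:

```python
# Storage overhead of a (k data + m parity) erasure code is (k + m) / k:
# raw bytes stored per logical byte. Narrower stripes buy lower tail
# latency and simpler repair at the cost of more raw bytes.
def overhead(k: int, m: int) -> float:
    return (k + m) / k

wide = overhead(12, 4)   # a wide code, illustrative parameters
narrow = overhead(4, 2)  # a narrow code, illustrative parameters
print(f"wide={wide:.2f}x narrow={narrow:.2f}x")
```

Going from ~1.33x to ~1.5x raw bytes per logical byte, on top of SSD instead of HDD media, compounds into a visibly higher price per GB.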
Or is rust really that magic?
https://azure.microsoft.com/en-us/blog/premium-block-blob-st...
And a second question: would it be worth the 8x surcharge?
> S3 Express One Zone can improve data access speeds by 10x and reduce request costs by 50% compared to S3 Standard and scales to process millions of requests per minute.
I get it, but at the same time that is also what you lost when you locked yourself in with a particular vendor.
What are other viable practical alternative solution(s)?
Off the top of my head, Backblaze B2, Cloudflare R2, etc. are S3-compatible, and MinIO works locally.
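For standard S3 tooling, switching to any of these usually just means overriding the endpoint. A sketch of `~/.aws/config`, assuming a recent AWS CLI/SDK that honors `endpoint_url` in profiles; the profile names and endpoints are placeholders:

```ini
; ~/.aws/config -- profiles pointing standard S3 tooling at
; S3-compatible services (placeholder endpoints)
[profile b2]
endpoint_url = https://s3.us-west-000.backblazeb2.com

[profile minio-local]
endpoint_url = http://localhost:9000
```

Then e.g. `aws s3 ls s3://my-bucket --profile minio-local` (or the equivalent `--endpoint-url` flag per command) talks to the alternative store with no other changes.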
If you have a big use case and you really understand your needs, it's very doable.