My other favorite pattern is implementing a pool of workers by quering ec2 instances with a certain tag in a stopped state and starting them. Starting the instance can succeed only once - that means I managed to snatch the machine. If it fails, I try again, grabbing another one.
This is one of those things that I never advertised out of professional shame, but it works, its bulletproof and dead simple and does not require additional infra to work.
Cost-wise we're only paying for the EBS volumes for the stopped instances which are like 4GB each, so they cost practically nothing, we spend less than a dollar per month for the whole bunch.
the first one is probably cleaner (though I don't like it, it means that I need the instance to be a kubernetes node, and that comes with a bunch of baggage).
My biggest wishlist item for S3 is the ability to enforce that an object is named with a name that matches its hash. (With a modern hash considered secure, not MD5 or SHA1, though it isn't supported for those either.) That would make it much easier to build content-addressible storage.
The client sends the request headers (including the x-amz-content-sha256 header) to the signer, and the signer responds with a valid S3 PUT request (minus body). The client takes the signer's response, appends its chosen request payload, and uploads it to S3. With such a system, you can implement a signer in a lambda function, and the lambda function enforces the content-addressed invariant.
Unfortunately it doesn't work natively with multipart: while SigV4+S3 enables you to enforce the SHA256 of each individual part, you can't enforce the SHA256 of the entire object. If you really want, you can invent your own tree hashing format atop SHA256, and enforce content-addressability on that.
I have a blog post [1] that goes into more depth on signers in general.
[1] https://josnyder.com/blog/2024/patterns_in_s3_data_access.ht...
https://aws.amazon.com/blogs/aws/new-additional-checksum-alg...
And even if it was for the whole file, it isn't used for the ETag, so, so it can't be used for conditional PUTs.
I had a use case where this looked really promising, then I ran into the multipart upload limitations, and ended up using my own custom metadata for the sha256sum.
Individual objects are split into multiple blocks, each of which can be stored independently on different underlying servers. Each can see its own block, but not any other block.
Calculating a hash like SHA256 would require a sequential scan through all blocks. This could be done with a minimum of network traffic if instead of streaming the bytes to a central server to hash, the hash state is forwarded from block server to block server in sequence. Still though, it would be a very slow serial operation that could be fairly chatty too if there are many tiny blocks.
What could work would be to use a Merkle tree hash construction where some of subdivision boundaries match the block sizes.
I'd like to set IAM permissions for a role, so that that role can add objects to the content-addressible store, but only if their name matches the hash of their content.
> Or are you saying you want S3 to automatically set the name for you based on the hash?
I'm happy to name the files myself, if I can get S3 to enforce that. But sure, if it were easier, I'd be thrilled to have S3 name the files by hash, and/or support retrieving files by hash.
It's no longer top comment, which is fine.
Genuinely, we've wanted this for ages and we got half way there with strong consistency.
As a (horribly inefficient, in case of non-trivial write contention) toy example, you could use S3 as a lock-free concurrent SQLite storage backend: Reads work as expected by fetching the entire database and satisfying the operation locally; writes work like this:
- Download the current database copy
- Perform your write locally
- Upload it back using "Put-If-Match" and the pre-edit copy as the matched object.
- If you get success, consider the transaction successful.
- If you get failure, go back to step 1 and try again.
Without conditional writes, two instances of your application might both read "100", both subtract 1, and both write "99". If they checked the file afterward, both would think everything was fine. But things aren't find because you've actually sold two.
The other cloud storage providers have had these sorts of conditional write features since basically forever, and it's always been really weird that S3 has lacked them.
[1]: https://learn.microsoft.com/en-us/azure/storage/blobs/concur...
So coordinating writes to multiple objects still requires… creativity.
I'm thinking of a situation in which an application assumes that different (possibly adversarial) user-provided data will always generate a different ETag.
Too bad performance would be terrible without a caching layer (ebs).
Can we have this Google?
…
Please?
2. glue jobs to partition by some columns reporting uses
3. query with athena
4. ???
5. profit (celebrate reduced cost)
This thing costs couple $ a month for ~500gb of data. Snowflake wanted crazy amounts of money for the same thing.
I wouldn't be surprised if they saw over 100mil/req/sec globally by now. That's 100 million requests a second that need strong read-your-write consistency and atomicity at global scale. The number of pieces they had to move into place for this to happen is probably quite the engineering tale.
[1] https://aws.amazon.com/blogs/aws/amazon-s3-two-trillion-obje...
This is especially useful in scenarios where multiple users or processes are working on the same data, as it helps maintain consistency and avoids accidental overwrites.
This is using the same mechanism as HTTP's `If-None-Match` header so it's easier to implement/learn
Finally we can have this with s3 :)