Diving Deep on S3 Consistency

(allthingsdistributed.com)

126 points | themarkers | 5y ago | 49 comments

swyx 5y ago

I find this very light on the actual "diving deep" part promised in the title. There's a lot of self-congratulatory chest thumping, not a lot of technical detail. Werner of course doesn't owe us any explanation whatsoever. I just don't find this particularly deep.

killingtime74 5y ago

The higher you go the more your job is marketing

rossmohax 5y ago

Recent S3 consistency improvements are welcome, but S3 still falls behind Google GCS until they support conditional PUTs.

GCS allows an object to be replaced conditionally with the `x-goog-if-generation-match` header, which can sometimes be quite useful.
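To make the generation-match idea concrete, here is a minimal sketch of the compare-and-swap behavior described above. It assumes a toy in-memory bucket: `ToyBucket` and `PreconditionFailed` are hypothetical names for illustration, not GCS client APIs, and the simple integer counter stands in for real generation numbers. A generation of 0 is used to mean "object does not exist yet".

```python
from dataclasses import dataclass, field


class PreconditionFailed(Exception):
    """Raised when the stored generation differs from the caller's expectation."""


@dataclass
class ToyBucket:
    # Maps object name -> (generation, data). A missing object has generation 0.
    _objects: dict = field(default_factory=dict)

    def get(self, name):
        return self._objects.get(name, (0, None))

    def put(self, name, data, if_generation_match=None):
        current_generation, _ = self.get(name)
        # Conditional write: reject if someone else wrote since we last read.
        if if_generation_match is not None and if_generation_match != current_generation:
            raise PreconditionFailed(
                f"{name}: expected generation {if_generation_match}, "
                f"found {current_generation}"
            )
        new_generation = current_generation + 1
        self._objects[name] = (new_generation, data)
        return new_generation


bucket = ToyBucket()
gen = bucket.put("config.json", b"v1", if_generation_match=0)  # create only if absent
bucket.put("config.json", b"v2", if_generation_match=gen)      # nobody wrote in between
try:
    bucket.put("config.json", b"v3", if_generation_match=gen)  # stale generation
    lost_update_prevented = False
except PreconditionFailed:
    lost_update_prevented = True
```

The third write is rejected because it carries a generation that is no longer current, which is what prevents a lost update when two writers race.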

ignoramous 5y ago

Vogels spoke briefly about why AWS prefers versioned objects instead here: https://queue.acm.org/detail.cfm?id=3434573

BTW, DynamoDB supports conditional PUTs if your data can fit under 400 KiB.

CodesInChaos 5y ago

How do versioned objects make conditional puts unnecessary? I see little relation between them, except that you could use the version identifier in the condition.

skynet-9000 5y ago

Why would AWS provide a feature that makes additional transaction and storage charges (as well as subsequent reads to see which is the correct version) irrelevant?

ithkuil 5y ago

There is a conditional CopyObject though (x-amz-copy-source-if...)

It can cover some of the use cases.

ryanworl 5y ago

Can you explain how this is useful? It seems like the destination is the important thing here, not the source.

valenterry 5y ago

Here's what I take away from this post:

> We built automation that can respond rapidly to load concentration and individual server failure. Because the consistency witness tracks minimal state and only in-memory, we are able to replace them quickly without waiting for lengthy state transfers.

So this means that the "system" that contains the witness(es) is a single point of truth and failure (otherwise we would lose consistency again), but because it does not have to store a lot of information, it can be kept in-memory and can be exchanged quickly in case of failure.

Or in other words: minimize the amount of information that is strictly necessary to keep a system consistent and then make that part its own in-memory and quickly failover-able system which is then the bar for the HA component.

Is that what they did?

mgdev 5y ago

They've basically bolted on causal consistency.

It's a great change.

valenterry 5y ago

Thank you for dropping the "causal consistency" term. I read the Wikipedia article.

So, causal consistency in this context means, first, that it does not matter whether a write to object A or a write to object B came first, because they are seen as "unrelated". This obviously allows for performance improvements over general "strong consistency" but still offers more guarantees than eventual consistency.

Second, for every "writer" (which would be a 1:1 relationship to the number of S3 objects) an amount of metadata needs to be kept in-memory for such cases where the access to an object _might_ lead to an outdated read or where an "older" write would potentially overwrite a "newer" write.

All that being said, there still is one single instance that, if it goes down, makes the whole system unavailable for the S3 objects it manages until it is replaced. So that means a lower availability compared to a solution that uses eventual consistency (like Cassandra).

Would this be equivalent to having an in-memory SQL database to store the metadata for some other system (such as Cassandra), with a quick failover but still a single point of failure, to enhance the consistency guarantees - just more optimized/customized so that it can work with a huge system like S3?

iblaine 5y ago

Anyone else still seeing consistency problems w/S3 & EMR? The latest AWS re:Invent made it sound like this would be fixed but as of yesterday I was still using emrfs to correct S3 consistency problems.

8note 5y ago

Yeah, yesterday as well

wolf550e 5y ago

AWS fixed S3 consistency in December 2020:

https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-s3...

swyx 5y ago

That's... what the post is about.

deepsun 5y ago

By the way, Google's GCS had it from the beginning.

BrandonY 5y ago

Hi, GCS engineer here. GCS offered a lot of consistency from the beginning, but we didn't have strong object listing consistency at the beginning. We got that somewhere around 2017 when we moved object metadata to Spanner. See https://cloud.google.com/blog/products/gcp/how-google-cloud-...

CommanderHux 5y ago

Azure had it from the beginning

pawelmi 5y ago

So it is both available and consistent (but perhaps only in a read-your-own-writes way?). What then about resilience to network partitions, referring to the CAP theorem? Did they build a super reliable global network, so this is never a real issue?

somethingAlex 5y ago

The consistency level seems to be Causal Consistency, which does include read-your-writes. S3 doesn't provide ACID transactions, so stricter consistency models aren't really needed.

From what I've read, if a network issue occurs which would impair consistency, S3 sacrifices availability. The write would just fail.

But this isn't your 5-node distributed system. Like they mention in the article, the witness system can remove and add nodes very quickly and it's highly redundant. A network issue that would actually cause split-brain or make it difficult to reach consensus would be few and far between.

juancampa 5y ago

Can someone elaborate on this Witness system OP talks about?

I'm picturing a replicated, in-memory KV store where the value is some sort of version or timestamp representing the last time the object was modified. Cached reads can verify they are fresh by checking against this version/timestamp, which is acceptable because it's a network+RAM read. Is this somewhat accurate?
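The picture above can be sketched roughly as follows. This is only an illustration of the guess in this comment, not how S3 actually works: `Witness` and `CachedStore` are hypothetical names, versions are plain integers, and everything lives in one process. The cache is deliberately not updated on writes, so a read has to consult the witness to find out whether its cached copy is stale.

```python
class Witness:
    """Hypothetical in-memory map of key -> latest committed version."""

    def __init__(self):
        self._latest = {}

    def record_write(self, key, version):
        self._latest[key] = max(version, self._latest.get(key, 0))

    def latest_version(self, key):
        return self._latest.get(key, 0)


class CachedStore:
    def __init__(self, witness):
        self.witness = witness
        self.durable = {}  # key -> (version, value); stands in for persistent storage
        self.cache = {}    # key -> (version, value); may lag behind durable storage

    def put(self, key, value):
        version = self.witness.latest_version(key) + 1
        self.durable[key] = (version, value)
        self.witness.record_write(key, version)  # note: the cache is NOT refreshed here

    def get(self, key):
        cached = self.cache.get(key)
        # Cheap freshness check: compare the cached version with the witness.
        if cached is not None and cached[0] == self.witness.latest_version(key):
            return cached[1]
        # Stale or missing: fall back to durable storage and refresh the cache.
        entry = self.durable.get(key)
        if entry is None:
            return None
        self.cache[key] = entry
        return entry[1]


store = CachedStore(Witness())
store.put("a", "old")
first_read = store.get("a")   # populates the cache with version 1
store.put("a", "new")         # cache still holds version 1
second_read = store.get("a")  # witness says version 2, so the cache is bypassed
```

The second read returns the new value even though the cache was never invalidated, because the version comparison against the witness catches the stale entry.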

skynet-9000 5y ago

I'm picturing the same, but my guess is that it's using a time-synced serializability graph or MVCC in some way.

However, even a "basic" distributed lock system (like a consistently-hashed in-memory DB, sharded across reliable servers) might provide both the scale and single source of truth that's needed. The difficulty arises when one of those servers has a hiccup.
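For readers unfamiliar with the "consistently-hashed, sharded" part, here is a minimal sketch of a consistent hash ring, assuming hypothetical server names `w1` to `w3` and 64 virtual nodes per server. It says nothing about what AWS actually built; it only shows why losing one server remaps that server's keys and leaves the rest where they were.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # 64-bit position on the ring derived from SHA-256.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"bucket/obj-{i}" for i in range(1000)]
ring = HashRing(["w1", "w2", "w3"])
before = {key: ring.node_for(key) for key in keys}

# Simulate w3 having a "hiccup" and being removed from the ring.
smaller = HashRing(["w1", "w2"])
moved = [
    key for key in keys
    if before[key] != "w3" and smaller.node_for(key) != before[key]
]
```

Keys that were owned by `w1` or `w2` keep their owner after `w3` disappears, so `moved` stays empty; only the keys that lived on `w3` need a new home.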

It'd be a delicious irony if it was based on hardware like an old-school mainframe or something like that.

crashocaster 5y ago

I would have been interested to hear more about the verification techniques and tools they used for this project.

jeffbarr 5y ago

Check out https://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-... ("How Amazon Web Services Uses Formal Methods") and https://d1.awsstatic.com/Security/pdfs/One_Click_Formal_Meth... ("One-Click Formal Methods") for more info.

sidereal 5y ago

The folks involved gave a neat talk about the verification techniques and tools they used as part of AWS Pi Week recently: https://www.twitch.tv/videos/951537246?t=1h10m10s

MeteorMarc 5y ago

And for those who use MinIO server, the self-hosted S3-compatible storage: that has strong consistency, too.

vergessenmir 5y ago

I have wondered if anyone is using MinIO at really large scale, or if there are any examples of production use at its limits?

nhoughto 5y ago

Would love a dive (hopefully deep) into IAM; the innards of that must be some impressive wizardry. It's surprising there isn't more out there about the amazing technical workings of these foundational AWS products.

killingtime74 5y ago

There are if you work for them....

whydoineedthis 5y ago

I'm confused...did you fix the caching issue in S3 or not?

The article seems to explain why there is a caching issue, and that's understandable, but it also reads as if you wanted to fix it. I would think it would be the headline, in bold font, if it was actually fixed.

For those curious, the problem is that S3 is "eventually consistent", which is normally not a problem. But consider a scenario where you store a config file on S3, update that config file, and redeploy your app. The way things are today you can (and yes, sometimes do) get a cached version. So now there would be uncertainty of what was actually released. Even worse, some of your redeployed apps could get the new config and others the old config.
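The config-file scenario can be reproduced with a toy model. This is a sketch under loud assumptions: `EventuallyConsistentStore` is a hypothetical class with three in-process "replicas" and an explicit `propagate()` step, which is not how S3 was implemented. It only illustrates how readers hitting different replicas can see different versions until replication catches up.

```python
class EventuallyConsistentStore:
    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def put(self, key, value):
        # The write lands on replica 0 immediately; the others catch up later.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def get(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Apply queued updates in order, simulating replication catching up.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=3)
store.put("app.conf", "v1")
store.propagate()

store.put("app.conf", "v2")  # update the config, then redeploy right away
seen_during_deploy = [store.get("app.conf", i) for i in range(3)]

store.propagate()
seen_later = [store.get("app.conf", i) for i in range(3)]
```

During the deploy one reader sees `v2` while the other two still see `v1`; once propagation finishes, every reader sees `v2`.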

Personally, I would be happy if there was simply an extra fee for cache-busting the S3 objects on demand. That would prevent folks from abusing it but also give the option when needed.

jeffbarr 5y ago

Yes, see my December 2020 post at https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea... :

"Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what’s in the bucket. This applies to all existing and new S3 objects, works in all regions, and is available to you at no extra charge! There’s no impact on performance, you can update an object hundreds of times per second if you’d like, and there are no global dependencies."

nindalf 5y ago

Thanks for the link, it made the change being talked about clearer. However, I still don't understand how it was achieved. The explanation in the link appears truncated - lots of talk about the problem, then something about a cache and that's it. Is there an alternate link that talks about the mechanics of the change?

nelsonenzo 5y ago

Oh, awesome, I missed that!

cldellow 5y ago

It was fixed in December of 2020. Announcement blog post: https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea...

jasonpeacock 5y ago

This is a general problem in all distributed systems, not just when pulling configuration from S3.

Let's assume you had strong consistency in S3. If your app is distributed (tens, hundreds, or thousands of instances running) then all instances are not going to update at the same time, atomically.

You still need to design flexibility into your app to handle the case where they are not all running the same config (or software) version at the same time.

Thus, once you've built a distributed system that is able to handle a phased rollout of software/config versions (and rollback), then having cache inconsistency in S3 is no big deal.

If you really need atomic updates across a distributed system then you're looking at more expensive solutions, like DynamoDB (which does offer consistent reads), or other distributed caches.

mgdev 5y ago

The deeper in your stack you fix the consistency problem, the simpler the rest of your system needs to be. If you use S3 as a canonical store for some use case, that's pretty deep in the stack.

> Thus, once you've built a distributed system that is able to handle a phased rollout of software/config versions (and rollback), then having cache inconsistency in S3 is no big deal.

But this would also mean you can't use S3 as your source of truth for config, which is precisely what a lot of people want to do.

nelsonenzo 5y ago

What I need is that when I make a call to a service, it gives back consistent results. Ergo, when the app does do a rolling deploy, it will get the right config on startup, not some random version.

It looks like it does exactly that now, it just wasn't clear from the article.

JDTech123 5y ago

In that example, do you not see using S3 for that purpose as trying to use the wrong tool for the task at hand? AWS SSM Parameter Store [0] seems to me like a tool designed to fit that purpose nicely.

[0] https://docs.aws.amazon.com/systems-manager/latest/userguide...

nelsonenzo 5y ago

Complex config files suck in paramstore. Also, I've used this for mobile app configs that are pulled from s3, so paramstore wouldn't be an option.

tyingq 5y ago

It is supposedly fixed.

"After a successful write of a new object, or an overwrite or delete of an existing object, any subsequent read request immediately receives the latest version of the object."

https://aws.amazon.com/s3/consistency/
