It's one thing that S3 keeps going down today; we run our own server cluster and I accept that 100% uptime isn't possible. But it's aggravating that they can't at least figure out how to give timely updates on their dashboard when something is broken.
We inevitably learn of S3 outages through our internal error reporting systems before AWS posts anything to their status page. When they do finally post, it's usually a tiny "information" icon, even when reporting a problem that makes the service unusable. The laggy, misleading nature of their status page gives the impression they must be tying bonuses to the status icons. Can't fathom why else they would be so inept at keeping us updated when something is wrong. Surely they have sufficient internal monitoring to pick up on these outages long before they update their customers.
"Hello, We have just become aware of EC2 network connectivity issues in the US-EAST-1 region. The impact of this issue is loss of network connectivity to EC2 instances in US-EAST-1. The AWS support and engineering teams are actively working on bringing closure to this issue. I will share additional information as soon as I learn more about this issue."
Heh.
I feel like AWS has way too many moving parts to be stable.
It's very tempting for them to reuse bits of infrastructure everywhere, which increases the chances that when something goes wrong somewhere it will break your stuff. For example, hosting instance images on S3 means that when S3 has issues, EC2 now has issues too.
As far as I remember, S3's US Standard region hasn't had a serious incident since Fall 2012. That's a pretty great uptime record in general, even though it's terribly frustrating on days like today.
My current company spends a few hundred dollars a month on S3. I certainly couldn't match S3's uptime at that budget. Maybe I could do it at 2-3X the budget, but at that point it might be easier to just mirror my files across multiple S3 regions.
Exactly. Build multi-region support into your app(s), enable S3's replication so objects in your primary region are replicated to another region, and then properly handle loss of a region (go read-only or write to another region and restore consistency later).
You'll still be spending less than attempting to maintain a highly durable object store yourself across multiple datacenters or geographic areas.
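Roughly, the read path looks something like this (Python/boto3 sketch, bucket names made up; the replica bucket is populated by S3's cross-region replication, enabled on the primary):

    import boto3
    from botocore.exceptions import ClientError, EndpointConnectionError

    # Bucket names are made up; the replica bucket is kept in sync
    # by S3 cross-region replication configured on the primary.
    PRIMARY = {"region": "us-east-1", "bucket": "myapp-assets"}
    REPLICA = {"region": "us-west-2", "bucket": "myapp-assets-replica"}

    def get_object_with_failover(key):
        # Try the primary region first; fall back to the replica on any failure.
        last_error = None
        for target in (PRIMARY, REPLICA):
            s3 = boto3.client("s3", region_name=target["region"])
            try:
                return s3.get_object(Bucket=target["bucket"], Key=key)["Body"].read()
            except (ClientError, EndpointConnectionError) as e:
                last_error = e
                continue
        raise last_error

Writes are the harder part: during a primary-region outage you either go read-only or write to the replica and reconcile once the primary comes back, as mentioned above.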
The reality is that most customers are not affected, and overall service uptime is among the highest you'll find anywhere.
Not to mention that whenever AWS is having issues it's always in one region at a time, and frequently a single availability zone. As long as you build your application to be AZ-tolerant, you won't run into problems.
Unfortunately it's really impossible to say in this case, since they don't release numbers. Informally, everyone I know with S3 buckets in US Standard had issues this morning.
> As long as you build your application to be AZ-tolerant, you won't run into problems.
What you say about multiple AZs is true for EC2, but many other AWS services (especially EBS-backed ones) tend to go down across the entire region. If you're serious about availability, you really need to be in multiple regions.
This is a total crock. On the 31st of July, our EC2 instances across 2 availability zones were shut down without warning. I waited 3 hours not being able to do anything. So far I've only gotten info from first-level support, and it's been "escalated".
And the perfect "out" for AWS, every single time they have issues.
I haven't rented servers in about two years, but yes. My old MediaTemple dedicated servers and the pair we had colocated at a local telco's large datacenter experienced occasional network outages and whatnot just like AWS does.
IMO, AWS made load balancing and fault-tolerant setups much more accessible to small businesses. At the time of our switch, getting a load balancer at our colo was quite pricey whereas AWS charges $15ish a month for it. Getting a three-tier setup on AWS was easy, whereas at the colo we'd have to pay for at least 6U of space even for relatively small amounts of traffic.
It also highly depends on the service and solution you build. I know that in traditional EC2 space, over the long term you absolutely can self-host for far less money, but you are talking about 5-year TCO.
Things like S3 bring another advantage: a globally distributed data store is HARD to build and run on your own dime. I would argue services like that are much harder to beat with self-hosting.
Well, everyone has different definitions of stable.
> I'm risking being inflammatory here, but do people really believe that they get better uptime from AWS compared to renting dedicated servers?
I don't think it is inflammatory, but perhaps you have an idealized notion of how well teams manage uptime, and more importantly, failures, on average. Sure, a great team will do fantastic, but an average team will... not.
That said, it seems likely that AWS is hitting some kind of a rough patch right now. There are, however, other cloud services whose uptime records for the last year would be the envy of your typical company's internal hosting services, and I'd expect AWS to return to that fold shortly.
Nobody in this thread has made any comments about their uptime expectations before you brought up the subject. There's no sign here of the "people" you are referring to.
It's not inflammatory to ask a question that people are no doubt thinking, even if the answer is obvious.
For the above reasons, and because I work in the SF Bay Area, I put everything in us-west-2. us-west-2 sometimes has its own issues, but nothing quite at the level of us-east-1.
http://shlomoswidler.com/2009/12/read-after-write-consistenc...
> Aha! I had forgotten about the way Amazon defines its S3 regions. US-Standard has servers on both the east and west coasts (remember, this is S3 not EC2) in the same logical “region”. The engineering challenges in providing read-after-write consistency in a smaller geographical area are greatly magnified when that area is expanded. The fundamental physical limitation is the speed of light, which takes at least 16 milliseconds to cross the US coast-to-coast (that’s in a vacuum – it takes at least four times as long over the internet due to the latency introduced by routers and switches along the way).
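For what it's worth, that latency figure is easy to sanity-check with rough numbers (the coast-to-coast distance here is approximate):

    # One-way light travel time across the continental US, in a vacuum
    distance_km = 4500       # rough coast-to-coast distance
    c_km_per_s = 300000      # speed of light in a vacuum
    print(distance_km / c_km_per_s * 1000)  # ~15 ms one way, in line with the ~16 ms quoted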
Our AWS TAM called us. I don't think he wanted the nasty call I gave him at 4:30am.
Also, for the record, S3 has been very stable for us otherwise. We have been rather happy with AWS overall.
"com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 500, AWS Service: Amazon S3, AWS Request ID: -redacted-, AWS Error Code: InternalError, AWS Error Message: We encountered an internal error. Please try again., S3 Extended Request ID: -redacted-"
:/
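Since the error text literally says "Please try again", wrapping the calls in a retry with backoff papers over the transient 500s. A sketch in Python/boto3 for brevity (we actually hit this from the Java SDK, as above, and the SDKs' built-in retries already cover some of this):

    import random
    import time

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def get_with_retries(bucket, key, attempts=5):
        # Retry transient S3 InternalError (HTTP 500) with exponential backoff and jitter.
        for i in range(attempts):
            try:
                return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            except ClientError as e:
                code = e.response.get("Error", {}).get("Code")
                if code != "InternalError" or i == attempts - 1:
                    raise
                time.sleep((2 ** i) + random.random())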
As for what happened, my money is on this: https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3...
> You can now increase your Amazon S3 bucket limit per AWS account... Amazon S3 now supports read-after-write consistency for new objects added to Amazon S3 in US Standard region.
The 100-bucket limit used to be an absolute, unchangeable hard limit - rare for AWS, and thus likely something baked deep into the architecture from S3 being one of their first services - so I suspect that lifting it involved some fairly major changes to the backend.
I'd wager it's more likely that read-after-write change.
edit: seeing connectivity issues again at 19:50 UTC