>Update June 14th, 2014, 10:20 AM PDT: Stability on the DreamObjects cluster has been restored. Requests appear to be resolving properly now that the system has had time to re-balance itself. Our tests are reporting properly now. If you do have any questions or concerns, please contact support.
I'm quite interested to know what went on in particular, as I'm far more interested in Ceph than in commercial object stores that I can't extend. Librados is pretty damn awesome too, and I can foresee implementing some highly distributed storage through that directly.
With DreamObjects, it sounds like some API servers went down, and the failure was such that some requests couldn't be served until the appropriate nodes came back.
It appears that with Ceph it will be easy to keep enough replicas such that data is not lost, but high availability is still being hashed out. Hopefully the lessons from this failure guarantee that this particular failure mode doesn't happen again.
Ceph came from Dreamhost, so it's very likely that they're the best subject matter experts outside of Inktank.
You can deduce quite a bit about S3's system architecture from the kind of time and consistency guarantees it provides, the optimal and pessimal cases for bucket key naming, etc. Riak CS hits pretty much all those same notes, so I would hazard that you can get a good high-level sense of S3's architecture by looking at Riak's.
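To make the key naming point concrete, here is a minimal sketch of the commonly described pessimal and better cases, assuming a store that partitions its index by lexical key order. Sequential keys (timestamps, counters) all land next to each other, while a short hash-derived prefix scatters them. The `spread_key` helper and the sample key layout are hypothetical, made up for illustration, and not part of any S3, Riak CS, or Ceph API.

```python
import hashlib


def spread_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash-derived prefix to a key (hypothetical helper).

    Assumption: the store shards its index by lexical key range, so
    lexically adjacent keys would otherwise hit the same partition.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Pessimal case under that assumption: keys that sort next to each other.
sequential_keys = [f"logs/2014-06-14/{i:06d}.gz" for i in range(5)]

# Same keys with a hash prefix, so they no longer sort together.
spread_keys = [spread_key(k) for k in sequential_keys]

for original, spread in zip(sequential_keys, spread_keys):
    print(original, "->", spread)
```

The trade-off is that you give up cheap in-order listing by the original prefix, so this only makes sense when write distribution matters more than range listing.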
For people who are actually on a budget for their projects, which is more people than you might think, this might make a lot of sense over paying much more to use the real S3 for backups, provided the reliability is OK.
Disclaimer: I'm on the GlusterFS team at Red Hat, which puts me pretty close to the Ceph team, but I'm not actually one of them. I've never had anything to do with Dreamhost.