Or more generally, when is it better to choose S2 vs services like SQS or Kinesis?
S2 sounds like an ordered queue to me, but those already exist?
Another factor is how many ordered streams you can have - typically a few thousand at most with those systems. We take the serverless spirit of S3 here: when did you ever have to worry about the number of objects in a bucket?
We are also able to offer latency comparable to disk-based streaming like Confluent's Kora and Kinesis with our Express storage class (under 50 milliseconds end-to-end latency for clients in the same cloud region) - while being backed by S3 with regional durability! Not a disk in the system.
We want people to be able to build safe distributed data systems on top of S2, so we also allow concurrency control mechanisms on the stream like fencing. Kafka or Kinesis won't let you do that. This is the approach AWS takes internally (https://brooker.co.za/blog/2024/04/25/memorydb.html), but they don't have that as a service. We want to democratize the pattern.
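To make the fencing idea concrete, here is a minimal in-memory sketch of the pattern (the class and method names are illustrative, not S2's actual API): appends must carry the current fencing token, so a deposed writer holding a stale token cannot corrupt the stream.

```python
# Toy illustration of stream fencing: only the holder of the current
# fencing token may append; a "zombie" writer with a stale token is
# rejected. Names here are hypothetical, not the S2 API.

class FencedStream:
    def __init__(self):
        self.records = []
        self.token = None  # current fencing token, if any

    def set_fencing_token(self, token):
        # A newly elected writer installs its token;
        # all previously issued tokens become invalid.
        self.token = token

    def append(self, record, token=None):
        if self.token is not None and token != self.token:
            raise PermissionError("stale fencing token")
        self.records.append(record)


stream = FencedStream()
stream.set_fencing_token("leader-2")           # new leader takes over
stream.append("write from new leader", token="leader-2")

try:
    stream.append("write from zombie", token="leader-1")  # old leader
except PermissionError:
    pass  # the stale writer is fenced off; the stream stays consistent
```

The value of doing this server-side is that correctness doesn't depend on every client behaving: the stream itself refuses writes from a stale leader.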
ED: on throughputs, to clarify, I am talking about ordered throughput, i.e. per Kafka partition or Kinesis shard. WarpStream also does well here because of its architectural approach of separating ordering, but at a latency cost.
I’m sure that in this early preview you’re trying to reach mainly devs with existing domain expertise, but the way that, in this comment, you laid out existing constraints and what possibilities might lie beyond them—it really helped me situate your S2 product in the constellation of cloud primitives.
Just wanted to offer that feedback in the hope that the spirit of your comment here doesn’t get buried down-thread!
Looks like you're pushing the throughput angle - that could be important, but IMO it's not often you come across devs who need this level of throughput without already dealing with large-scale problems. My feedback is that the lack of per-tenant encryption is a big deal breaker here, since you're mixing data from multiple tenants within one object.
Plus, your security section says very little about how you prevent cross-tenant data contamination - that's probably the first thing that popped into my mind when I read about your data model. It makes me extremely uneasy, and I can't imagine adopting this for anything serious. I would encourage you to think about how you can communicate that angle to customers as well, besides supporting per-tenant encryption keys.
Simple API, reasonable pricing, latency flexibility, unlimited streams, _and_ elastic to high throughputs. All adding up to a great serverless experience.
Re: the data colocation. This is how most multi-tenant systems - including S3 itself, AFAIU - operate. I understand there is a difference in the level of trust vs a cloud provider, and the best we can do here while delivering a serverless experience is encrypting every single record at the edge of S2, where records transit in or out, with a tenant-specific key. We may even allow specifying the key as part of the request, if clients want to manage it themselves.
The best data security when leveraging any multi-tenant service is going to be client-side encryption, and we also want to make this super easy. With our planned Kafka layer, we plan on client-side encryption as a value add.
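As a sketch of what per-record, per-tenant encryption at the edge looks like, here is a toy example. This is deliberately not real cryptography (a production system would use an AEAD like AES-GCM from a vetted library); it only illustrates the shape of the pattern - each record is sealed with the tenant's key before storage, so the shared storage layer only ever holds ciphertext.

```python
import hashlib
import secrets

# Toy per-tenant record encryption (NOT real crypto - illustration only):
# derive a keystream from the tenant key and a per-record nonce, XOR the
# record on the way in, and reverse it on the way out.

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_record(tenant_key: bytes, record: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce per record
    ct = bytes(a ^ b for a, b in
               zip(record, keystream(tenant_key, nonce, len(record))))
    return nonce + ct  # stored blob: nonce prefix + ciphertext

def decrypt_record(tenant_key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in
                 zip(ct, keystream(tenant_key, nonce, len(ct))))


tenant_key = secrets.token_bytes(32)          # held by (or for) the tenant
blob = encrypt_record(tenant_key, b"order:42,amount:10")
assert decrypt_record(tenant_key, blob) == b"order:42,amount:10"
```

The point is that a record from tenant A, colocated in the same object as tenant B's data, is opaque without tenant A's key - which is what makes client-managed keys a meaningful upgrade over service-side isolation alone.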
ED: for some reason I wasn't seeing the reply link on your comment before - I do see it now.