Or more generally, when is it better to choose S2 vs services like SQS or Kinesis?
S2 sounds like an ordered queue to me, but those already exist?
Another factor is how many ordered streams you can have - typically a few thousand at most with those systems. We take the serverless spirit of S3 here: when did you ever have to worry about the number of objects in a bucket?
We are also able to offer latency comparable to disk-based streaming like Confluent's Kora and Kinesis with our Express storage class (under 50 milliseconds end-to-end latency for clients in the same cloud region) - while being backed by S3 with regional durability! Not a disk in the system.
We want people to be able to build safe distributed data systems on top of S2, so we also allow concurrency control mechanisms on the stream like fencing. Kafka or Kinesis won't let you do that. This is the approach AWS takes internally (https://brooker.co.za/blog/2024/04/25/memorydb.html), but they don't have that as a service. We want to democratize the pattern.
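To make the fencing idea concrete, here is a minimal in-memory sketch of the pattern (the class and method names are illustrative, not S2's actual API): appends must carry the current fencing token, so a deposed writer holding a stale token cannot corrupt the stream.

```python
# Toy illustration of stream fencing: only the holder of the current
# fencing token may append; a "zombie" writer with a stale token is
# rejected. Names here are hypothetical, not the S2 API.

class FencedStream:
    def __init__(self):
        self.records = []
        self.token = None  # current fencing token, if any

    def set_fencing_token(self, token):
        # A newly elected writer installs its token;
        # all previously issued tokens become invalid.
        self.token = token

    def append(self, record, token=None):
        if self.token is not None and token != self.token:
            raise PermissionError("stale fencing token")
        self.records.append(record)


stream = FencedStream()
stream.set_fencing_token("leader-2")           # new leader takes over
stream.append("write from new leader", token="leader-2")

try:
    stream.append("write from zombie", token="leader-1")  # old leader
except PermissionError:
    pass  # the stale writer is fenced off; the stream stays consistent
```

The value of doing this server-side is that correctness doesn't depend on every client behaving: the stream itself refuses writes from a stale leader.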
ED: on throughputs, to clarify, I am talking about ordered throughput, i.e. per Kafka partition or Kinesis shard. WarpStream also does well here because of its architectural approach of separating ordering, but at a latency cost.
I’m sure that in this early preview you’re trying to reach mainly devs with existing domain expertise, but the way that, in this comment, you laid out existing constraints and what possibilities might lie beyond them—it really helped me situate your S2 product in the constellation of cloud primitives.
Just wanted to offer that feedback in the hope that the spirit of your comment here doesn’t get buried down-thread!
Looks like you're pushing the throughput angle - that could be important, but IMO it's not often you come across devs who need this level of throughput without already dealing with large-scale problems. My feedback is that the lack of per-tenant encryption is a big deal breaker here, since you're mixing data from multiple tenants within one object.
Plus, your security section says very little about how you prevent cross-tenant data contamination - that's probably the first thing that popped into my mind when I read about your data model. It makes me extremely uneasy, and I can't imagine adopting this for anything serious. I would encourage you to think about how you can communicate that angle to customers as well, besides supporting per-tenant encryption keys.
Simple API, reasonable pricing, latency flexibility, unlimited streams, _and_ elastic to high throughputs. All adding up to a great serverless experience.
Re: the data colocation. This is how most multi-tenant systems - including S3 itself, AFAIU - operate. I understand there is a difference in the level of trust vs a cloud provider, and the best we can do here while delivering a serverless experience is encrypting every single record at the edge of S2, where records transit in or out, with a tenant-specific key. We may even allow specifying the key as part of the request, if clients want to manage it themselves.
The best data security when leveraging any multi-tenant service is going to be client-side encryption, and we also want to make this super easy. With our planned Kafka layer, we plan on client-side encryption as a value add.
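As a sketch of what per-record, per-tenant encryption at the edge looks like, here is a toy example. This is deliberately not real cryptography (a production system would use an AEAD like AES-GCM from a vetted library); it only illustrates the shape of the pattern - each record is sealed with the tenant's key before storage, so the shared storage layer only ever holds ciphertext.

```python
import hashlib
import secrets

# Toy per-tenant record encryption (NOT real crypto - illustration only):
# derive a keystream from the tenant key and a per-record nonce, XOR the
# record on the way in, and reverse it on the way out.

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_record(tenant_key: bytes, record: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce per record
    ct = bytes(a ^ b for a, b in
               zip(record, keystream(tenant_key, nonce, len(record))))
    return nonce + ct  # stored blob: nonce prefix + ciphertext

def decrypt_record(tenant_key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in
                 zip(ct, keystream(tenant_key, nonce, len(ct))))


tenant_key = secrets.token_bytes(32)          # held by (or for) the tenant
blob = encrypt_record(tenant_key, b"order:42,amount:10")
assert decrypt_record(tenant_key, blob) == b"order:42,amount:10"
```

The point is that a record from tenant A, colocated in the same object as tenant B's data, is opaque without tenant A's key - which is what makes client-managed keys a meaningful upgrade over service-side isolation alone.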
ED: for some reason I wasn't seeing the reply link on your comment before - I do see it now.