A simple way to work around this problem is to dump messages into files and upload each file to S3 under a name like “topic-partition-offset”, where offset is the offset of the first message contained in that file. A consumer can then read those files forward starting from offset zero until it reaches the end of the archive, then switch to reading recent data from Kafka itself.
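As a rough sketch of how that naming scheme might be consumed, the helpers below (hypothetical names, not from any real library) parse “topic-partition-offset” keys, order a partition's files for replay, and find which file holds a given offset:

```python
from bisect import bisect_right

def parse_key(key):
    # "topic-partition-offset" -> (topic, partition, first_offset).
    # rsplit keeps this safe for topic names that contain dashes.
    topic, partition, offset = key.rsplit("-", 2)
    return topic, int(partition), int(offset)

def replay_order(keys, topic, partition):
    # Archived files for one partition, ordered by the first offset each contains.
    offsets = sorted(
        parse_key(k)[2] for k in keys
        if parse_key(k)[:2] == (topic, partition)
    )
    return [f"{topic}-{partition}-{o}" for o in offsets]

def file_for_offset(keys, topic, partition, target):
    # The file to open for a target offset: the one whose first
    # offset is the greatest value <= target.
    ordered = replay_order(keys, topic, partition)
    firsts = [parse_key(k)[2] for k in ordered]
    i = bisect_right(firsts, target) - 1
    return ordered[i] if i >= 0 else None
```

For example, with keys `["clicks-0-0", "clicks-0-1000", "clicks-0-2000"]`, a reader wanting offset 1500 would open `clicks-0-1000` and scan forward; once the last file is exhausted, it subscribes to Kafka from the next offset.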
The drawback is that this isn’t integrated with Kafka, so you’re now maintaining what is effectively two different systems for the same data. It also means key-based compaction won’t work; you’d have to re-implement it on top of the files in S3 as well.
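Re-implementing that compaction is conceptually simple but is logic you now own. A minimal sketch, assuming the archived files have been decoded into (key, value) records and are replayed oldest first, mirroring Kafka's semantics of keeping the last value per key and treating a null value as a tombstone:

```python
def compact(records):
    # records: iterable of (key, value) pairs in offset order,
    # e.g. the decoded contents of each S3 file, oldest file first.
    latest = {}
    for key, value in records:
        if value is None:
            # Tombstone: drop the key entirely, as Kafka compaction does.
            latest.pop(key, None)
        else:
            # Later offsets win, so the final dict holds the newest value per key.
            latest[key] = value
    return latest
```

The catch is that Kafka runs this continuously in the broker, while here you must schedule it yourself, rewrite the S3 files with the compacted output, and handle readers that are mid-replay while old files are being replaced.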