Additionally, I would really, really like to be able to use it as an Event Store, easily accessible by anyone in the org with infinite data retention. I know Kafka kind-of sort-of provides this functionality, but it doesn’t work in practice.
This appears to be a solution to this problem. Will be interesting to see whether it gains traction.
How so?
A simple way to work around this is to dump messages into files in S3, named something like "topic-partition-offset", where offset is the offset of the first message contained within that file. You can then read those files forward starting from offset zero until you reach the end, then switch to reading recent data from Kafka.
The drawback is that this isn't integrated with Kafka, so you're now maintaining what are effectively two different systems for the same data. It also means key-based compaction won't work; you'd have to re-implement that on top of the S3 files as well.
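To make the idea concrete, here's a minimal sketch of that archive-then-replay scheme, using a local directory as a stand-in for an S3 bucket (all function names are hypothetical; zero-padding the offset in the filename makes lexicographic sort match numeric order):

```python
import os
import tempfile

def archive_segment(directory, topic, partition, base_offset, messages):
    # Write one "segment" file named topic-partition-offset, where offset is
    # the offset of the first message in the file. Zero-padding keeps the
    # lexicographic listing in numeric offset order.
    path = os.path.join(directory, f"{topic}-{partition}-{base_offset:020d}")
    with open(path, "w") as f:
        f.write("\n".join(messages))

def replay_archive(directory, topic, partition):
    # Read all archived segments forward from offset zero.
    # Returns (messages, next_offset); next_offset is where a live
    # Kafka consumer would pick up once the archive is exhausted.
    prefix = f"{topic}-{partition}-"
    segments = sorted(p for p in os.listdir(directory) if p.startswith(prefix))
    messages = []
    for name in segments:
        with open(os.path.join(directory, name)) as f:
            messages.extend(f.read().splitlines())
    return messages, len(messages)

# Usage: archive two segments for partition 0 of "orders", then replay.
d = tempfile.mkdtemp()
archive_segment(d, "orders", 0, 0, ["m0", "m1", "m2"])
archive_segment(d, "orders", 0, 3, ["m3", "m4"])
msgs, next_offset = replay_archive(d, "orders", 0)
```

With real S3 you'd swap the directory listing for a prefix-scoped `ListObjects` call, but the naming and forward-replay logic stays the same.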
Storing all data forever in a single source of truth is awesome until regulation like GDPR comes along. Do you have plans to support excision or is your guidance on personal data to avoid putting it into a system like Kafka/Pyrostore?
It covers several strategies, three of which are:
* Encrypt it and then throw away the key to forget it
* Store private data outside the event with the event just pointing to it
* Delete events (on systems that support this)
Workarounds for excision in Kafka, such as key compaction, often aren't usable because they depend on the key scheme in use.
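The first strategy (crypto-shredding) can be sketched briefly. This is a toy illustration of the key lifecycle only: the class name is made up, the XOR "keystream" is a stand-in for real authenticated encryption (e.g. AES-GCM behind a KMS), and is NOT secure as written:

```python
import secrets

class CryptoShredder:
    """Toy crypto-shredding sketch: each user gets a key, events are stored
    encrypted, and deleting the key makes the stored events unrecoverable
    even though the immutable log keeps the ciphertext forever."""

    def __init__(self):
        self._keys = {}  # user_id -> key bytes (would live in a real KMS)

    def _keystream(self, key, n):
        # Repeat the key to cover n bytes -- illustration only, NOT secure.
        return (key * (n // len(key) + 1))[:n]

    def encrypt(self, user_id, plaintext):
        key = self._keys.setdefault(user_id, secrets.token_bytes(32))
        stream = self._keystream(key, len(plaintext))
        return bytes(a ^ b for a, b in zip(plaintext, stream))

    def decrypt(self, user_id, ciphertext):
        key = self._keys[user_id]  # raises KeyError once the key is shredded
        stream = self._keystream(key, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, stream))

    def forget(self, user_id):
        # GDPR erasure: drop the key. The ciphertext stays in the log,
        # but it's now just noise.
        del self._keys[user_id]

shredder = CryptoShredder()
event = shredder.encrypt("user-42", b"email=alice@example.com")
recovered = shredder.decrypt("user-42", event)  # works while the key exists
shredder.forget("user-42")                      # decrypt now raises KeyError
```

The appeal is that "delete" becomes a single key deletion, regardless of how many immutable copies of the ciphertext exist across topics, replicas, and backups.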
https://azure.microsoft.com/en-us/services/hdinsight/apache-...
Is this the same approach as Pyrostore?
This reduces operational complexity significantly versus scaling nodes up, dealing with rebalancing, under-replicated partitions, etc.
What does 'lossy effects on stream-ability' mean here? Does the stream slow down, is data lost, or something else?
This would be even better if it didn't need a modified client.