Put another way, I think Kafka solved a problem for enterprises and was a top-down approach to the problem. Redis Streams is a bottom-up approach to implementing evented architectures. Maybe there's room for both products in the market?
For instance, just imagine that I have never touched a Kafka system in my life: I never used it and don't even know the API. I only read all the documentation they have on the site about the design, to get the higher-level picture, and combined this with my own ideas about fixing the fact that Redis was lacking a "log" data structure.
Pub/Sub + other data structures were not able to provide time series and streaming, yet Redis Streams remain an ADT (abstract data type), while Kafka is a tool to solve a very specific business case. So the applications have some intersection, but are also very far apart.
For instance, you can create as many small keys holding streams as memory allows, so Redis is good for many IoT use cases where you receive data from many small devices. Redis Streams also emphasize range queries, so that you can combine the time series with millisecond-range queries.
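To make the "many small streams plus range queries" point concrete, here is a minimal sketch in plain Python. It is not Redis and not the Redis API: `TinyStream`, its methods, and the `sensor:1` key are hypothetical names invented for illustration. The only assumption taken from the discussion is that entries are ordered by a millisecond timestamp plus a sequence number, so a time range can be found with a binary search.

```python
import bisect
from collections import defaultdict


class TinyStream:
    """Toy append-only log. Each entry ID is a (milliseconds, sequence) tuple."""

    def __init__(self):
        self.ids = []      # sorted list of (ms, seq) IDs
        self.entries = []  # field dicts, parallel to self.ids

    def add(self, ms, fields):
        # Entries sharing a millisecond get increasing sequence numbers.
        if self.ids and self.ids[-1][0] == ms:
            seq = self.ids[-1][1] + 1
        else:
            seq = 0
        entry_id = (ms, seq)
        if self.ids and entry_id <= self.ids[-1]:
            raise ValueError("IDs must be strictly increasing")
        self.ids.append(entry_id)
        self.entries.append(fields)
        return entry_id

    def range(self, start_ms, end_ms):
        # Binary search for the inclusive [start_ms, end_ms] window.
        lo = bisect.bisect_left(self.ids, (start_ms, 0))
        hi = bisect.bisect_right(self.ids, (end_ms, float("inf")))
        return list(zip(self.ids[lo:hi], self.entries[lo:hi]))


# One small stream per device, created lazily on first write.
streams = defaultdict(TinyStream)
streams["sensor:1"].add(1000, {"temp": 20.5})
streams["sensor:1"].add(1005, {"temp": 20.7})
window = streams["sensor:1"].range(1000, 1002)
```

Because every key is its own small log, a query for one device never has to scan the data of any other device.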
However, yes, the fact that I also added consumer groups is a way to turn this "80% streaming" into a more usable streaming system, more similar to Kafka, for the use cases where the following properties of Redis make sense:
1) The memory limits.
2) The speed.
3) The consistency guarantees.
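For readers who have not met consumer groups before, the idea can be sketched in a few lines of plain Python. This is an illustration of the general concept only, not how Redis implements it: `TinyGroup` and its fields are hypothetical. The group remembers how far it has delivered, hands each entry to exactly one consumer, and keeps the entry in a pending list until that consumer acknowledges it.

```python
class TinyGroup:
    """Toy consumer group over a shared, append-only list of messages."""

    def __init__(self, log):
        self.log = log        # the shared log (a plain list here)
        self.next_index = 0   # first entry not yet delivered to the group
        self.pending = {}     # index -> consumer name, until acknowledged

    def read(self, consumer, count=1):
        # Deliver up to `count` new entries to this consumer only.
        end = min(self.next_index + count, len(self.log))
        batch = []
        for i in range(self.next_index, end):
            self.pending[i] = consumer
            batch.append((i, self.log[i]))
        self.next_index = end
        return batch

    def ack(self, index):
        # Returns True if the entry was pending and is now acknowledged.
        return self.pending.pop(index, None) is not None


log = ["job-a", "job-b", "job-c"]
group = TinyGroup(log)
first = group.read("alice", count=2)   # alice gets the first two entries
second = group.read("bob", count=2)    # bob gets whatever is left
group.ack(first[0][0])                 # alice confirms her first entry
```

Anything still in `pending` after a consumer crashes is exactly the work that has to be handed to someone else.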
However, at the same time, it was a great challenge and pleasure to do what I always try to do, that is, to create an API for developers thinking like I'm designing an iPhone, and not some terrible engineering thing which does what it should but is terrible to use (I'm not referring to Kafka, which I do not know). So I really hope that what you say, "easy to setup and use", will be what developers will feel :-)
You need to get yourself some whitespace in this reply!
Although my request is quite selfish, I really would love to hear/read more about your thoughts on designing APIs. My experience using Redis has been excellent and I'd like to be able to replicate that sense of design in the systems I build as well.
Really pleased to see that I'm not the only one digging for info and that work is ongoing.
Perhaps a resurrection of the Redis Watch newsletter is in order! Are there any existing alternatives?
4.0 (done) ->
Streams backported to 4.0 (Work in progress) ->
4.2 (or 5.0) with Disque + Cluster improvements + Modules improvements, ...
It's just a renaming:
4.0 + Streams backported is now called 5.0
What was to be 4.2 is going to be called 6.0
Why am I choosing to go for integer numbers? Because I believe that things like 4.2 should be for minor improvements, mostly operational, but adding the first data structure after ages deserves 5.0. Similarly, having a reshaped Redis Cluster + Disque deserves 6.0. I also get confused as a user of other systems when they advance like 1.4, 2.3, 2.7, ... It's simpler to talk about Redis 4, Redis 5, Redis 6, ...
I actually implemented yet another job framework[0] for fun in Python with Redis and it was a pleasure. Lua, pub/sub and atomic operations really go a long way!
I kind of want the opposite of webhooks.
Sometimes APIs will give you tokens to use for resumption (e.g. SSE event IDs, or any long-polling API), but typically these are for a time-limited session rather than a stateless query against any point in a long-lived log.
Years ago, services like Friendfeed, Livefyre, and Convore had stateless long-polling APIs that returned a log of data, I believe. These kinds of APIs seem to have fallen out of fashion, though. There are still stateless long-polling APIs, but most of the ones I'm aware of don't return logs of data. For example, Dropbox and Box will let you query for a change notification against a starting position, but then you have to fetch the actual data separately.
That said, just because streaming APIs that let you set a starting position are rare doesn't mean they're impossible to make. My company (https://fanout.io) has built tools to help with this.
Edit: since you asked for a real example, Superfeedr is one such API: https://documentation.superfeedr.com/subscribers.html#stream...
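The "stateless query against any point in a long-lived log" idea described above can be sketched as a single function. This is a minimal illustration in plain Python, not the API of any service named here: `read_from`, `position`, and `limit` are hypothetical names, and the blocking part of long polling (waiting until new entries arrive) is deliberately left out.

```python
def read_from(log, position, limit=100):
    """Return (entries, next_position) for entries at index >= position.

    The server keeps no per-client session: the caller stores
    next_position and sends it back on the following request.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    entries = log[position:position + limit]
    return entries, position + len(entries)


log = ["event-0", "event-1", "event-2"]
batch, cursor = read_from(log, 0, limit=2)   # first page
rest, cursor = read_from(log, cursor)        # resume where we stopped
```

Because the cursor lives with the client, a client that was offline for a week can resume with the same call it would use after one second.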
If you absolutely need Kafka then it's still a good option, although I'd recommend looking at Apache Pulsar [1] for a better design. It separates storage and compute for better performance and scalability while giving you features like per-message acknowledgements.
redis-cli>XADD AAPL 1516899637000.0 open 221.25 high 222.1 low 220.90 close 221.50 volume 1121223234234
(All data is imaginary.)
How is this going to be different from Kafka? And I don't mean implementation details, because these are always a fun read. Kafka has been on the market for ~7 years, during which it has proven to be oh-so-fast and pretty durable.
Oh, and while I'm at it. Here's another problem Redis ingeniously added: a GIL. A GIL is a great idea, but comes with huge tradeoffs. David Beazley spent years showing how many tricks you can play upon yourself with a GIL.
So ... now you have Streams and GIL together. And you already have dicts (you call them hashmaps). I have a feeling you're trying to implement Python. If so, it's done. But come on, 3.6 is cool. And we're kinda solving the GIL problem. With PyPy. Which will blow your mind.
Honest question: Why do you hate Redis?
Alas, I have been bitten more than once by a poorly implemented GIL, and in my personal experience any GIL is problematic in a high-throughput environment. It simplifies the implementation of an environment or API at the price of introducing a global lock into the system, thus eliminating any possibility of a lockless design. And we know it's doable.
I spent years trying to work around the GIL in production environments, and by now we have the tools to do it reliably. We have on_commit in Django, we have Celery with proper support for mechanisms like chords, and a myriad of others.
I do get that Redis has a different approach to many things. But here's the deal - every time I had to rely on Redis as a critical component, there was a problem. Either there was that "one gotcha" in the config, or I didn't understand something, or something else.
A while ago I was part of a discussion with the author of Redis on using Redis as a cache. And quite a few people, not only me, noted they had bad experiences and their benchmarks didn't show Redis to be better than Memcached. And I think that's ok. Unless you pose your product as a competitor to Memcached. Would I use memcached-db? No, it's a bad idea.
My problem is that Redis originally was a key-value store with data structures. And that's a great idea, and not that many services let you model data structures in a distributed way. I actually implemented a distributed system of progress control for a very critical piece of architecture at my current workplace. I have been using Redis in commercial products since 2014.
My point of view is probably largely skewed, but I don't understand, and I have never seen it explained, why Redis tries to do everything. We have battle-tested Kafka for streams, ElasticSearch and Solr for searching, and Memcached for caching. Can't we get a super reliable data store that supports data structures? A data store that is _able_ to utilize the fact we have more than one core?
I know my view is unpopular, but I don't see the addition of a GIL as a genius idea. I simply don't see a coherent direction for Redis development.
BTW: is there a benchmark comparing latest Redis without the GIL and the latest with GIL?
I came crawling back. Why? Because people at Mozilla spent their time working on their product, finally coming to the agreement that the fact your product is open source doesn't mean you can cut your standards in half, focusing on the features you like instead of the ones that actually make it better.
Here's the link in case someone missed it: https://hacks.mozilla.org/2017/11/entering-the-quantum-era-h.... I'm not a Mozilla fanboy; to be frank, for a while the foxy browser scored lower on my "software like scale" than Redis.
And they won me over. In 5 minutes. And it's been proven time and time again that performance is a feature.
p.s. Now that I write it I realize that for certain reasons I should just try to write the damn thing and instead of writing pointy comments let antirez rip my software apart :)