This makes it possible to easily update your event processing logic or add new components to the mix while maintaining a clean architecture.
I gave a talk about this at PostgresConf US a few weeks ago. The talk isn't available as an article yet, but you can get the slides from https://aiven.io/blog/aiven-talks-in-postgresconf-us-2018/ if you're interested.
The concept of "stream-table duality" tells us a table can be thought of as a materialized view over a stream of change-data operations (INSERT/UPDATE/DELETE). Kafka can be used as a buffer for streaming data that can be materialized into a relational table at any time.
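To make the duality concrete, here's a minimal sketch (names and event format are made up for illustration) that folds a change stream into a table:

```python
# A table as a materialized view of a change stream: replaying the stream
# from the beginning reproduces the table's current state.

def materialize(change_stream):
    """Fold a stream of (op, key, value) events into a table (a dict here)."""
    table = {}
    for op, key, value in change_stream:
        if op in ("INSERT", "UPDATE"):
            table[key] = value
        elif op == "DELETE":
            table.pop(key, None)
    return table

events = [
    ("INSERT", "user:1", {"name": "Ada"}),
    ("INSERT", "user:2", {"name": "Grace"}),
    ("UPDATE", "user:1", {"name": "Ada L."}),
    ("DELETE", "user:2", None),
]

print(materialize(events))  # {'user:1': {'name': 'Ada L.'}}
```

The stream is the source of truth; the table is just a cache of its latest state.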
One of the more interesting use-cases is multi-target replication: feed your change-data-capture data into Kafka, and replay it on any other backend data store (SQL, Graph Database, NoSQL, etc.)[1]
Conceptually this lets you ingest data into a stream, write it to multiple backends, and keep everything in sync. Martin Kleppmann has written a simple (PoC-quality) tool for doing this with Postgres databases.
[1] https://www.confluent.io/blog/bottled-water-real-time-integr...
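The fan-out idea is simple enough to sketch; the `KeyValueStore` class below is a stand-in for a real backend client, not an actual API:

```python
# Multi-target replication sketch: replay one change stream against several
# backend "stores" so they all converge to the same state.

class KeyValueStore:
    """Stand-in for any backend (SQL, NoSQL, graph, ...)."""
    def __init__(self):
        self.data = {}

    def apply(self, op, key, value):
        if op == "DELETE":
            self.data.pop(key, None)
        else:  # INSERT or UPDATE
            self.data[key] = value

def replicate(change_stream, backends):
    for op, key, value in change_stream:
        for backend in backends:
            backend.apply(op, key, value)

sql_like, nosql_like = KeyValueStore(), KeyValueStore()
events = [("INSERT", "k1", "v1"), ("UPDATE", "k1", "v2"), ("INSERT", "k2", "x")]
replicate(events, [sql_like, nosql_like])
assert sql_like.data == nosql_like.data == {"k1": "v2", "k2": "x"}
```

Because the stream is replayable, you can also bootstrap a brand-new backend later by replaying from offset zero.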
Consumers read from the topic (the underlying partitions) and maintain the last-read offset themselves, allowing for easy replay, strict ordering within a single partition, and pacing completely decoupled from producers. The optional "key" on each message allows for compaction, so only the latest value for each key is kept in the topic.
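A rough sketch of what compaction does to a log, assuming `None` values act as delete markers (tombstones), which Kafka eventually removes as well:

```python
# Log compaction sketch: keep only the latest value per key.

def compact(log, drop_tombstones=True):
    latest = {}
    for key, value in log:
        latest.pop(key, None)   # re-insert so the key sits at its latest position
        latest[key] = value
    if drop_tombstones:         # a None value is a delete marker for that key
        latest = {k: v for k, v in latest.items() if v is not None}
    return list(latest.items())

log = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]
print(compact(log))  # [('a', 3), ('c', 4)]
```

Compacted topics are what make the "table materialized from a stream" pattern cheap: you only replay the latest state per key, not the full history.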
It's definitely not a database but works well for replicating them, as well as doing most of the work of a typical message queue / service bus system.
We have been hearing more and more about people using Kafka to support streaming analytics. I haven't spent a ton of time studying it, and I was under the impression that it was just a giant queue, but ordering + retention is basically a database in the way I think about it.
Also, consumers can join a "consumer group", so you can have multiple groups of consumers, with each group reading the entire log independently while the partitions are divided among the members within a group.
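Partition assignment within a group can be sketched like this; the round-robin scheme here is illustrative (Kafka's actual assignor strategies are configurable and more involved):

```python
# Consumer-group sketch: every group sees all partitions, but within one
# group each partition is handled by exactly one member.

def assign_partitions(partitions, members):
    assignment = {m: [] for m in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment

partitions = [0, 1, 2, 3, 4, 5]
print(assign_partitions(partitions, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

Two different groups would each get their own full assignment over all six partitions, which is how separate "clusters" of consumers each read the whole log.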
Also I'd recommend looking at Apache Pulsar for a next-generation architecture that combines Kafka's log semantics with the low-latency routing and individual message acknowledgements of message queues: https://pulsar.incubator.apache.org/
If Kafka supported data processing (queries or whatever) then it would be much closer to a database. Also, databases are normally aware of the structure of the data (for example, columns).
Therefore, Kafka can hardly be viewed as a DBMS because it explicitly separates two major concerns:
* data management - how to represent data (Kafka)
* data processing - how to derive/infer new data (Kafka Streams - a separate library)
Theoretically, if they could combine these two layers of functionality in one system then it would be a database.
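A toy illustration of that separation (all names are made up for the example): a bare log that only stores records and knows nothing about their structure, and a separate processing function that derives new data from it.

```python
# Data management vs. data processing as two separate layers.

class Log:
    """Data management: append-only storage, no columns, no queries."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1   # the record's offset

def word_count(log):
    """Data processing: derive a table (word -> count) from the log.
    This layer, not the log, understands the records' structure."""
    counts = {}
    for line in log.records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

log = Log()
log.append("the log is the database")
log.append("the database is a log")
print(word_count(log))
```

In Kafka terms, `Log` plays the broker's role and `word_count` plays the Kafka Streams role; a DBMS fuses both layers behind one query interface.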
https://engineering.linkedin.com/distributed-systems/log-wha...