"So is it crazy to do this? The answer is no, there’s nothing crazy about storing data in Kafka: it works well for this because it was designed to do it. Data in Kafka is persisted to disk, checksummed, and replicated for fault tolerance. Accumulating more stored data doesn’t make it slower. There are Kafka clusters running in production with over a petabyte of stored data."
[1] https://www.confluent.io/blog/okay-store-data-apache-kafka/
https://news.ycombinator.com/item?id=23206566
Apache Kafka also doesn’t refer to itself as a database in its own documentation:
> I think it makes more sense to think of your datacenter as a giant database, in that database Kafka is the commit log, and these various storage systems are kinds of derived indexes or views.
My company supports analytic systems and we see this pattern constantly. It's also sort of a Pat Helland view of the world, one that subsumes a large fraction of data management under a relatively simple idea. [1] What's interesting is that Pat also sees it as a way to avoid sticky coordination problems.
If you wanted to literally use Kafka as your commit log, the way Amazon Aurora uses a distributed commit log, you would find that a lot of features a commit log needs are missing from Kafka and impossible to add.
That is a valid theory when we talk about readers that look at recent data, or when you are appending data to an existing system.
But in practice, the accumulation of cold data on a local disk is where this starts to hurt, particularly if it has to serve read traffic that starts from the beginning of time (i.e. your queries don't start with a timestamp range).
KSQL transforms do help reduce the depth of the traversal by building flatter versions of the data set, but you need to repartition the same data on every lookup key you want - so if you had a video-game log trace, you'd need separate materializations for (user), (user, game), (game), etc.
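A toy sketch of what that repartitioning means in practice (event shape and numbers made up): the same log has to be folded into one materialization per lookup key you want to serve.

```python
from collections import defaultdict

# Hypothetical video-game log events: (user, game, minutes played).
events = [
    ("alice", "chess", 10),
    ("bob", "chess", 5),
    ("alice", "go", 20),
    ("alice", "chess", 15),
]

# One materialization of the same log per lookup key.
by_user = defaultdict(int)
by_user_game = defaultdict(int)
by_game = defaultdict(int)

for user, game, minutes in events:
    by_user[user] += minutes               # keyed by (user)
    by_user_game[(user, game)] += minutes  # keyed by (user, game)
    by_game[game] += minutes               # keyed by (game)

print(by_user["alice"])                  # 45
print(by_user_game[("alice", "chess")])  # 25
print(by_game["chess"])                  # 30
```

In a real deployment each of those dicts is a separately repartitioned, separately stored copy of the data, which is where the cost comes from.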
And on the local storage part, EBS is expensive just to hold cold data, and then you replicate it to maintain availability during a node recovery - EBS is only about a 1.5x-redundant store, better than a single node. I liked the Druid segment model of shoving data off to S3 while still being able to read off it (i.e. not just streaming to S3 as a dumping ground).
When Pravega came out, I liked it a lot for the same reason - but it hasn't gained enough traction.
We ran a cluster with lots of compacted topics, hundreds of terabytes of data. At the time it would make broker startup insanely slow. An unclean startup could literally take an hour to go through all the compacted partitions. It was awful.
At its core it’s electron state in hardware. So long as those limits are not incidentally exceeded, and you validate outputs, who really cares what gets loaded?
We rip rare minerals from the ground and toss all that hardware at scale every 3-5 years, yet we get stingy about how we install software.
So long as it offers the necessary order of operations to do the work, whatever.
The "event sourced" architecture they sketched is missing pieces. Normally you'd have single writer instances that are locked to the corresponding Kafka partition, which ensure strong transactional guarantees, IF you need them.
Throwing shade for marketing's sake is something that they should be above.
I mean c'mon, I'd argue that Postgres enhanced with Materialize isn't a database anymore either, but in a good sense!
It's building material. A hybrid between MQ, DB, backend logic & frontend logic.
The reduction in application logic and the increase in reliability you can get from reactive systems is insane.
SQL is declarative, reactive Materialize streams are declarative on a whole new level.
Once that tech makes it into other parts of computing like the frontend, development will be so much better: less code, fewer bugs, a lot more fun.
Imagine that your react component could simply declare all the data it needs from a db, and the system will figure out all the caching and rerendering.
So yeah, they have awesome tech with many advantages, so I don't get why they bad-mouth other architectures.
> SQL is declarative, reactive Materialize streams are declarative on a whole new level.
Thank you for the kind words about our tech, I'm flattered! That said, this dream is downstream of Kafka. Most of our quibbles with the Kafka-as-database architecture are to do with the fact that that architecture neglects the work that needs to be done _upstream_ of Kafka.
That work is best done with an OLTP database. Funnily enough, neither of us are building OLTP databases, but this piece largely is a defense of OLTP databases (if you're curious, yes, I'd recommend CockroachDB), and their virtues at that head of the data pipeline.
Kafka has its place - and when its used downstream of CDC from said OLTP database (using, e.g. Debezium), we could not be happier with it (and we say so).
The best example is foreign key checks. Kafka-as-database is not good if you ever need to enforce them (which translates to checking a denormalization of your source data _transactionally_ while deciding whether to admit or deny an event). This is something that you may not need in your data pipeline on day 1, but adding it later is a trivial schema change with an OLTP database, and exceedingly difficult with a Kafka-based event-sourced architecture.
> Normally you'd have single writer instances that are locked to the corresponding Kafka partition, which ensure strong transactional guarantees, IF you need them.
This still does not deal with the use-case of needing to add a foreign key check. You'd have to:
1. Log "intents to write" rather than writes themselves in Topic A.
2. Have a separate denormalization computed and kept in a separate Topic B, which can be read from. This denormalization needs to be read until the intent propagates from Topic A.
3. Convert those intents into commits.
4. Deal with all the failure cases in a distributed system, e.g. cleaning up abandoned intents, etc.
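A toy sketch of that intent/commit dance, with all the distributed-systems pain stripped out (every name here is made up, and step 4 is reduced to a comment):

```python
# Topic A holds "intents"; a denormalized view of committed state is
# consulted before an intent is converted into a commit or an abort.

committed_users = {"u1"}   # denormalized view: users known to exist
topic_a = []               # intents log (stands in for Kafka Topic A)
topic_commits = []         # committed events

def write_order(order_id, user_id):
    intent = {"order": order_id, "user": user_id}
    topic_a.append(intent)                 # 1. log the intent
    if intent["user"] in committed_users:  # 2. check the denormalization
        topic_commits.append(intent)       # 3. convert intent -> commit
        return "committed"
    # 4. failure path: the abandoned intent must eventually be cleaned up
    return "aborted"

print(write_order("o1", "u1"))    # committed
print(write_order("o2", "u999"))  # aborted (foreign key violation)
```

The hard part the sketch hides is that in a real system steps 1-3 race against other writers and crashes, which is exactly the machinery an OLTP database gives you for free.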
If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds. And hopefully, yes, have a reactive declarative stack downstream of that as well!
People do do this. I have done this. I wish I had been more principled with the error paths. It got there _eventually_.
It was a lot of code and complexity to ship a feature which in retrospect could have been nearly trivial with a transactional database. I'd say months rather than days. I won't get those years of my life back.
The products were built on top of Kafka, Cassandra, and Elasticsearch where, over time, there was a desire to maintain some amount of referential integrity. The only reason we bought into this architecture at the time was horizontal scalability (not even multi-region). Kafka, sagas, 2PC at the "application layer" can work, but you're going to spend a heck of a lot on engineering.
It was this experience that drove me to Cockroach and I've been spreading the good word ever since.
> If you use an OLTP database, and generate events into Kafka via CDC, you get the best of both worlds.
This is the next chapter in the gospel of the distributed transaction.
100% agree this is the way to go. Instead of rolling your own transaction support, you get ACID for free from the DB and use Kafka to archive changes and subscribe to them.
I think the general gist of "use an OLTP database as your write model if you don't absolutely know what you're doing" is completely sane advice, however I think there are far more nuanced (and in a sense also honest) arguments that can be made.
I think the architecture you've sketched is over-engineered for the task. So here's what I'd build for your inventory example, IFF the inventory were managed for a company with a multi-tera-item inventory that absolutely NEEDS horizontal scaling:
One event topic in Kafka, partitioned along the ID of the different stores whose inventory is to be managed. This makes the (arguably) strong assumption that inventory can never move between stores, but for e.g. a Grocery chain that's extremely realistic.
We have one writer per store-ID partition, which generates the events and enforces _serialisability_, with a hot writer failover that keeps up to date and a STONITH mechanism connecting the two. All writing REST calls / GraphQL mutations for its store-ID range go directly to that node.
The node serves all requests from memory, out of an immutable Data-structure, e.g. Immutable.js, Clojure Maps, Datascript, or an embedded DB that supports transactions and rollbacks, like SQLite.
Whenever a write occurs, the writer generates the appropriate events, applies them to its internal state, validates that all invariants are met, and then emits the events to Kafka. Kafka acknowledging the write is potentially much quicker than acknowledging an OLTP transaction, because Kafka only needs to get the events into the memory of 3+ machines, instead of written to disk on 1+ machine (I'm ignoring OLTP validation overhead here because our writer already did that). Also your default failure resistance is much higher than what most OLTP systems provide in their default configuration (e.g. equivalent to Postgres synchronous replication).
Note that the critical section doesn't actually have to be the whole "generate event -> apply -> validate -> commit to Kafka" code. You can optimistically generate events, apply them, and then invalidate and retry all other attempts once one of them commits to Kafka. However, that introduces coordination overhead, so you might be better served mindlessly working off requests one by one.
Once the write has been acknowledged by Kafka, you swap the variable/global/atom with the new immutable state or commit the transaction, and continue with the next incoming request.
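Here's a minimal, in-memory sketch of that write path (all names invented; a plain list stands in for the Kafka partition, and a dict copy stands in for the immutable data structure):

```python
log = []  # stands in for the Kafka partition

def handle_buy(state, store, item):
    # Generate the event.
    event = ("buy", store, item)
    # Apply it to a *new* state; the old snapshot stays untouched.
    new_state = dict(state)
    new_state[(store, item)] = new_state.get((store, item), 0) - 1
    # Validate invariants before committing anything.
    if new_state[(store, item)] < 0:
        return state, "rejected"  # old snapshot survives, nothing emitted
    # Commit to the log (Kafka acknowledging the write).
    log.append(event)
    # Only now swap in the new snapshot.
    return new_state, "ok"

state = {("store-1", "apples"): 1}
state, r1 = handle_buy(state, "store-1", "apples")
state, r2 = handle_buy(state, "store-1", "apples")
print(r1, r2)    # ok rejected
print(len(log))  # 1
```

The point of the ordering is that a crash between validate and commit loses nothing: the snapshot is only swapped after the log has the event.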
All the other (reading) requests are handled by various views on the Kafka topic (the one causing the inconsistencies in the article). They might be lagging behind a bit, but that's totally fine, as all writing operations have to go through the invariant-enforcing write model anyways. So they're allowed to be slow-ish, or have varying QOS in terms of freshness.
The advantage of this architecture is that you have few moving parts, but those are nicely decomplected, as Rich Hickey would say: you use the minimal state for writing, which fits into memory and caches; you get zero lock congestion on writes (no locks); you make the boundaries for transactions explicit and gain absolutely free rein on constraints within them; and you get serialisability for your events, which is super easy to reason about mentally. Plus you don't get performance penalties for recomputing table views for view models. (If you don't use change capture and Materialize already, which one should of course ;] )
The two generals problem dictates that you can't have more than a single writer in a distributed system for a single "transactional domain" anyways. All our consensus protocols are fundamentally leader election. The same is true for OLTP databases internally (threads, table/row locks, etc.), so if you can't handle the load on a single zero-overhead writer that just takes care of its small transactional domain, then your database will run into the exact same issues, probably earlier.
Another advantage of this that has so far gone unmentioned is that it allows you to provide global identifiers for the state in your partitions that can be communicated in effectful interactions with the outside world. If your external service allows you to store a tiny bit of metadata with each effectful API call, then you can include the offset of the current event, and thus the state. That way you can subsume the external state and transactional domain into the transactional domain of your partition.
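A toy sketch of that idempotency trick (the external API shape is invented): store the event offset alongside the effect, and replays of already-applied offsets become no-ops.

```python
# The external service's per-key metadata slot: key -> (offset, payload).
external_store = {}

def call_external(key, offset, payload):
    # If the external service already recorded this offset (or a later
    # one), the effect already happened: skip the replay.
    if external_store.get(key, (-1, None))[0] >= offset:
        return "skipped"
    external_store[key] = (offset, payload)
    return "applied"

print(call_external("invoice-7", 41, "charge $10"))  # applied
print(call_external("invoice-7", 41, "charge $10"))  # skipped (replay)
print(call_external("invoice-7", 42, "charge $5"))   # applied
```

This is why the partition offset makes such a good global identifier: after a crash you can replay the log from any earlier point and the external side effects dedupe themselves.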
Now, I think that's a much more reasonable architecture, that at least doesn't have any of the consistency issues. So let's take it apart and show why the general populace is much better served with an OLTP database:
- Kafka is an Ops nightmare. The setup of a cluster requires A LOT of configuration. Also ZooKeeper, urgh; they're luckily trying to get rid of it, but I think they only dropped it this year, and I'm not sure how mature that is.
- You absolutely 100% need immutable data structures, or something else that manages transactions for you inside the writer. You DO NOT want to manually roll back changes in your write model. Defensive copying is a crutch: slow and error-prone (cue JS, urgh...: {...state}).
- Your write model NEEDS to fit into memory. That thing is the needle's eye that all your data has to go through. If you run the single-event-loop variant, latency during event application WILL break you. If you do the optimistic-concurrency variant, performing validation checks might be as expensive as, or more expensive than, recomputing the events from scratch.
- Be VERY wary of communications with external services that happen in your write model. They introduce latency, and they break the transactional boundary that you set up before. To be fair, OLTPs also suffer from this, because it's distributed consistency with more than one writer and arbitrary invariants, which this universe simply doesn't allow for.
- As mentioned before, it's possible to optimistically generate and apply events thanks to the persistent data structures, but that is also logic you have to maintain, and which is essentially a very naive and simple embedded OLTP. Also be wary of what you think improves performance vs. what actually improves it. It might be better to have 1 cool core than 16 really hot ones doing wasted work.
- If you don't choose your transactional domains well, or the requirements change, you're potentially in deep trouble. You can't transact across domains/partitions, if you do, they're the same domain, and potentially overload a single writer.
- Transactional domains are actually not as simple as they're often portrayed. They can nest and intersect. You'll always need a single writer, but that writer can delegate responsibility, which might be a much cheaper operation than the work itself. Take bank accounts as an example. You still need a single writer/leader to decide which account IDs are currently in a transaction with each other, but if two accounts are currently free, that single writer can tie them into a single transactional domain and delegate it to a different node, which will perform the transaction and write, then return control to the "transaction manager". A different name for such a transaction manager is an OLTP database (with row-level locking).
- You won't find as many tutorials. If you're not comfortable reading scientific papers, or at least academic-grade books like Martin Kleppmann's "Designing Data-Intensive Applications", don't go there.
- You probably won't scale beyond what a single OLTP DB can provide anyways. Choose tech that is easy to use and gives you as many guarantees as possible if you can. With change capture you can also do retroactive event analytics and views, but you don't have to code up a write-model (and associated framework, because let's be honest this stuff is still cutting edge, and really shines in bespoke custom solutions for bespoke custom problems).
Having such an architecture under your belt is a super useful tool that can be applied to a lot of really interesting and hard problems, but that doesn't mean that it should be used indiscriminately.
Afterthought: I've seen so many people use an OLTP database but then perform multiple transactions inside a single request handler, just because that's what their ORM was set to. So I'm just happy about any thinking people spend on transactions and consistency in their systems, in whatever shape or form, and I think making the concepts explicit instead of hiding them in a complex (and for many, unapproachable) OLTP/RDBMS monster helps with that (whether Kafka is less of a monster is another story).
I think it's also important to not underestimate the programmer convenience that working with language native (persistent) data-structures has. The writer itself in its naive implementation is something that one can understand in full, and not relying on opaque and transaction breaking ORMs is a huge win.
PS: Please start something collaborative with ObservableHQ; having reactive notebook-based dashboards over reactive Postgres queries would be so, so, so, so awesome!
To date, I've never written code that reads from a Kafka topic that wasn't taking data and transforming + materializing it into domain intelligence.
Little-known fun fact: the Rust type- and borrow-checker uses a Datalog engine internally to express typing rules, and that engine was written and improved by Frank McSherry. So whenever you hit compile on a Rust program, you're using a tiny bit of Materialize tech.
It serves as anti-marketing, at least to me.
What the authors mean is that kafka is not a traditional database and doesn't solve the same problems that traditional databases solve. Which is a useful distinction to make but is not the distinction they make.
The reality is that "database" is now a very general term, and for many use cases you can choose special-purpose databases for what you need.
I'd argue that a filesystem is a data store, rather than a database.
Whether we want it to be so or not, the term "database" is much more encompassing than it used to be. You can try to fight that change if you want to, but it means you'll be speaking a different language than most of the rest of us as a result.
You should be able to post "buy" messages to a topic without fear that it messes up your data integrity. Who cares if two people are fighting over the last item? You have a durable log. Post both "buys" and wait for the "confirm" message from a consumer that's reading the log at that point in time, validates, and confirms or rejects the buys. At the point that the buy reaches a consumer there is enough information to know for sure whether it's valid or not. Both of the buy events happened and should be recorded whether they can be fulfilled or not.
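A toy sketch of that flow (all names made up): both buys land in the durable log either way, and a single consumer reading the log in order is what decides who actually gets the item.

```python
topic = []  # durable, ordered log of buy events
inventory = {"last-item": 1}

def post_buy(user, item):
    # Posting never fails on contention: the event is just recorded.
    topic.append({"user": user, "item": item})

def run_consumer():
    # Reads the log in order, so confirm/reject decisions are serial.
    results = []
    for buy in topic:
        if inventory.get(buy["item"], 0) > 0:
            inventory[buy["item"]] -= 1
            results.append((buy["user"], "confirmed"))
        else:
            results.append((buy["user"], "rejected"))
    return results

post_buy("alice", "last-item")
post_buy("bob", "last-item")
print(run_consumer())  # [('alice', 'confirmed'), ('bob', 'rejected')]
print(len(topic))      # 2: both buy events are recorded either way
```

Note the trade this makes: the client has to wait for the consumer's confirm/reject message instead of getting an answer inside a transaction.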
The two people, at least. Customers tend to be a bit underwhelmed by "well, the CAP theorem..." as a customer service script.
Kafka:
User clicks buy and it shows “processing” which behind the scenes posts the buy message and waits for a “confirmed” message. When it’s confirmed user is directed to success! If someone else posts the buy before them they get back a “failed: sold out” message.
Relational:
User clicks buy and it shows “processing” which behind the scenes tries to get a lock on the db, looks at inventory, updates it if there’s still one available, and creates a row in the purchases table. If all this works the user is directed to success. If by the time the lock was acquired the inventory was zero the server returns “failure: sold out”.
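The relational flow can be sketched with sqlite3, with a single transaction playing the role of the lock (the schema is invented for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER)")
db.execute("CREATE TABLE purchases (user TEXT, item TEXT)")
db.execute("INSERT INTO inventory VALUES ('last-item', 1)")
db.commit()

def buy(user, item):
    # One transaction: check inventory, decrement, record the purchase.
    with db:
        qty = db.execute(
            "SELECT qty FROM inventory WHERE item = ?", (item,)
        ).fetchone()[0]
        if qty < 1:
            return "failure: sold out"
        db.execute(
            "UPDATE inventory SET qty = qty - 1 WHERE item = ?", (item,))
        db.execute("INSERT INTO purchases VALUES (?, ?)", (user, item))
        return "success"

print(buy("alice", "last-item"))  # success
print(buy("bob", "last-item"))    # failure: sold out
```

Same outcome for the user as the Kafka version; the difference is where the serialization point lives (the database transaction vs. the ordered log consumer).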
The example they give is very simplistic. With the correct design of kafka topics and events the problem of the example can be fixed.
And according to oracle https://www.oracle.com/database/what-is-database/ :
> A database is an organized collection of structured information, or data, typically stored electronically in a computer system.
So Kafka clearly fits that definition.
Honestly as long as you don't use it as a general purpose database, it might very well be the best choice for your use-case.
Also, a good starter for learning how to use a streaming store like Kafka as a DB is the video "Database inside-out": https://www.youtube.com/watch?v=fU9hR3kiOK0
Everything abstracted to the highest level is the same, but problems aren't solved at the highest level.
The devil, as they say, is in the details.
Funnily enough a list of events is pretty much what a transaction log is in a standard db. Although the events have more of a business meaning. In many ways event sourcing is removing a lot of abstraction databases give you.
Yes, ACID works for one database. Many databases? Not so much.
> Funnily enough a list of events is pretty much what a transaction log is in a standard db.
When I "SELECT Balance WHERE user = 12345", I usually just get back a balance, I don't get back the transaction log.
If nothing else, adopting the Kafka model gets your teammates to append updates to your ledger, rather than changing values in-place.
Normally you only want concurrency control within certain boundaries.
By figuring out the minimum number of transaction and concurrency boundaries, you can eke out quite a bit of performance.
I understand that the feature set of timely dataflow is more flexible than Spark's - I just don't understand why (I couldn't figure it out from the paper; academic papers really go over my head).
So, streaming one new record in and seeing how this changes the results of a multi-way join with many other large relations can happen in milliseconds in TD, vs batch systems which will re-read the large inputs as well.
This isn't a fundamentally new difference; Flink had this difference from Spark as far back as 2014. There are other differences between Flink and TD that have to do with state sharing and iteration, but I'd crack open the papers and check out the obligatory "related work" sections each should have.
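A toy sketch of the incremental-join idea (not TD's actual implementation): each side of the join keeps an index, and a newly streamed record only probes the *other* side's index instead of re-reading both inputs from scratch.

```python
from collections import defaultdict

left_index = defaultdict(list)   # join key -> values seen on the left
right_index = defaultdict(list)  # join key -> values seen on the right
join_output = []

def insert_left(key, value):
    left_index[key].append(value)
    # Work is proportional to the number of matches, not relation sizes.
    for rv in right_index[key]:
        join_output.append((key, value, rv))

def insert_right(key, value):
    right_index[key].append(value)
    for lv in left_index[key]:
        join_output.append((key, lv, value))

insert_left("k1", "a")
insert_right("k1", "x")
insert_right("k1", "y")
insert_left("k2", "b")   # no right-side match yet: no output
print(join_output)       # [('k1', 'a', 'x'), ('k1', 'a', 'y')]
```

A batch system would instead re-scan both relations on each update, which is the latency gap the Naiad paper quantifies (seconds vs. tens of milliseconds).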
For example, here's the first para of the Related Work section from the Naiad paper:
> Dataflow Recent systems such as CIEL [30], Spark [42], Spark Streaming [43], and Optimus [19] extend acyclic batch dataflow [15, 18] to allow dynamic modification of the dataflow graph, and thus support iteration and incremental computation without adding cycles to the dataflow. By adopting a batch-computation model, these systems inherit powerful existing techniques including fault tolerance with parallel recovery; in exchange each requires centralized modifications to the dataflow graph, which introduce substantial overhead that Naiad avoids. For example, Spark Streaming can process incremental updates in around one second, while in Section 6 we show that Naiad can iterate and perform incremental updates in tens of milliseconds.
The example I was taught with was a booking system, where the inventory management system-of-record was separate from the search system. Search does not need 100% up-to-date inventory. A delay between the last item being booked and it being removed from the search results is acceptable. In fact, it has to be acceptable, because it can happen anyway. If someone books the last item after another hit the search button... There's nothing the system can do about that.
When actually committing a booking, however, then that must be atomically done within the inventory management system.
So, to bring it home, it's OK for the search system to be eventually consistent against bookings, and read bookings off of an event stream to update its internal tracking. However, the bookings themselves cannot be eventually consistent without risking a double-booking.
On one hand the ability to connect multiple microservices to a central message broker is convenient, but on the other hand this goes against the microservice philosophy of not sharing subcomponents (databases, etc). I wonder where the lines should be drawn.
It's all about framing and perspective, of course. But that's how I'd want to try and frame it from a system architecture point of view.
With enough framing everything is possible, and in some contexts it will even make sense.
Depending on the scenario I can see the point. If the microservices are all part of the larger overall solution, having a single cluster is perfectly fine. Using the same cluster for multiple "products" is a little like having one central database server for a number of different solutions. You can do it, but it potentially becomes a bottleneck, or a central point where your different solutions impact each other's performance.
you either don't allow microservices to consume from others' topics, or you publish event schemas so they can still iterate independently.
the move to a 2nd kafka cluster in my experience has always been driven by isolation and fault tolerance concerns, not scalability.
If ACID is a prerequisite, then a lot of things won't qualify as databases - none of Mongo, Cassandra, ElasticSearch, etc. Not even many data warehouses.
Predictably, this setup ran into an interesting assortment of issues. There were no real transactions, no ensured consistency, and no referential integrity. There was also no authentication or authorization, because a default-configured deployment of Kafka from Confluent happily neglects such trivial details.
To say this was a vast mess would be to put it lightly. It was a nightmare to code against once you left the fantasy world of functional programming nirvana and encountered real requirements. It meant pushing a whole series of concerns that isolation addresses into application code... or not addressing them at all. Teams routinely relied on one another's internal kafka streams. It was a GDPR nightmare.
Kafka Connect was deployed to bridge between Kafka and some real databases. This was its own mess.
Kafka, I have learned, is a very powerful tool. And like all shiny new tools, deeply prone to misuse.
Expecting one person to make 100% correct decisions all the time is expecting too much. People go down rabbit holes and come away with weird takeaways, like "replace all the databases with queues".
In this particular company, the Chief Architect in theory had a group around him. They did nothing to check his poor decisions, and from the outside seemed primarily interested in living in the functional programming nirvana he promised them.
The idea on the face of it is not per se a bad one, quite interesting, but the implementations are perhaps not there yet to back such an idea.
It's important to know as an architect when your vision for the architecture is outpacing reality, to know your time horizon, and match the vision with the tools that help you implement the actual use cases you have in your hand right now.
It sounds like this person might have had an interesting idea but not a working system. In another light, this could have been a good idea if all the technology was in place to support it.. but the timing and implementation doesn't sound like it was right, perhaps.
The old saying "use the right tool for the job" comes to mind, but that can be hard to see when the tools are changing so fast, and there is a risk to going too far onto the bleeding edge. Perhaps the saying should have been, "use the rightest tool you can find at the time, that gives you some room to grow, for the job"...
Several of these points fell apart when credit card handling and PCI-DSS entered the picture.
Which is the right way to do it, because transactions don't extend into the real world. If you need to wait for the consequences of a given event, wait for the consequences of that event. Otherwise, all you really care about is all events happening in a consistent order. It's a much more practical consistency model.
> and no referential integrity
The problem with enforcing referential integrity is how you handle violations of it. Usually you don't really want to outright reject something because it refers to something else that doesn't exist yet, so you end up solving the same problem either way.
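A toy sketch of the usual workaround (structure invented): park events whose parent hasn't arrived yet instead of rejecting them, then replay them once it does.

```python
known_parents = set()  # parents we've seen so far
parked = {}            # parent_id -> events waiting for that parent
applied = []

def on_event(parent_id, event):
    if parent_id in known_parents:
        applied.append(event)
    else:
        # Don't outright reject: the parent may just be late.
        parked.setdefault(parent_id, []).append(event)

def on_parent_created(parent_id):
    known_parents.add(parent_id)
    # Replay any events that arrived before their parent.
    for event in parked.pop(parent_id, []):
        applied.append(event)

on_event("user-1", "order-A")  # parent unknown: parked, not rejected
on_parent_created("user-1")
on_event("user-1", "order-B")
print(applied)  # ['order-A', 'order-B']
```

Which is the point above: whether the database rejects the violation or you park and replay, somebody has to handle the out-of-order case.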
> There was also no authentication or authorization, because a default-configured deployment of Kafka from Confluent happily neglects such trivial details.
Pretty common in the database world - both MySQL and PostgreSQL use plaintext protocols by default. Properly configured Kafka uses TLS and/or SASL, has a good ACL system, and is as secure as anything else.
> It was a nightmare to code against once you left the fantasy world of functional programming nirvana and encountered real requirements. It meant pushing a whole series of concerns that isolation addresses into application code... or not addressing them at all.
My experience is just the opposite - ACID isolation sounds great until you actually use it in the real world, and then you find it doesn't address your problems and doesn't give you enough control to fix it yourself. It's like when you use one of those magical do-everything frameworks - it works great until you need to customise something slightly, then it's a nightmare. Kafka pushes more of the work onto you upfront - you have to understand your dataflow and design it explicitly - but that pays off immensely.
> It was a GDPR nightmare.
Really? I've found the exact opposite - teams that used an RDBMS had to throw away their customer data under GDPR, because even though they had an entry in their database saying that the customer had agreed, they couldn't tell you what the customer had agreed to or when. Whereas teams using Kafka in the way you describe had an event record for the original agreement, and could tell you where any given piece of data came from.
This is absolutely wonderful! Unfortunately, this team decided to store data subject to GDPR deletion requests in Kafka, where deletion is quite difficult. It was a problem, when trying to do deletion programmatically, across many teams using the same set of topics.
The real nightmare came when this team, obsessed with the power of infinite retention periods, encountered PCI-DSS. You see, the business wanted to move away from Stripe and similar to dealing with a processor directly, in order to save on transaction fees. So obviously they could just put credit card data into Kafka...
Updating balances using an RDBMS is like managing your finances with pencils and erasers. Unless you somehow ban the UPDATE statement.
Updating balances with Kafka is like working in pen. You can't[1] change the ledger lines, you can only add corrections after the fact.
[1] Yes, Kafka records can be updated/deleted depending on configuration. But when your codebase is written around append-only operations, in-place mutations are hard and corrections are easy, so your fellow programmers fall into the 'pit of success'.
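A minimal sketch of the "pen" model: the ledger is append-only, the balance is a fold over it, and a mistake gets a correcting entry rather than an in-place edit.

```python
ledger = []  # append-only list of (account, delta) entries

def append_entry(account, delta):
    ledger.append((account, delta))

def balance(account):
    # The balance is derived by folding over the ledger, never stored.
    return sum(delta for acct, delta in ledger if acct == account)

append_entry("acct-1", 100)
append_entry("acct-1", -30)
append_entry("acct-1", 30)  # correction: the -30 was a mistake
print(balance("acct-1"))    # 100
print(len(ledger))          # 3: the history of the mistake is preserved
```

That preserved history is the 'pit of success' mentioned above: the easy operation (append) is also the auditable one.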
The original sin of Kafka is that it begins with only memory.
To me the right middle way is relational temporal tables[0]. You get both the memoryless query/update convenience and the ability to travel through time.
[0] SQL:2011 introduced temporal data, but in a slightly janky just-so-happens-to-fit-Oracle's-preference kind of way.
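A minimal sketch of the temporal-table idea using sqlite3 (schema invented for the example, not the SQL:2011 syntax): an update closes the current row and inserts a new one, so you keep both the convenient "current value" query and the ability to travel through time.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE balance_history (
    account TEXT, amount INTEGER,
    valid_from INTEGER, valid_to INTEGER)""")  # valid_to NULL = current row

def set_balance(account, amount, now):
    # Close out the currently valid row, then insert the new one.
    db.execute("UPDATE balance_history SET valid_to = ? "
               "WHERE account = ? AND valid_to IS NULL", (now, account))
    db.execute("INSERT INTO balance_history VALUES (?, ?, ?, NULL)",
               (account, amount, now))

def balance_as_of(account, t):
    # Time travel: find the row whose validity interval contains t.
    row = db.execute(
        "SELECT amount FROM balance_history WHERE account = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (account, t, t)).fetchone()
    return row[0] if row else None

set_balance("acct-1", 100, 1)
set_balance("acct-1", 70, 5)
print(balance_as_of("acct-1", 3))  # 100 (time travel)
print(balance_as_of("acct-1", 9))  # 70  (current)
```

The nice property is that the write path still looks like an update to callers, while the log-like history accumulates underneath.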
I'm not selling any particular tool, but Amazon's QLDB is an interesting example of a database built on a cryptographically verifiable, append-only ledger. I am interested to see how things like Kafka and this might come together somehow.
In my opinion, we have evolved to a point where storage is not a concern for temporal use cases - i.e. we can now store every change in an immutable fashion. When you think about this being an append only transaction log that you never have to purge, and you make observable events on that log (which is what most CDC systems do)... yeah it works. Now you have every change, cryptographically secure, with events to trigger downstream consumers and you can really rethink the whole architecture of monolithic databases vs. data platforms.
It is an exciting time we are in, in my opinion.
Just my $0.02.
1) Write an event recording a _desire_ to checkout.
2) Build a view of checkout decisions, which compares requests against inventory levels and produces checkout _results_. This is a stateful stream/stream join.
3) Read out the checkout decision to respond to the user, or send them an email, or whatever.
CDC is great and all, too, but there are architectures where ^ makes more sense than sticking a database in front.
Admittedly, working up highly available, stateful stream-stream joins that aren't challenging to operate in production is... hard, but getting better.
Money quote: "Event-sourced architectures like these suffer many such isolation anomalies, which constantly gaslight users with “time travel” behavior that we’re all familiar with."
> The fundamental problem with using Kafka as your primary data store is it provides no isolation.
This is false. I can only assume the author doesn't know about the Kafka transactions feature?
To be specific, Kafka's transaction machinery offers read-committed isolation, and you get read-uncommitted by default if you don't opt-in to use that transaction machinery (the docs: https://kafka.apache.org/0110/javadoc/index.html?org/apache/...). Depending on your workload, read-committed might be sufficient for correctness, in which case you can absolutely use Kafka as your database.
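The difference between the two isolation levels can be shown with a toy model of a partition log containing transactional records and commit markers: a read-committed consumer only sees records from transactions that committed, while read-uncommitted sees everything. This is a simplified sketch (it ignores offsets and the last-stable-offset mechanics), not Kafka's actual protocol.

```python
def visible_records(log, isolation_level="read_uncommitted"):
    """Return record values a consumer would see under each isolation level.

    `log` is a list of dicts: {"type": "record", "txn": ..., "value": ...}
    or {"type": "commit", "txn": ...}. Shapes are made up for illustration.
    """
    if isolation_level == "read_uncommitted":
        # Default behaviour: every produced record is visible immediately.
        return [r["value"] for r in log if r["type"] == "record"]
    # read_committed: only records whose transaction has a commit marker.
    committed = {r["txn"] for r in log if r["type"] == "commit"}
    return [r["value"] for r in log
            if r["type"] == "record" and r["txn"] in committed]
```

An open or aborted transaction's records simply never surface to a read-committed consumer, which is the guarantee the parent comment is referring to.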
Of course, proving that your application is sound with just read-committed isolation can be challenging, not to mention testing that it continues to be sound as new features are added.
Because of that, in general I think that the underlying point of this article is probably correct, in that you probably shouldn't use Kafka as your database -- but for certain applications / use-cases it's a completely valid system design choice.
More generally this is an area that many applications get wrong by using the wrong isolation levels, because most frameworks encourage incorrect implementations by their unsafe defaults; e.g. see the classic "Feral concurrency control" paper http://www.bailis.org/papers/feral-sigmod2015.pdf. So I think the general message of "don't use Kafka as your DB unless you know enough about consistency to convince yourself that read-committed isolation is and will always be sufficient for your usecase" would be more appropriate (though it's certainly a less snappy title).
https://www.postgresql.org/docs/9.5/transaction-iso.html
If you're arguing that in practice this isn't enough isolation, then sure, that's what I said in my post; most applications need more than the default isolation levels. I feel like you're making an absolutist point (just like the original article) where my point was that the domain is actually more nuanced, and absolutes just obscure the technical complexity.
And you'll have exactly the same problem if you're using a traditional ACID database: the user saw the item as being available, clicked buy, but it was unavailable by the time they went to get it. Using an ACID database doesn't gain you anything; you might as well just use Kafka for everything.
The user having stale data in their browser and finding an item has already been purchased is very different from the database, within a transaction, allowing an item which has already been purchased to be purchased again.
This is like "the operation was a success, but the patient died". What matters is the user-facing behaviour of your whole system; a transaction that can't actually cover the parts the user cares about is pointless.
If you open an ACID transaction when the user adds something to the cart and don't close it until they check out, you'll find your database gets locked up pretty quickly. So you can't actually use the ACID transactions to implement the behaviour you want - you have to implement some kind of reserve/commit semantics in userspace, whether you're using an ACID database or not.
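A minimal sketch of those userspace reserve/commit semantics: stock is reserved with an expiry instead of holding a transaction open while the user shops, and an abandoned reservation returns the stock to the pool. All names and the TTL are illustrative assumptions.

```python
import time

class Inventory:
    """Sketch of reserve/commit semantics implemented in application code."""

    def __init__(self, stock):
        self.stock = stock
        self.reservations = {}   # reservation_id -> (qty, expires_at)
        self._next_id = 0

    def reserve(self, qty, ttl_seconds=600):
        self._expire(time.monotonic())
        if self.stock < qty:
            return None          # someone else got there first
        self.stock -= qty
        self._next_id += 1
        self.reservations[self._next_id] = (qty, time.monotonic() + ttl_seconds)
        return self._next_id

    def commit(self, reservation_id):
        # Reserved stock becomes a real sale; nothing returns to the pool.
        return self.reservations.pop(reservation_id, None) is not None

    def _expire(self, now):
        for rid, (qty, expires_at) in list(self.reservations.items()):
            if now >= expires_at:
                del self.reservations[rid]
                self.stock += qty   # abandoned cart: stock goes back
```

Nothing here needs a long-lived database transaction, which is the point: the same logic works whether the state lives in Postgres or in a Kafka-backed view.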
[0] https://www.oreilly.com/library/view/strata-hadoop/978149194...
Eventually, my buffer ran out of memory and I couldn't write anything else to it, and it was dropping lots of messages. I was bummed. Is there a way to avoid this in Kafka?
RabbitMQ is the most widely used open source implementation of the AMQP protocol. It is slower but can support complex routing scenarios internally and handle situations where at-least-once-delivery guarantees are important. RabbitMQ supports on-disk persistent queues, which you can tune if you like. Compared to Kafka, RabbitMQ is slower in terms of the volume that can be managed per queue.
Kafka is fast because it is horizontally scalable, with parallel producers and consumers per topic. You can tune the speed and move the needle between consistency and availability where you need to. However, if you want things like at-least-once delivery, Kafka gives you the building blocks, but ultimately you'll have to handle it on the application side.
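"Handle it on the application side" usually means making the consumer idempotent, since at-least-once delivery implies the broker may redeliver a message. A common pattern is deduplicating on a message key the producer attaches; this sketch keeps the seen-key set in memory, where a real system would persist it alongside the processing results.

```python
def consume_idempotently(messages, handler):
    """Apply `handler` at most once per message id, tolerating redelivery.

    `messages` and the "id" field are illustrative shapes, not a real
    client API.
    """
    seen = set()
    for msg in messages:
        if msg["id"] in seen:
            continue            # duplicate redelivery: skip side effects
        handler(msg)
        seen.add(msg["id"])     # record the id only after handler succeeds
    return seen
```

Recording the id only after the handler succeeds means a crash mid-message leads to a retry, not a lost message - which is the at-least-once contract.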
Regarding storage, by default Kafka retains data for 7 days. IIRC the NY Times stores articles from 1970 onwards on Kafka clusters. The storage is horizontally scalable and durable. This is a common use case. As many have pointed out, the cluster setup depends highly on your needs. We store data for 7 days in Kafka as well, and it's on the order of 500GB or more per node.
Looks like you have a configuration issue. You can configure RabbitMQ to store queues on disk, and with a quick calculation you can make sure you have enough space for 10 or 150 hours of data. I don't see any reason to switch to Kafka, a different tool with different characteristics, just because you need more storage.
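That "quick calculation" is just throughput times retention, plus some headroom. The numbers and the 1.3x overhead factor below are made-up examples for illustration, not measured values.

```python
def required_disk_gb(msgs_per_sec, avg_msg_bytes, hours, overhead=1.3):
    """Back-of-the-envelope sizing for an on-disk queue.

    `overhead` loosely accounts for per-message metadata and index
    structures (assumed value, measure for your own broker).
    """
    raw_bytes = msgs_per_sec * avg_msg_bytes * hours * 3600
    return raw_bytes * overhead / 1e9
```

For example, 1,000 msgs/sec at ~1 KB each over 10 hours comes out to roughly 47 GB, well within a single node's disk.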
Not a relational one. And should not replace a typical CRUD OLTP DB.
But it sure seems like a NoSQL DB to me.
For example: try to retrieve a list of all 10,000 movies you stored in an Elasticsearch index. You will get the first 100 results easily, but as you scroll through the results, you will notice that Elasticsearch becomes very slow.
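The slowdown comes from how offset-based paging works: to serve page N, the engine has to collect and sort the top (offset + size) hits, then discard everything before the offset, so the work grows with how deep you page. This toy function mimics that cost model (Elasticsearch's `scroll` and `search_after` mechanisms exist precisely to avoid it).

```python
def page(docs, offset, size, key):
    """Naive offset/size paging: materialize the top offset+size hits,
    then throw most of them away. Purely an illustration of the cost."""
    candidates = sorted(docs, key=key)[: offset + size]  # work grows with offset
    return candidates[offset : offset + size]
```

At offset 9,900 the engine is sorting and buffering ~10,000 candidates per shard just to return 100 of them, which is why deep scrolling with from/size crawls.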
It is not optimized for that use case. Otherwise it would be called ElasticDB.
Martin Kleppmann | Kafka Summit SF 2018 Keynote (Is Kafka a Database?) [1]
Has anyone successfully implemented an actor model framework on top of Kafka?
Interested in learning about others' experiences.
Other things that are not a database: Apache Traffic Server, Apache Mahout, Apache Jakarta, Apache ActiveMQ... hundreds of these exist.