However, many split brains and grey hairs later, I decided RabbitMQ was almost never worth it, regardless of how many of AMQP's advanced features you could make use of.
For the longest time I just made do with Kafka, but it had serious deficiencies when implementing queues because of Kafka's cumulative-ack-only nature.
Recently I have started using Pulsar, which provides selective acks and all the best parts of AMQP without the complexity and unneeded parts: it has things like scheduled delivery and TTLs, in addition to the all-important shared subscription, which makes queues "just work" on top of streams.
If you want something like RabbitMQ but with a simpler API and are comfortable with JVM services give Pulsar a go. It's not for everyone but if you are already using a lot of the big data stack it's probably a good fit.
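The shared-subscription behaviour is easy to picture with a toy model (plain Python, not the Pulsar client API): one topic, several consumers attached to the same subscription, messages dispatched round-robin so each consumer handles a share of the work.

```python
from collections import deque

class SharedSubscription:
    """Toy model of a Pulsar shared subscription: messages from one
    topic are handed out round-robin across the subscription's
    consumers, giving queue semantics on top of a stream."""
    def __init__(self, consumers):
        self.consumers = consumers
        self.pending = deque()
        self._rr = 0  # round-robin cursor

    def publish(self, msg):
        self.pending.append(msg)

    def dispatch(self):
        """Hand each pending message to the next consumer in turn."""
        assignments = {c: [] for c in self.consumers}
        while self.pending:
            msg = self.pending.popleft()
            consumer = self.consumers[self._rr % len(self.consumers)]
            self._rr += 1
            assignments[consumer].append(msg)
        return assignments

sub = SharedSubscription(["worker-1", "worker-2"])
for i in range(4):
    sub.publish(f"task-{i}")
work = sub.dispatch()
# Each worker gets an interleaved share of the stream: queue semantics.
print(work)  # {'worker-1': ['task-0', 'task-2'], 'worker-2': ['task-1', 'task-3']}
```

With Pulsar's other subscription types (exclusive, failover) the same topic instead behaves like a stream with a single active reader, which is what makes the one system cover both paradigms.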
Which of course leads me to believe the problem isn't with the people but with the ridiculously high threshold of knowledge, experience, and app-developer self-control needed to run RMQ successfully.
As the parent said, many meltdowns later, I'm now firmly in the "No Rabbit!" camp: Redis pub/sub and queues for immediate, lossy delivery; Kafka, GCP Pub/Sub, or AWS SQS for less latency-sensitive flows that require stronger consistency guarantees.
RabbitMQ is one of those things that I've always found better to let the experts run (managed SaaS), unless your team is really wanting to take on the burden of becoming an Erlang distributed system debugger :)
Pulsar seems really interesting... There are now more managed Pulsar offerings coming online (StreamNative, DataStax who bought Kesque, Pandio, etc)
You’d use Kafka more as an unbounded buffer and build different paradigms on top of it. It's not unusual to ingest hundreds of megabits of data into Kafka, potentially saturating the network, while also reading that much back out. AMQP is better for large numbers of queues where each queue holds fewer messages. Think MQTT, WebSockets - many, many consumers.
It would be reasonable to use both next to each other.
But I’d never go for RabbitMQ. I’d go for Azure Service Bus, or Artemis with Qpid.
I generally think of messaging systems as falling into four distinct categories: PubSub, Streaming, Queues, and Enterprise Messaging Systems.
PubSub systems are focused on (usually) non-durable, low-latency messaging, generally without acknowledgements and with at-most-once delivery, e.g. Redis PUBSUB, NATS, etc.
Queues are generally focused on fanning work out to multiple consumers, with at-least-once processing of durable messages and acknowledgements, e.g. Celery/Sidekiq, Que, AWS SQS.
Streaming systems are designed for throughput and are usually based on some form of distributed log. They generally offload offset management to consumers, e.g. Kafka, Kinesis.
Enterprise Messaging Systems favor flexibility above all else and usually have some mechanism for encoding the flow of data separately from the applications themselves - exchange routing topologies in AMQP, for example. They can generally implement pubsub, queue, and direct messaging paradigms. The tradeoffs are poorer availability, more complexity, and worse performance versus specialised systems, e.g. RabbitMQ, HornetQ, etc.
So you end up using Kafka when its limitations aren't a problem and you need the throughput. It works best when every message in a stream is homogeneous, such that failure to process one message is unlikely to be independent of failure to process the following one. This alleviates the main drawback of streaming systems, which is head-of-line blocking.
Some cases where it works very well are event streams, data replication/CDC, etc.
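The head-of-line-blocking tradeoff is easy to see in a toy sketch (not any real client API) contrasting a cumulative-offset consumer with a selective-ack queue consumer:

```python
def consume_stream(messages, handler):
    """Streaming/log consumer: acks are cumulative offsets, so a
    failing message blocks everything behind it in the partition
    (head-of-line blocking). Returns the messages processed."""
    processed = []
    for msg in messages:
        if not handler(msg):
            break  # cannot advance the offset past a failure
        processed.append(msg)
    return processed

def consume_queue(messages, handler):
    """Queue consumer with selective acks: a failing message is left
    unacked (for redelivery) but later messages still get through."""
    return [msg for msg in messages if handler(msg)]

handler = lambda m: m != "poison"
stream = ["a", "poison", "b", "c"]
print(consume_stream(stream, handler))  # ['a'] - blocked at the bad message
print(consume_queue(stream, handler))   # ['a', 'b', 'c'] - bad message skipped
```

If the messages are homogeneous, the "poison" case means the next messages would fail anyway, so the blocking costs you little - which is the point made above.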
If the order of message processing matters, then Kafka is better suited than AMQP. For example, in a distributed application for money transfers, if AMQP is used, message order will be lost and problems will occur in the following scenario:
User A, with an account balance of $1000, places orders for two transfers, T1 ($600) and T2 ($500):
- Rabbit delivers T1 to server1; before processing the message, server1 enters a full GC.
- Rabbit delivers T2 to server2, and server2 processes the message immediately; now User A's account has $500.
- Server1 comes back after the GC ends, but fails to process T1, since the account balance is less than the required amount.
However, it is T2 that should have failed, because User A ordered T1 first and T2 after. In Kafka, when the user's account identifier is used as the partitioning key, all of User A's messages will be processed by the same consumer (e.g. server1), so even if server1 enters a full GC that is OK, since T2 will still be processed after T1.
Kafka makes the brokers as dumb as possible to optimize for performance, and the logic sits in the client. You can ask to read back from the log at any point, or from the current tail. Acknowledging messages is just writing a bookmark to another topic saying where you last read up to, or you can keep track of it yourself somewhere else.
You can always build more complex logic on top which Confluent has done with things like ksqldb.
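That bookmark model can be sketched in a few lines (a simulation, not the Kafka API; in real Kafka the `__consumer_offsets` topic plays the role of the store):

```python
class OffsetStore:
    """Minimal external offset store: the consumer, not the broker,
    remembers how far it has read (the 'bookmark' model)."""
    def __init__(self):
        self._offsets = {}

    def commit(self, topic, partition, offset):
        self._offsets[(topic, partition)] = offset

    def last_committed(self, topic, partition):
        return self._offsets.get((topic, partition), -1)

log = ["m0", "m1", "m2", "m3", "m4"]  # an append-only partition
store = OffsetStore()
store.commit("events", 0, 2)  # consumer crashes after processing m0..m2

# On restart, resume from the bookmark rather than the head or tail.
resume_from = store.last_committed("events", 0) + 1
print(log[resume_from:])  # ['m3', 'm4']
```

Because the broker never deletes a message on ack, re-reading from any older bookmark is always possible, which is what makes replay and multiple independent consumer groups cheap.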
I'm curious what makes it go on your "never again" list? We've definitely had our fair share of issues with it, namely -
- Really easy to misconfigure queues/exchanges, especially when trying to set up something like retry + DLQ.
- If a queue builds up to a large number of messages (100 million+) for whatever reason, purging it will probably bring down the cluster.
Overall, our experience has been mostly positive. It isn't on my "never again" list, but I'm definitely wary of some parts of it, and it is one of the more difficult pieces of our infrastructure to scale.
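For what it's worth, the retry + DLQ topology usually hinges on a handful of queue arguments; a sketch of the two declarations (the exchange/queue names here are made up):

```python
# Queue arguments for a common RabbitMQ retry + DLQ topology.
# Rejected messages on the work queue are dead-lettered to 'work.dlx';
# messages parked on the retry queue expire back into the main
# exchange after a delay. (Exchange/queue names are hypothetical.)
work_queue_args = {
    "x-dead-letter-exchange": "work.dlx",  # where rejects go
}
retry_queue_args = {
    "x-message-ttl": 30_000,                    # park for 30 s
    "x-dead-letter-exchange": "work.exchange",  # then re-route back
    "x-dead-letter-routing-key": "work",
}
# With pika these would be passed as, e.g.:
#   channel.queue_declare(queue="work", durable=True,
#                         arguments=work_queue_args)
print(sorted(work_queue_args), sorted(retry_queue_args))
```

The footgun is that most of these arguments are immutable once the queue exists, so getting them wrong means deleting and re-declaring the queue - part of why this is so easy to misconfigure.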
We built a Kafka consumer that's effectively capable of selective acks by producing bad messages to separate topics. It's a little silly but it works.
It has been great overall.
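The pattern is roughly: never let a bad message hold the offset back, just divert it. A toy sketch of that consumer (in-memory, not a real Kafka client):

```python
def consume_with_dead_letter(messages, handler):
    """Approximate selective acks on a cumulative-ack log: messages
    that fail processing are re-published to a side topic instead of
    blocking the partition, so the main offset can always advance."""
    dead_letter_topic = []
    for msg in messages:
        try:
            handler(msg)
        except Exception:
            dead_letter_topic.append(msg)  # retry later, out of band
        # either way, the consumer can commit this offset and move on
    return dead_letter_topic

def handler(msg):
    if msg == "bad":
        raise ValueError(msg)

failed = consume_with_dead_letter(["ok-1", "bad", "ok-2"], handler)
print(failed)  # ['bad'] - the partition kept moving past the failure
```

The "silly" part the parent alludes to is that the side topic needs its own consumer, retry policy, and eventual give-up rule, all of which a queue broker would give you for free.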
So good I have recently decided to slow down client work and build a managed SaaS offering for Pulsar: https://turtlequeue.com It is a work in progress, however it is a bit different from the nascent Pulsar offerings out there.
The main goals are ease of use and low cost. How do I go about it?
1. Behind the scenes there is only one Pulsar cluster. This lowers hosting costs dramatically. Even the smallest production Pulsar cluster requires:
- ZooKeeper node(s)
- Bookie node(s)
- Broker node(s)
- (optional) Function worker node(s)
- (optional) Proxy node(s)
- Pulsar Manager
- Prometheus
- Grafana
.. typically this runs on top of Kubernetes these days, so throw in volume storage and a LoadBalancer. Hosting small setups is costly.
By having a shared cluster I can lower the costs enough to provide a free “try me” service at little to no cost to me. And nobody will suffer from the “noisy neighbour” problem, as Pulsar is designed to be multi-tenant and can enforce limits per tenant.
2. Tq (turtlequeue) users do not have to care about how the cluster operates (typical SaaS). It is also dramatically easier for me to monitor and operate only one cluster.
3. How do I expose this safely and make it easy for users to use Pulsar, then? Experienced Pulsar users will notice that this is not easy to do at the moment with Pulsar. I am developing a custom proxy! This in turn allows me to collect metrics, enforce finer-grained permissions, and present a nicer dashboard.
Where am I now? The custom proxy works, the website/docs/login/dashboard/metrics/pricing need a lot of TLC. So “soon”. I will be looking for beta testers, if you are interested please email turtle@turtlequeue.com Feel free to email me too if you just want to be kept in the loop :)
RabbitMQ is an excellent messaging middleware. But simply remember that it is not designed for optimal performance when holding on to data. "It is a river, not a lake". Performance is very sensitive to the amount of data that is in flight between send and receive end points.
Kafka is a "lake", but will not give you the rich routing and diverse semantics of Rabbit. If you are building 'event sourced' systems, Kafka and similar systems are a better choice.
Pulsar has a highly articulated architecture. It is built on BookKeeper and decouples the persistent-store servers from the client-serving servers. If you want to avoid the rebalancing pain of Kafka, Pulsar is the solution. However, Pulsar has many more moving pieces.
Both RabbitMQ and Pulsar are authentic 'middleware', and extensible. Kafka is, true to its genesis, a highly performant distributed log.
Durable, recoverable, and performant distributed messaging/journaling is inherently complex. Make sure you know precisely what it is that you require, and one of the above solutions will likely serve you well.
My observations so far:
* Running a single RabbitMQ is pretty boring in the good sense.
* We haven't managed to switch to a cluster for HA yet; it seems that software that deals with RabbitMQ clusters must be cluster-aware (consume from queues on all instances and the like), and it wasn't worth our effort to fix all the applications.
* In the long run, the lack of tooling is hurting us. Want to do blue-green deployments? Canary deployments? When your services run on HTTP(S), there are simply tools for that. When your services consume from AMQP queues? You have to go searching for solutions, and possibly build your own plumbing.
In the end, it turns out that we use publish/subscribe far less often than direct request/response patterns, so for new stuff I'd likely go with HTTPS instead of AMQP today.
Seems your conclusion “DON’T” implies the former, but this seems unnecessarily extreme.
What didn't help was that the pika Python client had so many issues. One of my GitHub issue reports included a 20-line demo showing how it broke under anything but trivial load. I gave up soon after, as I had further problems in other languages. Over a year later someone looked at it and said 'yep, fails under load' and fixed it.
One of those projects that showed so much promise only to make me sad.
[1] https://docs.aws.amazon.com/amazon-mq/latest/developer-guide...
I've found RabbitMQ difficult operationally, and full of footguns (that can make you lose data), so I'm not sure why would you want to use it if you don't already have to.
While my past experiences with RabbitMQ in production have been stellar, I can see why a team would be hesitant to add this complexity to their infrastructure.