"Based on user feedback, Kafka seems to be the primary source developers are interested in ingesting data from. We rewrote the onyx-kafka plugin from scratch to essentially mirror what Storm's Kafka Spout provides. That is, the Onyx Kafka plugin will dynamically discover brokers from ZooKeeper and automatically reconnect to an In Sync Replica (ISR) if any brokers go down. We also took a little detour to create onyx-s3. onyx-s3 is an S3 reader that handles faults by checkpointing to ZooKeeper."
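To give a feel for what consuming from Kafka looks like in practice, here is a minimal sketch of an Onyx catalog entry for a Kafka input task. The exact key names (`:kafka/topic`, `:kafka/zookeeper`, etc.) and the topic/host values are illustrative assumptions, not a verbatim copy of the plugin's API; consult the onyx-kafka README for the authoritative options.

```clojure
;; Hypothetical onyx-kafka input catalog entry (key names and values
;; are illustrative assumptions, not the plugin's confirmed API).
{:onyx/name :read-events
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "events"                ; assumed topic name
 :kafka/zookeeper "127.0.0.1:2181"    ; brokers are discovered via ZooKeeper
 :onyx/batch-size 100
 :onyx/doc "Reads messages from a Kafka topic"}
```

Because broker addresses are resolved through ZooKeeper rather than hard-coded, the task can reconnect to an in-sync replica when a broker disappears.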
I was able to develop in the REPL, which was a huge win, and deploying is as simple as building an uberjar and copying it to the host.
We have around 10 workflows to store pre-calculated values and 'short-circuit' reference collections at the moment, and we're adding more all the time as we find hotspots in our web-tier Datomic queries that we want to speed up.
It's wonderful that we can use all the same notions we're familiar with in the rest of our stack (Clojure, ClojureScript, Datomic) – data-oriented, functional, immutable, dynamically typed. We get to use the same simple paradigm for the entire lifecycle of a user interaction. It's incredibly empowering.
We started with 0.5, and patiently fought through the difficulties that HornetQ produced, because despite those difficulties, it was a real pleasure to write code for Onyx. Now that 0.6 is out, with metrics, no HornetQ, a significantly faster dev mode thanks to the core.async transport, and a cleaner lifecycle API, it feels like we've been given super powers!
Michael and Lucas (the two core team members I've interacted with) are incredibly receptive to feedback and tremendously eager to help out if you get stuck, and we have learned a hell of a lot about this game from them.
If you're in Clojure at all, and you need something like this, look no further. Heck, even if you're not in Clojure, you should take a look.
Soon after that we will move our audio upload/transcoding process into a different pipeline as well.
If you're already using Clojure and want to do stream or batch processing, this is a no-brainer.