https://www.youtube.com/watch?v=YpMKUQKlak0&ab_channel=ACMSI...
I thought this would introduce the language to choreograph separate Rust binaries into a consistent and functional distributed system but it looks more like you're writing DFIR the whole way through and not just as glue
DFIR is more of a middle-layer DSL that allows us (the high-level language developers) to re-structure your Rust code to make it more amenable to low-level optimizations like vectorization. Because DFIR operators (like map, filter, etc.) take in Rust closures, we can pass those through all the way from the high-level language to the final Rust binaries. So as a user, you never interact with DFIR.
I know different people have worked on dataflow and remember thinking Materialize was very cool and I've used Kafka Streams at work before, and I remember thinking that a framework probably made sense for stitching this all together
My knee-jerk excitements is that this has the potential to be pretty powerful specifically because it's based on Rust so can play really nicely with other languages. Spark runs on the JVM which is a good choice for portability but still introduces a bunch of complexities, and Dask runs in Python which is a fairly hefty dependency you'd almost never bring in unless you're already on python.
In terms of distributed Rust, I've had a look at Lunatic too before which seems good but probably a bit more low-level than what Hydro is going for (although I haven't really done anything other than basic noodling around with it).
Is that something Akka / RX offer? My quick thought is that they structure code in one binary.
https://rise.cs.berkeley.edu/projects/
Most data processing and distributed systems have some sort of link back to the research this lab has done.
Heh. "most data processing and distributed systems"? Surely you don't mean that the rest of the world was sitting tight working on uniprocessors until this lab got set up in 2017!
- Describe a dataflow graph just like Timely - Comes from a more "semantic dataflow" kind of heritage (frp, composition, flow-of-flows, algebraic operators, proof-oriented) as opposed to the more operationally minded background of Timely - Has a (very) different notion of "progress" than Timely, focused instead of ensuring the compositions are generative in light of potentially unbounded streaming inputs - In fact, Flo doesn't really have any notion of "timeliness", no timestamping at all - Supports nested looping like Timely, though via a very different mechanism. The basic algebra is extremely non-cyclic, but the nested streams/graphs formalism allows for iteration.
The paper also makes a direct comparison with DBSP, which as I understand it, is also part of the Timely/Naiad heritage. Similar to Timely, the authors suggest that Flo could be a unifying semantic framework for several other similar systems (Flink, LVars, DBSP).
So I'd say that the authors of Flo are aware of Naiad/Timely and took inspiration of nested iterative graphs, but little else.
This is also one of the core differences of Timely compared to DBSP, which uses a flat representation (z-sets) to store retractions rather than using versioned elements. This allows retractions to be propagated as just negative item counts which fits into the Flo model (and therefore Hydro).
If so, this seems somewhat problematic in terms of increased overhead.
How is fast communication achieved? Some fast shared memory IPC mechanism?
Also, I don't see anything about integration with async? For better or worse, the overwhelming majority of code dealing with networking has migrated to async. You won't find good non-async libraries for many things that need networking.
At POPL 2025 (last week!), an undergraduate working on Hydro presented a compiler that automatically compiles blocks of async-await code into Hydro dataflow. You can check out that (WIP, undocumented) compiler here: https://github.com/hydro-project/HydraulicLift
The latter benefits a lot from building on top of Apache Arrow and Apache Datafusion.