Building a modern durable execution engine from first principles (opens in new tab)

(restate.dev)

98 pointswhoiskatrin0y ago26 comments

26 comments

dang0y ago

One of the authors worked on Apache Flink but is too modest to include that interesting detail! So I'm adding it here. Hopefully he won't mind.

sewen0y ago

All of the Restate co-founders com from various stages of Apache Flink.

Restate is in many ways a mirror image to Flink. Both are event-streaming architectures, but otherwise make a lot of contrary design choices.

(This is not really helpful to understand what Restate does for you, but it is an interesting tid bit about the design.)

       Flink     |   Restate
  -------------------------------
                 |
    analytics    |  transactions
                 |
  coarse-grained |  fine-grained
    snapshots    | quorum replication
                 |
   throughput-   |  latency-sensitive
    optimized    |  
                 |
  app and Flink- |  disaggregated code
  share process  |   and framework
                 |
      Java       |      Rust

the list goes on...

yubblegum0y ago

They mention it in their about page: https://restate.dev/about

dang0y ago

Thanks—sounds like it was more than one of them!

jedberg0y ago

Hi CEO of DBOS here (we’re a restate competitor).

I really enjoyed this blog post. The depth of the technical explanations is appreciated. It really helps me understand the choices you’ve made and why.

We’ve obviously made some different design choices, but each solution has its place.

digdugdirk0y ago

DBOS was the first thing I thought of when I saw this.

From a DBOS perspective, can you explain what the differences are between the two?

KraftyOne0y ago

(DBOS co-founder here) DBOS embeds durable execution into your app as a library backed by Postgres, whereas Restate provides durable execution as a service.

In my opinion, this makes DBOS more lightweight and easier to integrate into an existing application. To use DBOS, you just install the library and annotate workflows and steps in your program. DBOS will checkpoint your workflows and steps in Postgres to make them durable and recover them from failures, but will otherwise leave your application alone. By contrast, to use Restate, you need to split out your durable code into a separate worker service and use the Restate server to dispatch events to it. You're essentially outsourcing control flow and event processing to the Restate server., which will require some rearchitecting.

Here's a blog post with more detail comparing DBOS and Temporal, whose model is similar to (but not the same as!) Restate: https://www.dbos.dev/blog/durable-execution-coding-compariso...

randomcatuser0y ago

I have to say, the examples are really good: https://github.com/restatedev/examples

For someone (me) who is new to distributed systems, and what durable execution even is, I learned a lot (both from the blog post & the examples!). thanks!

ChatGPT also helped :)

oulipo0y ago

Interesting, how does it compare to Inngest and DBOS?

p10jkle0y ago

Hey, I work on Restate. There are lots of differences throughout the architecture and the developer experience, but the one most relevant to this article is that Restate is itself a self-contained distributed stream-processing engine, which it uses to offer extremely low latency durable execution with strong consistency across AZs/regions. Other products tend to layer on top of other stores, which will inherit the good things and the bad things about those stores when it comes to throughput/latency/multi-region/consistency.

We are putting a lot of work into high throughput, low latency, distributed use cases, hence some of the decisions in this article. We felt that this necessitated a new database.

ALLTaken0y ago

Hi,

I'm building a distributed application based on Hypergraphs, because the data being processed is mostly re-executable in different ways.

It's so refreshing to read this, I was also sitting down many nights and was thinking up about the same problem that you guys solved. I'm so glad about this!

Would it be possible to plug other storage engines into Restate? The data-structure that needs to be persisted allows multiple-path execution and instant re-ordering without indexing requirements.

I'm mostly programming in Julia and would love to see some little support for it too =)

Great work guys!

1 more reply

bluelightning2k0y ago

I find this type of thing very interesting technically, but not very interesting commercially.

It would seem to me that durable execution implies long running jobs, but this kind of work suggests micro optimisation of a couple of ms. The applications inherently don't care about this stuff?

What am I missing. Or is it just that at a big enough scale anything matters.

2 more replies

popalchemist0y ago

How does it compare against Trigger or Hatchet?

1 more reply

agentultra0y ago

Isn’t this just event sourcing? Why not re-use the terminology from there?

solatic0y ago

Can you elaborate more on your persistence layer?

One of the good reasons why most products will layer on top of an established database like Postgres is because concerns like ACID are solved problems in those databases, and the database itself is well battle-hardened, Jepsen-tested with a reputation for reliability, etc. One of the reasons why many new database startups fail is precisely because it is so difficult to get over that hump with potential customers - you can't really improve reliability until you run into production bugs, and you can't sell because it's not reliable. It's a tough chicken-and-egg problem.

I appreciate you have reasons to build your own persistence layer here (like a push-based model), but doesn't doing so expose you to the same kind of risk as a new database startup? Particularly when we're talking about a database for durable execution, for which, you know, durability is a hard requirement?

sewen0y ago

Indeed, the persistence layer is sensitive, and we do take this pretty serious.

All data is persisted via RocksDB. Not only the materialized state of invocations and journals, but even the log itself uses RocksDB as the storage layer for sequence of events. We do that to benefit from the insane testing and hardening that Meta has done (they run millions of instances). We are currently even trying to understand which operations and code paths Meta uses most, to adopt the code to use those, to get the best-tested paths possible.

The more sensitive part would be the consensus log, which only comes into play if you run in distributed deployments. In a way, that puts us into a similar boat as companies like Neon: having reliably single-node storage engine, but having to build the replication and failover around that. But in that is also the value-add over most databases.

We do actually use Jepsen internally for a lot of testing.

(Side note: Jepsen might be one of the most valuable things that this industry has - the value it adds cannot be overstated)

sewen0y ago

I just realized I missed an important part: The primary durability for the bulk of the state comes from S3 (or similar object store). The periodic snapshots give you like an automatic frequent backup mechanism for free, which in itself is a nice property to have.

xwowsersx0y ago

Looks very interesting. How does it compare to Temporal?

sewen0y ago

There are a few dimensions where this is different.

(1) The design is a fully self-contained stack, event-driven, with its own replicated log and embedded storage engine.

That lets it ship as a single binary that you can use without dependency (on your laptop or the cloud). It is really easy to run.

It also scales out by starting more nodes. Every layer scales hand-in hand, from log to processors. (you should give it an object store to offload data, when running distributed)

The goal is a really simple and lightweight way to run yourself, while incrementally scaling to very large setups when necessary. I think that is non-trivial to do with most other systems.

(2) Restate pushes events, compared to Temporal pulling activities. This is to some extent a matter of taste, though the push model has a way to work very naturally with serverless functions (lambda, CF workers, fly.io, ...).

(3) Restate models services and stateful functions, not workflows. This means you can model logic that keeps state for longer than what would be the scope of a workflow (you have like a K/V store transactionally integrated with durable executions). It also supports RPC and messaging between functions (exactly-once integrated with the durable execution).

(4) The event-driven runtime, together with the push model, gets fairly good latencies (low overhead of durable execution).

7bit0y ago

What do you mean by pushes events and pulling activities? Where exactly does that take place during a durable execution? I used Temporal and I know what Temporal Activities are, but the pushing and pulling confuses me.

1 more reply

j / k navigate · click thread line to collapse

26 comments

dang0y ago

One of the authors worked on Apache Flink but is too modest to include that interesting detail! So I'm adding it here. Hopefully he won't mind.

sewen0y ago

All of the Restate co-founders com from various stages of Apache Flink.

Restate is in many ways a mirror image to Flink. Both are event-streaming architectures, but otherwise make a lot of contrary design choices.

(This is not really helpful to understand what Restate does for you, but it is an interesting tid bit about the design.)

       Flink     |   Restate
  -------------------------------
                 |
    analytics    |  transactions
                 |
  coarse-grained |  fine-grained
    snapshots    | quorum replication
                 |
   throughput-   |  latency-sensitive
    optimized    |  
                 |
  app and Flink- |  disaggregated code
  share process  |   and framework
                 |
      Java       |      Rust

the list goes on...

yubblegum0y ago

They mention it in their about page: https://restate.dev/about

dang0y ago

Thanks—sounds like it was more than one of them!

jedberg0y ago

Hi CEO of DBOS here (we’re a restate competitor).

I really enjoyed this blog post. The depth of the technical explanations is appreciated. It really helps me understand the choices you’ve made and why.

We’ve obviously made some different design choices, but each solution has its place.

digdugdirk0y ago

DBOS was the first thing I thought of when I saw this.

From a DBOS perspective, can you explain what the differences are between the two?

KraftyOne0y ago

(DBOS co-founder here) DBOS embeds durable execution into your app as a library backed by Postgres, whereas Restate provides durable execution as a service.

Here's a blog post with more detail comparing DBOS and Temporal, whose model is similar to (but not the same as!) Restate: https://www.dbos.dev/blog/durable-execution-coding-compariso...

randomcatuser0y ago

I have to say, the examples are really good: https://github.com/restatedev/examples

For someone (me) who is new to distributed systems, and what durable execution even is, I learned a lot (both from the blog post & the examples!). thanks!

ChatGPT also helped :)

oulipo0y ago

Interesting, how does it compare to Inngest and DBOS?

p10jkle0y ago

We are putting a lot of work into high throughput, low latency, distributed use cases, hence some of the decisions in this article. We felt that this necessitated a new database.

ALLTaken0y ago

Hi,

I'm building a distributed application based on Hypergraphs, because the data being processed is mostly re-executable in different ways.

It's so refreshing to read this, I was also sitting down many nights and was thinking up about the same problem that you guys solved. I'm so glad about this!

Would it be possible to plug other storage engines into Restate? The data-structure that needs to be persisted allows multiple-path execution and instant re-ordering without indexing requirements.

I'm mostly programming in Julia and would love to see some little support for it too =)

Great work guys!

1 more reply

bluelightning2k0y ago

I find this type of thing very interesting technically, but not very interesting commercially.

It would seem to me that durable execution implies long running jobs, but this kind of work suggests micro optimisation of a couple of ms. The applications inherently don't care about this stuff?

What am I missing. Or is it just that at a big enough scale anything matters.

2 more replies

popalchemist0y ago

How does it compare against Trigger or Hatchet?

1 more reply

agentultra0y ago

Isn’t this just event sourcing? Why not re-use the terminology from there?

solatic0y ago

Can you elaborate more on your persistence layer?

sewen0y ago

Indeed, the persistence layer is sensitive, and we do take this pretty serious.

We do actually use Jepsen internally for a lot of testing.

(Side note: Jepsen might be one of the most valuable things that this industry has - the value it adds cannot be overstated)

sewen0y ago

xwowsersx0y ago

Looks very interesting. How does it compare to Temporal?

sewen0y ago

There are a few dimensions where this is different.

(1) The design is a fully self-contained stack, event-driven, with its own replicated log and embedded storage engine.

That lets it ship as a single binary that you can use without dependency (on your laptop or the cloud). It is really easy to run.

It also scales out by starting more nodes. Every layer scales hand-in hand, from log to processors. (you should give it an object store to offload data, when running distributed)

The goal is a really simple and lightweight way to run yourself, while incrementally scaling to very large setups when necessary. I think that is non-trivial to do with most other systems.

(4) The event-driven runtime, together with the push model, gets fairly good latencies (low overhead of durable execution).

7bit0y ago

1 more reply

j / k navigate · click thread line to collapse