How Rust optimizes async/await

(tmandry.gitlab.io)

351 points | tmandry | 6y ago | 123 comments

weiming 6y ago

As a newcomer to Rust, I wish this post had been one of the first I read on this topic. It took scouring through many, many posts, some of them here on HN, to grasp some of the same ideas. (I may not be alone, judging from the very long discussion the other day: https://news.ycombinator.com/item?id=20719095)

hathawsh 6y ago

Development of high quality async support in Rust is happening right now, so remember to wear a hard hat. ;-) I like to watch https://areweasyncyet.rs/ and https://this-week-in-rust.org/ to see where things are.

steveklabnik 6y ago

Part of this is just that the design has been in the works for four years and has changed significantly during that time; it’s only now that things are almost stable that it’s worthwhile to write these kinds of things.

GolDDranks 6y ago

Well, there are still three months until it lands on stable; maybe it's just that the time hasn't been ripe for great, understandable posts about the feature until recently.

zackmorris 6y ago

This is one of the most concise tutorials on how generators, coroutines and futures/promises are related (from first principles) that I've seen.

I'm hopeful that eventually promises and async/await fade into history as a fad that turned out to be too unwieldy. I think that lightweight processes with no shared memory, connected by streams (the Erlang/Elixir, Go and Actor model) are the way to go. The advantage to using async/await seems to be to avoid the dynamic stack allocation of coroutines, which can probably be optimized away anyway. So I don't see a strong enough advantage in moving from blocking to nonblocking code. Or to rephrase, I don't see the advantage in moving from deterministic to nondeterministic code. I know that all of the edge cases in a promise chain can be handled, but I have yet to see it done well in deployed code. Which makes me think that it's probably untenable for the mainstream.

So I'd vote to add generators to the Rust spec in order to make coroutines possible, before I'd add futures/promises and async/await. But maybe they are all equivalent, so if we have one, we can make all of them, not sure.

tmandry (OP) 6y ago

It's the same underlying mechanism for generators as for futures: they are stackless coroutines. All the space they need for local variables is allocated ahead of time.

In my experience, the fact that they are stackless is not at all obvious when you're coding with them. Rust makes working with them really simple and intuitive.
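To make "allocated ahead of time" a bit more concrete, here is a minimal sketch (not from the post) that asks the compiler how large a future is before it has run at all. `YieldOnce`, `checksum` and `future_size` are hypothetical names made up for this illustration; the only assumption is that a local that is still live across an `.await` has to be stored inside the future.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Hypothetical helper: a future that suspends exactly once.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

/// `buf` is used after the suspension point, so it cannot live on a
/// call stack; it becomes part of the future's own state.
async fn checksum(seed: u8) -> u32 {
    let buf = [seed; 1024];
    YieldOnce(false).await;
    buf.iter().map(|&b| u32::from(b)).sum()
}

/// Size of the future value, measured before it is ever polled.
fn future_size(seed: u8) -> usize {
    let fut = checksum(seed); // calling an async fn runs nothing yet
    size_of_val(&fut)
}
```

Calling `future_size(7)` should report at least 1024 bytes, because the buffer is reserved inside the future up front rather than on a dynamically growing stack.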

truncate 6y ago

Debugging can be a pain, though: you may not know the right stack, which makes it harder to follow how the code executed in that context. But yes, rather than writing an async state machine with callbacks, I would prefer this.

jnwatson 6y ago

Regarding determinism, async is way more deterministic than multiple threads, because you don’t have arbitrary points where execution contexts can change.

zackmorris 6y ago

That's true in a way, but only for multithreaded code. Multi-process code with full isolation uses different metaphors like joining threads within higher order functions to achieve parallelism in code that looks single-threaded.

For example, lisp-based languages like Clojure can be statically analyzed and then parallelized so that all noninteracting code runs in its own process. This can also be done for code that operates on vectors like MATLAB and TensorFlow.

For me, isolated processes under the Actor model in languages like Elixir/Erlang and Go are much simpler conceptually than async/await, which is only one step above promises/futures, which are only one step above callback hell. I know that the web world uses async/await for now, but someday I think that will be replaced with something that works more like Go.

adwn 6y ago

> I think that lightweight processes with no shared memory, connected by streams [...] are the way to go.

No, at least not in general. There are a lot of problems in the real world for which "no shared memory" is incompatible with "efficient parallelism".

zackmorris 6y ago

That's actually not true - the inefficiencies due to copying can be overcome with the runtime or abstractions.

For example, there's the copy-on-write (COW) mechanism of Unix, where memory pages are mapped to the same location until a forked process writes to one, at which point the virtual memory manager makes a mutable copy.

There's also Redux and Clojure's immutable state tree, which makes copies under a similar mechanism to COW but through code, since they run at a level of abstraction above C++ or Rust.
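For readers who haven't seen copy-on-write done "through code", a small sketch using Rust's standard `Rc::make_mut` shows the general idea: two handles share one allocation until one of them writes. This is only an analogy to the mechanisms mentioned above (it is neither the Unix page-level COW nor Clojure's persistent data structures), and `cow_demo` is a hypothetical name.

```rust
use std::rc::Rc;

/// Returns (original contents, modified copy, whether the data was
/// shared before the write and separate after it).
fn cow_demo() -> (Vec<i32>, Vec<i32>, bool) {
    let original = Rc::new(vec![1, 2, 3]);
    let mut copy = Rc::clone(&original); // no data copied, same allocation

    let shared_before = Rc::ptr_eq(&original, &copy);

    // `make_mut` clones the vector here, because `original` still
    // points at the shared data. With a unique handle it would not copy.
    Rc::make_mut(&mut copy).push(4);

    let separate_after = !Rc::ptr_eq(&original, &copy);
    (
        (*original).clone(),
        (*copy).clone(),
        shared_before && separate_after,
    )
}
```

The copy only happens at the first write, which is the property the comment is relying on.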

My feeling is that these techniques run within a few percent of the speed of hand-optimization. But in the real world, I've seen very little human code remain optimized over the long term. Someone invariably comes along who doesn't understand the principles behind the code and inadvertently does a manual copy somewhere or breaks the O() speed of the algorithm by using the wrong abstractions. Meanwhile immutable code using mechanisms like COW avoids these pitfalls because the code is small and obvious.

I feel that the things that Rust is trying to do were solved long ago under FP, so I don't think it's the language for me. That's also why I moved away from C#, Java, C++, etc. Better languages might be Elixir or Clojure/ClojureScript, although they still have ugly syntax from a mainstream perspective compared to say Javascript or Python. I love that Rust exists as a formal spec of a mature imperative programming language. I think it's still useful in kernels and the embedded space. But I'm concerned that it's borrowing ideas like async/await that trade determinism for performance.

truncate 6y ago

>> dynamic stack allocation of coroutines, which can probably be optimized away anyway

This seems interesting. Do you have any pointers to places/papers I can look more into this? I'm also curious, since the stacks have to be rather small when you are running several thousands of coroutines (like Go), how often people get into issues of running out of stack because of some big stack allocation somewhere and stuff like that.

zackmorris 6y ago

I haven't studied it deeply, but a breadcrumb would be that cooperative threads (green threads) are equivalent to coroutines.

Ok it looks like current techniques are stackless runtimes and compiling coroutines to stackless continuations:

https://en.wikipedia.org/wiki/Stackless_Python

https://stackless.readthedocs.io/en/v3.6.4-slp/library/stack...

http://jessenoller.com/blog/2009/02/23/stackless-you-got-you...

https://engagedscholarship.csuohio.edu/cgi/viewcontent.cgi?r...

https://pdfs.semanticscholar.org/b9aa/49e4b7a00e6c9f0d8c18ba...

https://www.osnews.com/story/9822/protothreads-extremely-lig...

This looks like a rare gem, although I just started reading it:

https://cs.indiana.edu/~dfried/dfried/mex.pdf

I grew up with the cooperative multithreading of classic Mac OS and was really shocked when I first saw Javascript back in the 90s and it had no notion of it (because it didn't have generators). That sent us down the callback hell evolutionary dead end, through promises/futures and finally to async/await where we are now. That could have been largely avoided if we had listened to programming language experts!

pcwalton 6y ago

As a reminder, you don't need to use async/await to implement socket servers in Rust. You can use threads, and they scale quite well. M:N scheduling was found to be slower than 1:1 scheduling on Linux, which is why Rust doesn't use that solution.

Async/await is a great feature for those who require performance beyond what threads (backed by either 1:1 or M:N implementations) can provide. One of the major reasons behind the superior performance of async/await futures relative to threads/goroutines/etc. is that async/await compiles to a state machine in the manner described in this post, so a stack is not needed.
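As a rough illustration of "compiles to a state machine", this is what a hand-written equivalent of a tiny async function might look like. It is a sketch of the general shape only, not the compiler's actual output; `AddAfterYield`, `NoopWake` and `run` are hypothetical names, and the waker does nothing because the loop simply polls again.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written stand-in for something like:
///   async fn add_after_yield(a: u32, b: u32) -> u32 { yield_once().await; a + b }
/// Each variant holds exactly the locals that are live in that state.
enum AddAfterYield {
    Start { a: u32, b: u32 },
    Suspended { a: u32, b: u32 },
    Done,
}

impl Future for AddAfterYield {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // fine here: the enum is Unpin
        match *this {
            AddAfterYield::Start { a, b } => {
                *this = AddAfterYield::Suspended { a, b };
                cx.waker().wake_by_ref();
                Poll::Pending // the "suspension point"
            }
            AddAfterYield::Suspended { a, b } => {
                *this = AddAfterYield::Done;
                Poll::Ready(a + b)
            }
            AddAfterYield::Done => panic!("polled after completion"),
        }
    }
}

/// A waker that does nothing; enough for a busy-polling demo.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Drives the state machine and reports (result, number of polls).
fn run(a: u32, b: u32) -> (u32, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = AddAfterYield::Start { a, b };
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(sum) = Pin::new(&mut fut).poll(&mut cx) {
            return (sum, polls);
        }
    }
}
```

No stack is kept between the two polls: everything the function needs after resuming sits in the `Suspended` variant.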

woah 6y ago

I find async/await easier to reason about than threads for anything more involved than the 1 request per thread web server use case. This is because you avoid bringing in the abstraction of threads (or green threads) and their communication with one another. You trade syntactical complexity (what color is your function, etc), for semantic complexity (threads, channels, thread safety, lock races).

Matthias247 6y ago

I would agree to the statement for languages where it's an either/or. E.g. in Javascript there are only callbacks and async/await, so we can avoid all the complexity of threads - which is great!

However, in multithreaded languages it's always an AND. Once you add async/await, people need to know how traditional threading works as well as how async/await works. Rust will also make that very explicit. Even if you use async/await on a single thread, the compiler will prevent accessing data from more than one task, even if those are running on the same thread. So you need synchronization again, with maybe the upside that it can be non-thread-safe (or !Sync in Rust terms) when it only synchronizes between tasks that run on the same thread. The downside is that using the wrong synchronization primitives (e.g. thread-blocking ones) in async code can also stall other tasks.

Overall I think using async/await in Rust is strictly more complex. But it has its advantages in terms of performance, and being able to cancel arbitrary async operations.
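A small sketch of the single-threaded case described above: two tasks on one thread still cannot both hold a plain `&mut` to the same data, but a non-thread-safe `Rc<RefCell<..>>` is enough to share it. All names here (`YieldOnce`, `push_twice`, `interleave`, `NoopWake`) are hypothetical, and the hand-rolled polling loop stands in for a real single-threaded executor.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: suspends exactly once.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        Poll::Pending
    }
}

type Log = Rc<RefCell<Vec<&'static str>>>;

async fn push_twice(log: Log, name: &'static str) {
    log.borrow_mut().push(name); // borrow ends before the await
    YieldOnce(false).await;
    log.borrow_mut().push(name);
}

/// Polls two tasks alternately on the current thread.
fn interleave() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let mut a = Box::pin(push_twice(log.clone(), "a"));
    let mut b = Box::pin(push_twice(log.clone(), "b"));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let (mut a_done, mut b_done) = (false, false);
    while !(a_done && b_done) {
        if !a_done {
            a_done = a.as_mut().poll(&mut cx).is_ready();
        }
        if !b_done {
            b_done = b.as_mut().poll(&mut cx).is_ready();
        }
    }
    let entries = log.borrow().clone();
    entries
}
```

Holding a `borrow_mut()` guard across the `.await` would panic at runtime as soon as the other task touched the log, which is the single-threaded cousin of the "wrong primitive stalls other tasks" problem.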

kccqzy 6y ago

They have the same semantic complexity. You have tasks in async/await and you still need to deal with inter-task communication, locking, etc.

ori_b 6y ago

Async/await are effectively threads, the switches are just scheduled statically.

jeffdavis 6y ago

Async/await is also good for integration with other systems where starting new threads is not practical or you are calling non-threadsafe FFI functions. Tokio offers the CurrentThread runtime, which allows concurrency without creating any new threads.

dangxiaopin 6y ago

To implement, no. To make a performant server, yes. Most systems have a limited number of threads (a low limit constant compiled into the kernel), and each thread is triggered by the scheduler, not by network events, which is very uneconomical.

With application-level concurrency you get a 100x performance boost for network servers.

pcwalton 6y ago

On extreme workloads, perhaps. But we have people happily running Rust code with thousands of threads per second in production.

M:N was experimentally found to be slower than 1:1 in Rust.

jnordwick 6y ago

An epoll I/O loop performs better for most network I/O, though, and is simpler to manage when you have to start dealing with out-of-band issues (like efficient heartbeats - every time I've had a conversation about how to move some of the heartbeat code over to async, it comes down to just accepting that it isn't going to be as efficient as my C++ implementation, and either strain heavily or accept over publication).

Last time I looked, there were still a couple of extra allocations going on in the compiler too (I was told they were being worked on), and basically the default executor, tokio, wasn't very efficient at changing events in the queue (requiring an extra self signal to get the job done).

I'd be interested to see how little these cost, because there is definitely a cost to the generator implementation. Yes, if I wrote a generator to do this, I couldn't write it better, but I wouldn't write a generator (and that would be a very odd definition of zero-cost, where anything can be called zero-cost, even GC, as long as it is implemented well - well, that depends on whether rustc saves unnecessary state).

> Additionally, it should allow miri to validate the unsafe code inside our generators, checking for uses of unsafe that result in undefined behavior.

This is really useful, as a lot of the buffer handling code needs to use unsafe for efficiency reasons. And the enums sharing values is nice too - hopefully the extra pointer dereferences can be optimized out.

I do worry, though, about all this state sitting on the heap and ruining cache locality on function calls.

pcwalton 6y ago

If you want to use epoll directly in Rust, go right ahead. Nothing is stopping you. You don't have to switch to C++ to use epoll.

It's pretty clear that most people don't want to write and maintain that kind of code, though—that's what async/await is for. Personally, I hate writing state machines.

rapsey 6y ago

Honestly, I don’t get why simply using mio (an epoll, kqueue, IOCP wrapper) is so unpopular.

carllerche 6y ago

> tokio, wasn't very efficient at changing events in the queue (requiring an extra self signal to get the job done).

Could you expand on that?

omeid2 6y ago

Curious: what is the fundamental difference that lets Go do M:N threading efficiently, considering that the compiler has far less information than Rust about the program?

pcwalton 6y ago

Go is more efficient at M:N than Rust can be, mostly for two reasons:

1. Go can start stacks small and relocate them, because the runtime needed to implement garbage collection allows relocation of pointers into the stack by rewriting them. Rust has no such runtime infrastructure: it cannot tell what stack values correspond to pointers and which correspond to other values. Additionally, Rust allows for pointers from the heap into the stack in certain circumstances, which Go does not (I don't think, anyway). So what Rust must do is to reserve a relatively large amount of address space for each thread's stack, because those stacks cannot move in memory. (Note that in the case of Rust the kernel does not have to, and typically does not, actually allocate that much physical memory for each thread until actually needed; the reservation is virtual address space only.) In contrast, Go can start thread stacks small, which makes them faster to allocate, and copy them around as they grow bigger. Note that async/await in Rust has the potential to be more efficient than even Go's stack growth, as the runtime can allocate all the needed space up front in some cases and avoid the copies; this is the consequence of async/await compiling to a static state machine instead of the dynamic control flow that threads have.

2. Rust cares more about fast interoperability with C code that may not be compiled with knowledge of async I/O and stack growth. Go chooses fast M:N threading over this, sacrificing fast FFI in the process as every cgo call needs to switch to a big stack. This is just a tradeoff. Given that 1:1 threading is quite fast and scalable on Linux, it's the right tradeoff for Rust's domain, as losing M:N threading isn't losing that much anyway.
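On the "reserve a relatively large amount of address space for each thread's stack" point: with 1:1 threads in Rust the stack size is fixed when the thread is created, and `std::thread::Builder::stack_size` lets you choose it. A minimal sketch follows; the 64 KiB figure and the function name `sum_on_small_stack` are arbitrary choices for the example, and the platform may round the requested size up.

```rust
use std::thread;

/// Runs a small computation on a thread whose stack size is chosen
/// up front. Unlike a goroutine stack, it cannot grow or move later.
fn sum_on_small_stack(n: u64) -> u64 {
    thread::Builder::new()
        .stack_size(64 * 1024) // requested size in bytes
        .spawn(move || (1..=n).sum::<u64>())
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```

Picking a size that is too small for the deepest call chain overflows the stack, which is why the default reservation is generous.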

newacctjhro 6y ago

Go needs to allocate a growing stack on the heap, needs to move it around, etc. It's not as efficient as Rust's async.

staticassertion 6y ago

I don't think the implication was that Go does it efficiently.

adaszko 6y ago

> One of the major reasons behind the superior performance of async/await futures relative to threads/goroutines/etc. is that async/await compiles to a state machine in the manner described in this post, so a stack is not needed.

That's a great optimization but doesn't that mean it also breaks stack traces?

saurik 6y ago

Does anyone know how Rust's implementation compares to C++2a's? The C++ people seem to have spent a lot of time creating an extremely generic framework for async/await wherein it is easy to change out how the scheduler works (I currently have a trivial work stack, but am going to be moving to something more akin to a deadline scheduler in the near future for a large codebase I am working with, which needs to be able to associate extra prioritization data into the task object, something that is reasonably simple to do with await_transform). I am also under the impression that existing implementation in LLVM already does some of these optimizations that Rust says they will get around to doing (as the C++ people also care a lot about zero-cost).

tmandry (OP) 6y ago

Disclaimer: I'm not an expert on the proposal, but have looked at it some, and can offer my impressions here. (Sorry, this got a bit long!)

The C++ proposal definitely attacks the problem from a different angle than Rust. One somewhat surface-level difference is that it implements co_yield in terms of co_await, which is the opposite of Rust implementing await in terms of yield.

Another difference is that in Rust, all heap allocations of your generators/futures are explicit. In C++, technically every initialization of a sub-coroutine defaults to being a new heap allocation. I don't want to spread FUD: my understanding is that the vast majority of these are optimized out by the compiler. But one downside of this approach is that you could change your code and accidentally disable one of these optimizations.

In Rust, all the "state inlining" is explicitly done as part of the language. This means that in cases where you can't inline state, you must introduce an explicit indirection. (Imagine, say, a recursive generator - it's impossible to inline inside of itself! When you recurse, you must allocate the new generator on the heap, inside a Box.)
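The recursive case can be sketched with a future instead of a generator (the same constraint applies): the recursive call is put behind a `Box`, so each level is its own heap allocation instead of being inlined into its parent. `countdown_sum`, `NoopWake` and the tiny `block_on` are hypothetical names for this illustration only.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Recursion needs an explicit indirection: the returned future is
/// boxed, so its size does not depend on the recursion depth.
fn countdown_sum(n: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            n + countdown_sum(n - 1).await
        }
    })
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling driver, just enough to run the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

`block_on(countdown_sum(4))` evaluates 4 + 3 + 2 + 1 + 0, allocating one box per level of recursion.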

To be clear, the optimizations I'm talking about in the blog post are all implemented today. I'll be covering what they do and don't do, as well as future work needed, in future blog posts.

One benefit of C++ that you allude to is that there are a lot of extension points. I admit to not fully understanding what each one of them is for, but my feeling is that some of it comes from approaching the problem differently. Some of it absolutely represents missing features in Rust's initial implementation. But as I say in the post, we can and will add more features on a rolling basis.

The way I would approach the specific problem you mention is with a custom executor. When you write the executor, you control how new tasks are scheduled, and can add an API that allows specifying a task priority. You can also allow modifying this priority within the task: when you poll a task, set a thread-local variable to point to that task. Then inside the task, you can gain a reference to yourself and modify your priority.
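A toy version of the custom-executor idea, to show where the priority would live. This is a sketch under heavy assumptions: it is single-threaded, uses a do-nothing waker and simply re-queues pending tasks (so a pending high-priority task would starve the others), and it leaves out the thread-local trick for changing priority from inside a task. `PriorityExecutor`, `Task`, `NoopWake` and `run_order` are hypothetical names.

```rust
use std::cell::RefCell;
use std::cmp::Ordering;
use std::collections::BinaryHeap;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// The executor owns the scheduling metadata, not the future.
struct Task {
    priority: u8,
    future: Pin<Box<dyn Future<Output = ()>>>,
}

// Order tasks by priority only, so BinaryHeap pops the highest first.
impl PartialEq for Task {
    fn eq(&self, other: &Self) -> bool {
        self.priority == other.priority
    }
}
impl Eq for Task {}
impl PartialOrd for Task {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Task {
    fn cmp(&self, other: &Self) -> Ordering {
        self.priority.cmp(&other.priority)
    }
}

struct PriorityExecutor {
    queue: BinaryHeap<Task>,
}

impl PriorityExecutor {
    fn new() -> Self {
        Self { queue: BinaryHeap::new() }
    }

    fn spawn(&mut self, priority: u8, future: impl Future<Output = ()> + 'static) {
        self.queue.push(Task { priority, future: Box::pin(future) });
    }

    fn run(&mut self) {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        while let Some(mut task) = self.queue.pop() {
            if task.future.as_mut().poll(&mut cx).is_pending() {
                self.queue.push(task); // simplification: just try again
            }
        }
    }
}

/// Spawns three tasks out of order and records the order they ran in.
fn run_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut exec = PriorityExecutor::new();
    for (priority, name) in [(1, "low"), (9, "high"), (5, "mid")] {
        let log = log.clone();
        exec.spawn(priority, async move {
            log.borrow_mut().push(name);
        });
    }
    exec.run();
    let order = log.borrow().clone();
    order
}
```

The futures themselves know nothing about priorities; that information is attached at spawn time, which is the point being made about executors.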

je42 6y ago

> In C++, technically every initialization of a sub-coroutine defaults to being a new heap allocation. I don't want to spread FUD: my understanding is that the vast majority of these are optimized out by the compiler. But one downside of this approach is that you could change your code and accidentally disable one of these optimizations.

I don't think this is correct. C++ 20 allows a lot of choices to implement it without forcing a heap allocation.

see https://lewissbaker.github.io/2017/11/17/understanding-opera... also see this video that goes in depth how to have billions of coroutines with C++: https://www.youtube.com/watch?v=j9tlJAqMV7U

saurik 6y ago

Thanks for the information!!

On your last paragraph, the thing I'm concerned by is where this extra priority information is stored and propagated, as the term "task" is interesting: isn't every single separate thing being awaited its own task? There isn't (in my mental model) a concept that maps into something like a "pseudo-thread" (but maybe Rust does something like this, requiring a very structured form of concurrency?), which would let me set a "pseudo-thread" property, right?

As an example: if I am already in an asynchronous coroutine and I spawn off two asynchronous web requests as sub-tasks, the results of which will be processed potentially in parallel on various work queues, and then join those two tasks into a high-level join task that I wait on (so I want both of these things to be done before I continue), I'd want the background processing done on the results to be handled at the priority of this parent spawning task; do I have to manually propagate this information?

In C++2a, I would model this by having a promise type that is used for my prioritize-able tasks and, to interface with existing APIs (such as the web request API) that are doing I/O scheduling; I'd use await_transform to adapt their promise type into one of mine that lets me maintain my deadline across the I/O operation and then recover it in both of the subtasks that come back into my multi-threaded work queue. Everything I've seen about Rust seems to assume that there is a single task/promise type that comes from the standard library, meaning that it isn't clear to me how I could possibly do this kind of advanced scheduling work.

(Essentially, whether or not it was named for this reason--and I'm kind of assuming it wasn't, which is sad, because not enough people understand monads and I feel like it hurts a lot of mainstream programming languages... I might even say particularly Rust, which could use more monadic concepts in its error handling--await_transform is acting as a limited form of monad transformer, allowing me to take different concepts of scheduled execution and merge them together in a way that is almost entirely transparent to the code spawning sub-tasks. The co_await syntax is then acting as a somewhat-lame-but-workable-I-guess substitute for do notation from Haskell. In a perfect world, of course, this would be almost as transparent as exceptions are, which are themselves another interesting form of monad.)

skybrian 6y ago

It's interesting that support for recursion is no longer the default here. A partial reversal of what happened going from Fortran to Algol?

Rusky 6y ago

Aside from the high-level similarity of the "function -> state machine" transformation, Rust's is quite a bit different (and IMO both simpler and more flexible).

A C++ coroutine chooses a particular promise type as part of its definition. Its frame defaults to a separate heap-allocation per coroutine, with some allowance for elision. At a suspension point, it passes a type-erased handle to the current coroutine to an `await_suspend` method, which can either `return false` or call `handle.resume()` to resume the coroutine. A stack of `co_await`ing coroutines (or "pseudo-thread" as you call it) is thus a linked list of `coroutine_handle`s stored in the coroutine frames of their await-ees, rooted with whoever is responsible for next resuming the coroutine.

A Rust async function does things inside out, in a sense. It has no promise type; calling one directly returns its frame into the caller's frame, as a value with an anonymous type that implements the `Future` trait. This trait has a single method called `poll`, which resumes the function and runs it until its next suspension point. `poll` takes a single argument, a handle which is used to signal when it is ready to continue. This handle is threaded down through a stack of `poll`s (a "task" or pseudo-thread), and stored with whoever is responsible for notifying the task it should continue.

One implication of the Rust approach is that the "executor" and the "reactor" are decoupled. An executor maintains a collection of running tasks and schedules them. A reactor holds onto those handles and notifies executors of relevant events. This lets you control scheduling without language hooks like await_transform- you can associate your prioritization data with a task when you spawn it on a particular executor, and it can await any reactor without losing that information.
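A minimal sketch of that split, using only the standard library: `block_on` plays the executor (it polls and then sleeps until woken), a background thread plays the reactor (it keeps the handle and fires it when the "event" happens), and `Signal` is the leaf future in between. All of these names are hypothetical, and the 10 ms sleep is just a stand-in for waiting on real I/O.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// The handle passed to `poll`: waking it unparks the executor thread.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// "Executor": polls the task, then parks until the handle is used.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

struct Shared {
    done: bool,
    waker: Option<Waker>,
}

/// Leaf future: stores the handle where the reactor can find it.
struct Signal(Arc<Mutex<Shared>>);
impl Future for Signal {
    type Output = &'static str;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut shared = self.0.lock().unwrap();
        if shared.done {
            Poll::Ready("signalled")
        } else {
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

fn wait_for_signal() -> &'static str {
    let shared = Arc::new(Mutex::new(Shared { done: false, waker: None }));
    let remote = shared.clone();
    // "Reactor": notices the event and notifies whoever is waiting.
    let reactor = thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        let mut shared = remote.lock().unwrap();
        shared.done = true;
        if let Some(waker) = shared.waker.take() {
            waker.wake();
        }
    });
    let out = block_on(Signal(shared));
    reactor.join().unwrap();
    out
}
```

The reactor never learns which executor it is talking to; it only holds a `Waker`, which is why scheduling data can stay on the executor side.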

Another implication is that you have a choice of whether to a) `await` a future, making it part of the current task, or b) spawn it as its own task, to be scheduled on its own, much like OS thread APIs. Option (a) can get really interesting with multiple concurrent sub-futures (with things like Promise.all or select); it can be as simple as having the caller poll all its children every time it wakes up, or as complex as wrapping `poll`'s handle argument and implementing your own scheduling within a task.
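The "simple" end of option (a) can be sketched as a join that polls every unfinished child each time it is woken. This is a toy under stated assumptions, not how any particular library implements its join combinator; `JoinBoth`, `YieldOnce`, `delayed`, `NoopWake` and `join_demo` are hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = u32>>>;

/// Both children are part of the same task; no spawning involved.
struct JoinBoth {
    children: [BoxFuture; 2],
    results: [Option<u32>; 2],
}

impl Future for JoinBoth {
    type Output = (u32, u32);
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut(); // fine: every field is Unpin
        for i in 0..2 {
            if this.results[i].is_none() {
                // Pass the same handle down to each child.
                if let Poll::Ready(value) = this.children[i].as_mut().poll(cx) {
                    this.results[i] = Some(value);
                }
            }
        }
        match this.results {
            [Some(a), Some(b)] => Poll::Ready((a, b)),
            _ => Poll::Pending,
        }
    }
}

/// Hypothetical helper: suspends exactly once.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

async fn delayed(value: u32, yields: u32) -> u32 {
    for _ in 0..yields {
        YieldOnce(false).await;
    }
    value
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn join_demo() -> (u32, u32) {
    let mut join = JoinBoth {
        children: [Box::pin(delayed(1, 3)), Box::pin(delayed(2, 1))],
        results: [None, None],
    };
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(pair) = Pin::new(&mut join).poll(&mut cx) {
            return pair;
        }
    }
}
```

The more elaborate variant mentioned above would wrap `cx`'s waker per child so that only the child that was actually woken gets polled again.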

Matthias247 6y ago

My understanding is that C++ opted more for a coroutine-first design, where a very generic coroutine abstraction is in the center of the design, and other things (like generators and async functions) are built around it. That makes it very universal - but probably also harder to understand if one only has a specific use-case.

Rust's design, by comparison, focused not only on async functions as the main design goal, but also on maintaining compatibility with a "struct Future" type, which can also be implemented by hand and represents a state machine.

The latter will allow Rust to reuse lots of async infrastructure that has been built in the (combinators & state-machine) Futures world over the last few years (e.g. the tokio library and everything on top of it).

One downside of Rust's approach might be that some parts feel a bit awkward and hard, e.g. the two different states of Futures (one where it hasn't been executed and can be moved around, and one where it has started executing and can't be moved anymore) and the pinning system. As far as I understand, C++ exposes fewer of those details to end users - this might be something where the implicit allocations have helped it.

As far as I understand, the async function flavor of C++ coroutines also has run-to-completion semantics and can't be cancelled at any yield point like Rust's Futures can be. This has the advantage of being able to wrap IO completion based operations in a more natural fashion than Rust. But it then has the downside that users need to pass CancellationTokens around for cancellation, and that some code might not be cancellable.
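The "cancel at any yield point" behaviour on the Rust side can be sketched like this: a future is polled once, left suspended, and then simply dropped. Locals that were alive at the suspension point get their destructors run, and the code after the `.await` never executes. `Guard`, `YieldOnce`, `work`, `NoopWake` and `cancel_midway` are hypothetical names for this illustration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: suspends exactly once.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        Poll::Pending
    }
}

/// Records that it was dropped, standing in for real cleanup.
struct Guard(Rc<Cell<bool>>);
impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

async fn work(cleaned_up: Rc<Cell<bool>>, finished: Rc<Cell<bool>>) {
    let _guard = Guard(cleaned_up);
    YieldOnce(false).await; // cancellation happens while suspended here
    finished.set(true);
}

/// Returns (did cleanup run, did the body finish).
fn cancel_midway() -> (bool, bool) {
    let cleaned_up = Rc::new(Cell::new(false));
    let finished = Rc::new(Cell::new(false));
    let mut fut = Box::pin(work(cleaned_up.clone(), finished.clone()));

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let first_poll = fut.as_mut().poll(&mut cx);
    assert!(first_poll.is_pending());

    drop(fut); // dropping the future is the cancellation
    (cleaned_up.get(), finished.get())
}
```

No cancellation token is passed anywhere; the flip side, as noted above, is that an operation whose completion is owned by the OS cannot be abandoned this casually.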

continuational 6y ago

I don't quite follow. What exactly is the overhead that other languages have for futures that is eliminated here?

weiming 6y ago

Going down the same rabbit hole earlier this week, found this to be a good explanation:

> All of the data needed by a task is contained within its future. That means we can neatly sidestep problems of dynamic stack growth and stack swapping, giving us truly lightweight tasks without any runtime system implications. ... Perhaps surprisingly, the future within a task compiles down to a state machine, so that every time the task wakes up to continue polling, it continues execution from the current state—working just like hand-rolled code. [1]

[1] https://aturon.github.io/blog/2016/09/07/futures-design/

davidw 6y ago

> Perhaps surprisingly, the future within a task compiles down to a state machine, so that every time the task wakes up to continue polling, it continues execution from the current state

How are those tasks implemented, and what's scheduling them?

tmandry (OP) 6y ago

This is a great quote, and one that I missed while first writing the post! I've added it now.

tmandry (OP) 6y ago

Most languages allocate every future (and sub-future, and sub-sub-future) separately on the heap. This leads to some overhead, allocating and deallocating space to store our task state.

In Rust, you can "inline" an entire chain of futures into a single heap allocation.
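A sketch of what that inlining looks like from the outside: each async function's future contains the future it awaits, so a three-level chain is one value, and boxing the outermost future is the only allocation. The names (`leaf`, `middle`, `root`, `YieldOnce`, `sizes`, `NoopWake`, `block_on`) are hypothetical, and the exact sizes are compiler-dependent, so only their ordering is relied on.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical helper: suspends exactly once.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

async fn leaf(seed: u8) -> u32 {
    let buf = [seed; 256]; // live across the await, so stored in the future
    YieldOnce(false).await;
    buf.iter().map(|&b| u32::from(b)).sum()
}

async fn middle(seed: u8) -> u32 {
    leaf(seed).await + 1 // the leaf future lives inside this one
}

async fn root(seed: u8) -> u32 {
    middle(seed).await + 1
}

/// Sizes of the (not yet started) leaf, middle and root futures.
fn sizes() -> (usize, usize, usize) {
    (
        size_of_val(&leaf(0)),
        size_of_val(&middle(0)),
        size_of_val(&root(0)),
    )
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal driver: `Box::pin` below is the single heap allocation
/// for the whole root -> middle -> leaf chain.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

`sizes()` should show each outer future being at least as large as the one it wraps, and `block_on(root(1))` runs the whole chain out of that one box.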

pjmlp 6y ago

In .NET something similar is possible via ValueTask.

dom96 6y ago

Has anyone ever done a comparison to see how much overhead this actually adds? I'd be really curious to see this represented in concrete terms.

zwkrt 6y ago

So for example, in Kotlin each piece of synchronous code within an async function is compiled into what is essentially a Java Runnable object, which must be allocated on the heap.

Matthias247 6y ago

As far as I understand in Kotlin continuations are only allocated on the heap when necessary (the suspending function can not be executed synchronously). Therefore most allocations should be avoided until blocking for IO.

continuational 6y ago

Ah. What kind of asynchronous task executes so fast that a heap allocation is measurable?

emmanueloga_ 6y ago
