[0] https://web.archive.org/web/20260105235513/https://www.chiar...
Why are people afraid of state machines? There's been sooo much effort spent on hiding them from the programmer...
For example, generators. Also known as semicoroutines.
https://langdev.stackexchange.com/a/834
This:
generator fib() {
    a, b = 1, 2
    while (a<100) {
        b, a = a, a+b
        yield a
    }
    yield a-1
}
Becomes this:
struct fibState {
    a,
    b,
    position
}
int fib(fibState state) {
    switch (state.position) {
    case 0:
        state.a, state.b = 1, 2
        while (state.a < 100) {
            state.b, state.a = state.a, state.a + state.b
            // switching the context
            state.position = 1;
            return state.a;
    case 1:
        }
        state.position = 2;
        return state.a - 1
    case 2:
        state.position = -1;
    }
}
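For anyone who wants to run it, here is a rough C++ sketch of the same hand-written state machine. The names FibState, fib_next and collect_fib are made up for illustration, and returning std::nullopt once the generator is exhausted is an assumption, since the pseudocode above does not say what happens after the last yield.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Explicit generator state: the locals of fib() plus a resume position.
struct FibState {
    int a = 0;
    int b = 0;
    int position = 0;  // 0 = not started, 1 = inside the loop, 2 = after the loop, -1 = done
};

// Returns the next yielded value, or std::nullopt when nothing is left (assumed convention).
std::optional<int> fib_next(FibState& s) {
    assert(s.position >= -1 && s.position <= 2);
    switch (s.position) {
    case 0:
        s.a = 1;
        s.b = 2;
        while (s.a < 100) {
            // b, a = a, a+b: a becomes a+b, b becomes the old a
            s.b = std::exchange(s.a, s.a + s.b);
            s.position = 1;  // switching the context
            return s.a;
    case 1:;  // resume point in the middle of the loop body
        }
        s.position = 2;
        return s.a - 1;
    case 2:
        s.position = -1;
        [[fallthrough]];
    default:
        return std::nullopt;
    }
}

// Drains the generator into a vector.
std::vector<int> collect_fib() {
    FibState s;
    std::vector<int> out;
    while (auto v = fib_next(s)) out.push_back(*v);
    return out;
}
```

The `case 1:` label sits inside the `while` body, which is legal C++ (the same trick as Duff's device) and is what lets a resumed call land back in the middle of the loop.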
The ugly state machine example presented in the article is also a manual implementation of a generator. It's as palatable to the normal programmer as raw compiler output. Being written in C++ makes it even uglier and more complicated.
The programming language I made is a concrete example of what programming these things manually is like. I had to write every primitive as a state machine just like the one above.
https://www.matheusmoreira.com/articles/delimited-continuati...
1. C++20 coros are stackless, in the general case every async "function call" heap allocates.
2. If you do your own stackful coroutines, every function can suspend/resume, you don't have to deal with colored functions.
3. (opinion) C++20 coros are very tasteless and "C++-design-committee pilled". They're very hard to understand and implement, they require the STL, they're very heavy in debug builds, and you'll end up in template hell to do something as simple as Promise.all
I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.
(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)
That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.
> require the STL
That it has to heap-allocate if non-inlined is a misconception. This is only the default behavior.
One can define:
void *operator new(size_t sz, Foo &foo)
in the coro's promise type, and this:
- removes the implicitly-defined operator new
- forces the coro's signature to be CoroType f(Foo &foo), and forwards arguments to the "operator new" one defined
Therefore, it's pretty trivial to support coroutines even when heap cannot be used, especially in the non-recursive case.
Yes, green threads ("stackful coroutines") are more straightforward to use, however:
- they can't be arbitrarily destroyed when suspended (this would require stack unwinding support and/or active support from the green thread runtime)
- they are very ABI dependent. Among the "few registers" one has to save are the FPU registers, which is a problem on older Arm architectures and with codegen options similar to -mgeneral-regs-only (for code that runs "below" userspace). Said FPU registers also take a lot of space in the stack frame
Really, stackless coros are just FSM generators (which is obvious if one looks at disasm)
That was over 20 years ago. No idea what the current hotness is.
The stack save/restore happens in: https://swtch.com/libtask/asm.S
I recall working on a few VR projects - where it's imperative that you keep that framerate solid or risk making the user physically sick - this is where I really began using coroutines for instantiating large volumes of objects and so on (and avoiding framerate stutter).
ECS/DOTS and the Burst compiler make all of this unnecessary, and the performance is nothing short of incredible.
Why? You can just as well execute all your coroutines on a single thread. Many networking applications are doing fine with just a single ASIO thread.
Another example: you could write game behavior in C++ coroutines and schedule them on the thread that handles the game logic. If you want to wait for N seconds inside the coroutine, just yield it as a number. When the scheduler resumes a coroutine, it receives the delta time and then reschedules the coroutine accordingly. This is also a common technique in music programming languages to implement musical sequencing (e.g. SuperCollider)
In a Unity context, the engine provides the main loop and the developer is writing behaviors for game entities.
You can call a function that makes use of coroutines without worrying about it. That's the core intent of the design.
That is, if you currently use some blocking socket library, we could replace the implementation of that with coroutine based sockets, and everything should still work without other code changes.
Multithreaded? Nope. You can do C++ coroutines just fine in a single-threaded context.
Event loop? Only if you're wanting to do IO in your coroutines and not block other coroutines while waiting for that IO to finish.
> most people end up using coroutines with something like boost::asio
Sure. But you don't have to. Asio is available without the kitchen sink: https://think-async.com/Asio/
Coroutines are actually really approachable. You don't need boost::asio, but it certainly makes it a lot easier.
I recommend watching Daniela Engert's 2022 presentation, Contemporary C++ in Action: https://www.youtube.com/watch?v=yUIFdL3D0Vk
That’s similar to most of what makes C++ tick: There’s no deep magic, it’s “just” type-checked syntactic sugar for code patterns you could already implement in C.
(Occurs to me that the exceptions to this … like exceptions, overloads, and context-dependent lookup … are where C++ has struggled to manage its own complexity.)
It can, and often does, lead to messy Rube Goldberg machines.
There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.
This is what Rich Hickey (Clojure author) has termed “place oriented programming”, when the focus is mutating memory addresses and having to synchronize everything, but failing to model time as a first class concept.
I’m not aware of any general purpose programming language that successfully models time explicitly; Verilog might be the closest to that.
Sounds interesting. If it's not too much of an effort, could you dig up a reference?
https://discussions.unity.com/t/coreclr-scripting-and-ecs-st...
Is that a hack? Is that not just exactly what IEnumerable and IEnumerator were built to do?
Edit: Nevermind, they eventually bothered.
I would just go straight to tbb and concurrent_unordered_map!
The challenge of parallelism does not come from how to make things parallel, but how you share memory:
How you avoid cache misses, make sure threads don't trample each other and design the higher level abstraction so that all layers can benefit from the performance without suffering turnaround problems.
My challenge right now is how do I make the JVM fast on native memory:
1) Rewrite my own JVM. 2) Use the buffer and offset structure Oracle still has but has deprecated and is encouraging people to not use.
We need Java/C# (already has it but is terrible to write native/VM code for?) with bottlenecks at native performance and one way or the other somebody is going to have to write it?
What do you mean here? Do you mean hand-writing MSIL or native interop (pinvoke) or something else?
Your stack is on the heap and it contains an instruction pointer to jump to for resume.
This is quite understandable when you know the history behind how C++ coroutines came to be.
They were initially proposed by Microsoft, based on a C++/CX extension, that was inspired by .NET async/await implementation, as the WinRT runtime was designed to only support asynchronous code.
Thus if one knows how the .NET compiler and runtime magic works, including custom awaitable types, one will see common threads in how C++ coroutines ended up looking.
I never understood the value. Just use lambdas/callbacks.
"Just" is doing a lot of work there. I've used callback-based async frameworks in C++ in the past, and it turns into pure hell very fast. Async programming is, basically, state machines all the way down, and doing it explicitly is not nice. And trying to debug the damn thing is a miserable experience.
Lol, no thanks. People are using coroutines exactly to avoid callback hell. I have rewritten my own C++ ASIO networking code from callback to coroutines (asio::awaitable) and the difference is night and day!
waitFrames(5); // wait 5 frames
fireProjectile();
waitFrames(15);
turnLeft(-30/*deg*/, 120); // turn left over 120 frames
waitFrames(10);
fireProjectile();
// spin and shoot
for (i of range(0, 360, 60)) {
    turnRight(60, 90); // turn 60 degrees over 90 frames
    fireProjectile();
}
10 lines and I get behavior over time. What would your non-coroutine solution look like?
For simple callback hell, not so much.
Just put your state in visible instance variables of your objects, and then you will actually be able to see and even edit what state your program is in. Stop doing things that make debugging difficult and frustratingly opaque.
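As a rough idea of what that looks like, here is a sketch covering only the first two waits and shots of the script above (the turns and the loop are left out). Enemy, Phase and the member names are hypothetical. Everything the script "remembers" is an ordinary field you can watch in a debugger.

```cpp
#include <cassert>

// Hypothetical enemy whose whole script state lives in plain members.
struct Enemy {
    enum class Phase { InitialWait, SecondWait, Done };

    Phase phase = Phase::InitialWait;
    int frames_left = 5;  // frames remaining in the current wait
    int shots_fired = 0;

    void fire_projectile() { ++shots_fired; }

    // Called once per frame by the game loop.
    void update() {
        if (phase == Phase::Done) return;
        assert(frames_left > 0);
        if (--frames_left > 0) return;  // still waiting
        switch (phase) {
        case Phase::InitialWait:  // waitFrames(5) finished
            fire_projectile();
            phase = Phase::SecondWait;
            frames_left = 15;
            break;
        case Phase::SecondWait:   // waitFrames(15) finished
            fire_projectile();
            phase = Phase::Done;
            break;
        case Phase::Done:
            break;
        }
    }
};
```

The trade-off is visible even at this size: every extra step in the script means another enum value and another case, where the coroutine version just adds a line.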
Appreciate this humor -- absurd, tasteful.
The "ugly" version with the switch seems much preferable to me. It's simple, works, has way fewer moving parts and does not require complex machinery to be built into the language. I'm open to being convinced otherwise but as it stands I'm not seeing any horrible problems with it.
Unity's own documentation for changing scenes uses coroutines