I spend most of my time writing services, and when looking at a language like Go, there's a reason the default is pass-by-value. Passing pointers around between multiple goroutines is asking for a race condition, and in non-GC languages, for dangling pointers as well. Services are rarely written with the precision of high-performance servers.
I don't envy the server writers (although I can see how it would be fun!). Giving up a few milliseconds per request to make sure my goroutines aren't sharing pointers is worthwhile, and I appreciate the safety that gives me. I'm sure someone will mention that Rust could give me the safety I was looking for in a non-GC language, but that's the point, isn't it - that by being able to game the system, you can gain a few precious microseconds here and there that enforced safety might cost you.
> I'm sure someone will mention that Rust could give me
> the safety I was looking for in a non-GC language, but
> that's the point, isn't it
I'm not sure what you're trying to say here. There is no need to game the system, as Rust lets you pass pointers between threads while enforcing that they are used safely. Rust lets you be as fast as an unsafe language while as safe as a managed language; the tradeoff is that you have to give more thought to your design up-front (which, given the pain of trying to reproduce and fix race conditions, is more than worth the effort IMO).
> Alternatively, we could just yield the coro, waiting for a
> new special coroutine that would run in a background thread
> just executing readahead() and then re-scheduling (making
> runnable) the original coro, waiting for readahead.
Seems to me this scheme will ultimately be limited by the slowest requests. Mixing fast and slow operations is essentially Little's Law[1] territory, where the average time dominates. However, if the slow/blocking reads were also async, I think you'd eventually be limited by IO speed?
The main goal of this system is to minimize the potential for blocking or otherwise slow processing steps that can stall the processing of other requests, especially when this happens for many requests per second.
Specifically, the 'optimistic sequential execution' idea is that you run the request on the thread it was accepted in, use mincore() to determine whether it will _likely_ stall if you read from it, and if so, either migrate the coro to a 'slow' coros thread list, or just keep going (which should be the fast path here).
This is a solution to a specific class of problems. The main similarity with Erlang's process scheduling is that a process is a lot like a coroutine: it encapsulates logic and data, it's scheduled by a user-space scheduler, there can be thousands of them in a runnable state, and each has its own stack. But that's really the definition of stackful coroutines.
Even if a request is completely CPU-bound, it’s a good idea to be fair to other requests accepted in the thread; that is, if a request should take 2 seconds of processing time, while others would take a few microseconds, they shouldn’t all wait until that one request is processed.
Instead, yielding back to the scheduler where appropriate will give them a chance to complete in time proportional to the effort required to process them, not to the time required to handle long-running requests that stall the processing of other coros.