In my experience it was always C++, for some flavor of high-performance data processing engine. Around half the stackful coroutine implementations were off-the-shelf libraries (e.g. Boost.Context) and the other half were purpose-built from scratch, depending on the feature requirements. The typical model is stackful coroutines at a coarse granularity, e.g. one per database query, each of which may dispatch hundreds of concurrent state machines. All execution and I/O scheduling is done explicitly by the software, which enables some significant runtime optimizations.
If coroutines can be preempted, that introduces a requirement for concurrency control that otherwise wouldn't need to exist, and it interferes with dynamic cache locality optimizations. Avoiding both of those costs is one of the primary benefits of using stackful coroutines in this context.
Being able to interrupt a stackful coroutine is useful for dealing with an extremely slow or stuck thread, but you want that to be zero-overhead unless the thread actually is stuck. In most system designs, the time required to traverse between any two consecutive yield points is well-bounded, so things getting "stuck" is usually a bug.
Letting end users inject arbitrary code into these paths at runtime does require the ability to interrupt the thread, but even that is often handled explicitly, by more nuanced means than random preemption. Sometimes "extremely slow" is correct and expected behavior, so you have to schedule around it.