Think of a data-shuffling pipeline that also updates statistics in the background: running the next stage of the pipeline is more likely to free a lot of memory chunks than running the statistics code is. It also improves pipeline latency, of course. So scheduling definitely matters, even for cooperative threading.
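To make the point concrete, here is a minimal sketch of a toy cooperative scheduler built on Python generators. Everything in it is an assumption for illustration: the names `Task`, `run`, and `peak_buffered_chunks`, the burst sizes, and the fact that the "statistics" task does no real work. It compares plain round-robin against a policy that prefers the consuming pipeline stage, and reports the peak number of chunks held in the buffer.

```python
from collections import deque


class Task:
    """A cooperative task (hypothetical helper): a generator plus scheduling info."""

    def __init__(self, gen, priority, ready=lambda: True):
        self.gen = gen
        self.priority = priority  # lower number runs first
        self.ready = ready        # callable: can the task make progress now?
        self.last_run = -1        # tick at which the task last ran


def run(tasks, observe):
    """Run tasks until all finish. Ties in priority go to the least recently run task."""
    live = list(tasks)
    tick = 0
    while live:
        runnable = [t for t in live if t.ready()]
        if not runnable:
            raise RuntimeError("no runnable task")
        task = min(runnable, key=lambda t: (t.priority, t.last_run))
        try:
            next(task.gen)        # run until the task yields control
        except StopIteration:
            live.remove(task)
        task.last_run = tick
        tick += 1
        observe()


def peak_buffered_chunks(prefer_pipeline, bursts=4, burst_size=4):
    """Return the peak number of chunks waiting in the buffer under a given policy."""
    buffer = deque()
    total = bursts * burst_size
    peak = 0

    def producer():
        # Upstream stage: emits chunks in bursts, then yields.
        for _ in range(bursts):
            buffer.extend(bytearray(1024) for _ in range(burst_size))
            yield

    def consumer():
        # Next pipeline stage: each step frees exactly one chunk.
        for done in range(1, total + 1):
            buffer.popleft()
            if done < total:
                yield

    def statistics():
        # Background work: stand-in for updating counters; frees nothing.
        for _ in range(total):
            yield

    def observe():
        nonlocal peak
        peak = max(peak, len(buffer))

    # Round-robin: all equal. Otherwise: consumer first, statistics last.
    stage_prio, stats_prio = (0, 2) if prefer_pipeline else (1, 1)
    run(
        [
            Task(producer(), 1),
            Task(statistics(), stats_prio),
            Task(consumer(), stage_prio, ready=lambda: bool(buffer)),
        ],
        observe,
    )
    return peak


if __name__ == "__main__":
    print("round-robin peak:", peak_buffered_chunks(prefer_pipeline=False))
    print("pipeline-first peak:", peak_buffered_chunks(prefer_pipeline=True))
```

With the default parameters, the round-robin run peaks at 13 buffered chunks and the pipeline-first run at 4, even though both do the same total work and no task is ever preempted. The exact numbers depend entirely on the made-up burst sizes; the gap between the two policies is the point.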