It would be interesting if current computers could take advantage of all expressions running in parallel. Back in the last century, that exposed more parallelism (and incurred more coordination costs) than a better, coarser-grained approach.
I guess that, since this is a declarative language, it is the compiler's job to determine the cutoff between just reordering tasks and spawning a thread, right?
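
To make the cutoff idea concrete, here is a rough sketch in Python of what I mean, written by hand rather than decided by a compiler. The names `total` and `CUTOFF` and the threshold value are made up for illustration. Also, CPython threads won't actually speed up CPU-bound work like this, so it only shows the shape of the decision, not a real performance win.

```python
from concurrent.futures import ThreadPoolExecutor

CUTOFF = 10_000  # hypothetical threshold; a real compiler or runtime would have to tune this


def total(values, cutoff=CUTOFF, workers=4):
    """Sum values, going parallel only when the input is big enough to pay for it."""
    if len(values) <= cutoff:
        # Small input: coordinating threads would cost more than the work itself.
        return sum(values)
    # Large input: split into cutoff-sized chunks and sum each chunk on a thread pool.
    chunks = [values[i:i + cutoff] for i in range(0, len(values), cutoff)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))
```

Below the cutoff everything runs in the calling thread; above it the work is coarsened into chunks, so the number of tasks stays far smaller than the number of individual additions.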