To work steal, you must have work to steal. If you always have work to steal, you have a CPU problem, not a CPU fabric problem. CPU fabrics are good when you have a task that is sort of parallel, but also requires a lot of cross-talk between the pieces, preferably of a very regular and predictable nature. That means not randomly blasting messages of very irregular sizes like one might see in a web-based system, but a very regular "I'm going to need exactly 16KB per frame from each of my surrounding 4 CPUs every 25ms". You might think of using a GPU on a modern computer, since you can use all the little cores in a GPU, but the GPU won't do well because those cores can't communicate like that. GPUs obtain their power by forbidding communication between those little cores except through very stereotyped patterns.
If you have all that, and you have it all the time, you can win on these fabrics.
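To put a number on that "16KB from each of 4 neighbours every 25ms" example, here is a back-of-the-envelope sketch. The function name is made up for illustration, and it assumes "KB" means 1024 bytes. The point is only that with a pattern this regular, the inbound load per CPU is a constant you can work out ahead of time.

```python
def inbound_bytes_per_second(neighbours: int, bytes_per_neighbour: int, period_ms: int) -> int:
    """Steady-state inbound traffic for one CPU in a fixed neighbour exchange.

    Hypothetical helper: assumes every neighbour sends exactly
    `bytes_per_neighbour` once per period, with no jitter or retries.
    """
    frames_per_second = 1000 // period_ms  # 25 ms period -> 40 frames/s
    return neighbours * bytes_per_neighbour * frames_per_second


# The example from the text: 4 neighbours, 16KB each, every 25ms.
# Assumption: KB is taken as 1024 bytes.
load = inbound_bytes_per_second(neighbours=4, bytes_per_neighbour=16 * 1024, period_ms=25)
print(f"{load} bytes/s = {load / 2**20:.1f} MiB/s per CPU")
# prints "2621440 bytes/s = 2.5 MiB/s per CPU"
```

Contrast that with web-style traffic, where neither the message sizes nor the arrival times are known ahead of time and no such constant exists.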
The problem is, this doesn't describe very many problems. There are a lot of problems that sort of look like this, but have steps where the work has to be unpacked and dispatched, or the results have to be rejoined, or where some other part of the process is limited to a single CPU, and then Amdahl's Law murders your performance advantage over conventional CPUs. If you can't keep these things firing on all cylinders basically all the time, you very quickly end up back in a regime where conventional CPUs are more appropriate. It's really hard to feed a hundred threads of anything in a rigidly consistent way, whereas "tasks more or less randomly pile up and we dispatch our CPUs to those tasks with a scheduler" is fairly easy, and very useful.
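For a rough sense of how fast Amdahl's Law bites, here is a small sketch using the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that parallelizes and n is the number of cores. The 100-core figure and the sample fractions are illustrative picks, not measurements of any real fabric.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative values only: 100 cores, with 1%, 5% and 10% of the work stuck
# on a single CPU (the unpack/dispatch/rejoin steps).
for p in (0.99, 0.95, 0.90):
    print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 100):.1f}x on 100 cores")
# parallel fraction 0.99: 50.3x on 100 cores
# parallel fraction 0.95: 16.8x on 100 cores
# parallel fraction 0.90: 9.2x on 100 cores
```

So even a 5% serial portion caps a hundred cores at under 17x, and that is before counting any time the cores spend idle waiting to be fed.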