Actually the overhead crushes you when you employ 10000 cores. If the overhead of a process is 10% and the parallel part is 90%, then 2 cores will result in a run time of 55% = 10% + 90%/2 of the original time. And 10 cores will get you to 19%. And 100 cores to 10.9%. If you then buy 9900 more cores to bring it to a total of 10000, you just reduced the runtime from 10.9% to 10.009%. In other words, you increased your bill by a factor of 100 to reduce your run time by almost nothing.