seq 1000 | xargs echo | wc -l
1
seq 1000 | parallel echo | wc -l
1000
So xargs has only run echo once, whereas parallel has run it 1000 times.
If we force xargs to pass only one argument per invocation, the comparison is fairer:
time ( seq 1000 | xargs -n 1 echo | wc -l )
1000
real 0m1.478s
time ( seq 1000 | parallel echo | wc -l )
1000
real 0m5.536s
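To see the batching for yourself, here is a small sketch on a shorter input (the 10 items and the -n 3 are just picked for illustration, they are not from the question). By default xargs packs as many arguments as fit onto one command line, and -n caps the number per invocation:

```shell
# Default: all ten arguments go to a single echo, so one line comes out.
all_at_once=$(seq 10 | xargs echo | wc -l)

# -n 3: at most three arguments per echo, so four invocations (3+3+3+1).
seq 10 | xargs -n 3 echo
# 1 2 3
# 4 5 6
# 7 8 9
# 10
in_threes=$(seq 10 | xargs -n 3 echo | wc -l)

# -n 1: one echo per argument, the same granularity parallel uses by default.
one_each=$(seq 10 | xargs -n 1 echo | wc -l)

# $(( )) strips the padding some wc implementations add.
echo "$((all_at_once)) $((in_threes)) $((one_each))"
# 1 4 10
```

The fewer processes that have to be started, the less time is spent on startup, which is why the default xargs run looks so cheap.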
Those timings are still not great for parallel. However, we still haven't taken advantage of parallelisation. This is where the real strength of parallel comes in, and where (I suspect) a good part of the slowdown comes from.
If we run xargs in parallel (with --max-procs=4), then we get a much faster real time, but the output comes out in an unpredictable order (as xargs just lets each process write its output whenever it wants). If we had programs with multiline output, the lines from different processes could end up interleaved.
On the other hand, when we parallelise with parallel, the output stays tidy, as parallel stores up the output of each program and prints it in one piece once that program has finished, so output from different jobs is never mixed together. (By default the jobs are printed as they finish; with --keep-order they are printed in input order.) This does create some overhead, but means the output is much more readable. In your example, if you parallelised your xargs with --max-procs, you could find the greps of different files mixed together, but not with parallel.
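A rough way to try this out, as a sketch rather than a benchmark: feed both tools a list of sleep times chosen (by me, purely for illustration) so that the last input finishes first. The xargs part uses -P, the short form of --max-procs. The parallel part only runs if GNU parallel is installed, and passes --keep-order so that the input ordering is requested explicitly:

```shell
# Sleep times in decreasing order: the last job finishes first.
delays='0.4
0.3
0.2
0.1'

# xargs: four processes at once, each writes as soon as it is done,
# so lines typically arrive in completion order, not input order.
xargs_out=$(printf '%s\n' "$delays" | xargs -n 1 -P 4 sh -c 'sleep "$1"; echo "slept $1"' sh)
echo "$xargs_out"

# GNU parallel: each job's output is buffered until the job ends, and
# --keep-order prints the jobs in the order the inputs were given.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    printf '%s\n' "$delays" | parallel -j 4 --keep-order 'sleep {}; echo "slept {}"'
fi
```

With one-line jobs like these the xargs output is merely out of order; with multiline jobs the lines themselves can get mixed, which is the case where the buffering in parallel pays for its overhead.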
Wow, wrote more than I intended. Basically there is a difference, but it isn't quite as much as you think due to differences in default behaviour. However, parallel also does more stuff once you start parallelising!