Most of the benchmarks I have seen where Clojure was legitimately 10x slower than Java were using unadorned Clojure (no type hints or other performance tweaks) on Clojure data structures... Of course it is slower; the machine is doing a lot more for you.
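To make "doing a lot more for you" a bit more concrete, here is a rough Java sketch. Java is used because it is the baseline in these benchmarks. `PList` is a hypothetical class: a simplified immutable cons list loosely resembling a Clojure list, not Clojure's actual implementation. Every element is a boxed `Long` in its own heap node, and adding an element allocates a new node while leaving the previous version intact. The `long[]` loop stands in for the kind of tight, primitive code a tuned Java benchmark would use.

```java
public class BoxedVsPrimitive {
    // Hypothetical minimal persistent list (not Clojure's real class):
    // nodes are immutable, so "adding" allocates a new node sharing the old tail.
    static final class PList {
        final Long head;   // boxed value, as in a collection of Objects
        final PList tail;
        final int size;

        PList(Long head, PList tail) {
            this.head = head;
            this.tail = tail;
            this.size = (tail == null ? 0 : tail.size) + 1;
        }

        static PList cons(long x, PList tail) {
            return new PList(x, tail); // autoboxes long -> Long
        }
    }

    // Sum over the persistent list: one pointer dereference and one unboxing per element.
    static long sumPersistent(PList xs) {
        long total = 0;
        for (PList p = xs; p != null; p = p.tail) {
            total += p.head; // unboxing
        }
        return total;
    }

    // Sum over a primitive array: contiguous memory, no allocation, no boxing.
    static long sumPrimitive(long[] xs) {
        long total = 0;
        for (long x : xs) {
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000;
        PList xs = null;
        long[] arr = new long[n];
        for (int i = 0; i < n; i++) {
            xs = PList.cons(i, xs);
            arr[i] = i;
        }
        PList older = xs;
        PList newer = PList.cons(42L, older); // older is left unchanged
        System.out.println(sumPersistent(older) + " " + sumPrimitive(arr)); // 499500 499500
        System.out.println(older.size + " " + newer.size);                 // 1000 1001
    }
}
```

Both sums agree, but the persistent version pays for an allocation, a boxed value, and a pointer chase per element. In exchange, `older` remains a valid, unchanged value after `newer` is built, which is exactly the kind of guarantee the tight primitive loop does not give you.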
And that is well and good, because the purpose of Clojure is not to make that tight loop really fast; it is to help you get the logic of your program correct and avoid many of the subtle bugs that can otherwise creep in. Concurrency has many advantages above and beyond parallelism.
As far as getting along without macros, lazy evaluation, and heterogeneous immutable data structures goes, I will say that for a long time we got along 'just fine' without garbage collection, but now it is a feature of a large subset of languages (to the extent that Google has now added it to C).