Before this, the idea was that RISC was simpler to implement and could be optimized more easily, ultimately making it more cost-effective. What wasn't factored in was how good Intel is at optimizing and how hard they'd push their process technology, which let them beat the RISC side despite all of CISC's disadvantages.
Now it's the GPU that's eating Intel's lunch. High-performance floating-point code runs several orders of magnitude slower on the CPU than on a high-end GPU, so Intel is trying to fight back with its "pile of CPUs" strategy (http://en.wikipedia.org/wiki/Larrabee_(microarchitecture)). So far it's not working out very well.
In my experience, on pretty much every non-x86 platform, compiler and hardware-specific optimizations have historically had a dramatic impact on performance. Intel, by contrast, has so much existing code and so many existing code streams to factor into its designs for new hardware. Maybe this has changed. It's a hard road when binaries that are mismatched or not optimized for the specific hardware are slow and pokey, and only hardware-specific optimized binaries are competitive. If someone comes out with a great 64-bit ARM core that can run nearly all ARM binaries with decent performance (excluding, of course, software that needs custom hardware), ARM could be pretty disruptive.