The reverse is true.
SIMD is harder because you have to have a uniform operation across a set of data.
Imagine a for loop that looks like this:
int[] x, y, z;
int[] p, d, q;
for (int i = 0; i < size; ++i) {
    p[i] = x[i] / z[i];
    d[i] = z[i] * x[i];
    q[i] = y[i] + z[i];
}
For SIMD, this is a complicated mess for the compiler to unravel. What the compiler would LIKE to do is turn this into 3 for loops and use the SIMD instructions to perform those operations in parallel.
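A rough sketch of that split, in Java since the loop above looks Java-like. The class and method names are made up for illustration, and nothing here guarantees that a particular compiler or JIT will actually emit SIMD instructions for each loop; it only shows the shape of the transformation, with each loop applying one uniform operation across the arrays.

```java
import java.util.Arrays;

// Hypothetical illustration: one fused loop vs. the same work split into three loops.
final class LoopFission {

    // The original shape: three different operations per iteration.
    static int[][] fused(int[] x, int[] y, int[] z) {
        int size = x.length;
        int[] p = new int[size], d = new int[size], q = new int[size];
        for (int i = 0; i < size; ++i) {
            p[i] = x[i] / z[i];
            d[i] = z[i] * x[i];
            q[i] = y[i] + z[i];
        }
        return new int[][] { p, d, q };
    }

    // The split shape: each loop does a single kind of operation,
    // which is the uniform pattern SIMD wants.
    static int[][] split(int[] x, int[] y, int[] z) {
        int size = x.length;
        int[] p = new int[size], d = new int[size], q = new int[size];
        for (int i = 0; i < size; ++i) {
            p[i] = x[i] / z[i];   // all divides
        }
        for (int i = 0; i < size; ++i) {
            d[i] = z[i] * x[i];   // all multiplies
        }
        for (int i = 0; i < size; ++i) {
            q[i] = y[i] + z[i];   // all adds
        }
        return new int[][] { p, d, q };
    }

    public static void main(String[] args) {
        int[] x = { 8, 9, 10 };
        int[] y = { 1, 2, 3 };
        int[] z = { 2, 3, 5 };
        // Both versions compute the same p, d, and q.
        System.out.println(Arrays.deepEquals(fused(x, y, z), split(x, y, z)));
    }
}
```

The split is only legal because p, d, and q don't feed into each other, so reordering the work across loops doesn't change any result.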
The Itanium optimization, however, is a lot easier. The compiler can see that none of p, d, or q depend on the results of the previous stage (that is, q[i] doesn't depend on p[i]). As a result, the entire thing can be packed into a single operation.
Now, of course, modern OoO (out-of-order) processors can do the same optimization, so maybe it's not a huge win? Still, it would have been something worth exploring more (IMO), but market forces killed it. Moving that sort of optimization out of the processor hardware and into the compiler software seems like it could lead to some nice power/performance benefits.