Your point seems invalid, given that a large chunk of HPC (neural nets, matrix multiplication, etc.) has been rewritten to support CUDA, which didn't even exist when Itanium was announced.
VLIW is a compromise design: it's more parallel than a traditional CPU, but less parallel than SIMD units or GPUs.
And modern CPUs have incredibly powerful SIMD engines: AVX2 and AVX-512 are extremely fast and highly parallel. There are compilers that auto-vectorize code, as well as dedicated languages (such as ispc) that target SIMD.
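For illustration only (this is not from the original comment), here is the kind of loop that GCC and Clang can typically auto-vectorize when targeting AVX2. Whether a given compiler version actually emits SIMD code depends on the optimization level and target flags, and the function name `saxpy` is just the conventional BLAS-style name, used here as a hypothetical example.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]  (SAXPY-style update).
 * A branch-free loop over contiguous arrays like this is a classic
 * auto-vectorization candidate. The `restrict` qualifiers promise that
 * x and y do not overlap, so the compiler can use wide SIMD loads and
 * stores without emitting runtime aliasing checks. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC, compiling with something like `gcc -O3 -march=haswell -fopt-info-vec` should report whether the loop was vectorized; Clang offers a similar report via `-Rpass=loop-vectorize`.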
Encoders, decoders, raytracers, and more have been rewritten for Intel's AVX2 SIMD instructions, and then rewritten again for GPUs. The drive for faster execution has always existed; unfortunately, Itanium failed to perform as well as its competition.