When ordinary machine code is written in a "dependency cutting" way, it scales across many different reorder-buffer sizes. A system from 10+ years ago with only a 100-entry reorder buffer will execute the code with as much parallelism as it can find... while a system today with a 200- to 300-entry reorder buffer will execute the SAME code, also with as much parallelism as it can find (and reach higher instructions-per-clock).
That's why today's CPUs can have 4-way decoders and 6-way dispatch (AMD Zen and Skylake): they can "pick up more latent parallelism" that compilers gave them many years ago.
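To make "dependency cutting" a bit more concrete, here's a rough sketch in C (my own illustration, not taken from any particular compiler or CPU manual). The function names `sum_chained` and `sum_cut` are made up for this example, and the choice of four accumulators is arbitrary. The first loop is one long dependency chain; the second splits the same work into four independent chains that an out-of-order core is free to overlap. Note that an optimizing compiler may well do this transformation (or vectorize) on its own for integer sums, so treat it as a picture of the idea rather than a tuning recipe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add needs the result of the previous add,
   so the whole loop is a single serial dependency chain. */
uint64_t sum_chained(const uint32_t *data, size_t n) {
    assert(data != NULL || n == 0);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Four accumulators: the chain is "cut" into four independent chains.
   The adds within one iteration do not depend on each other, so the
   hardware can issue them in parallel if it has the resources. */
uint64_t sum_cut(const uint32_t *data, size_t n) {
    assert(data != NULL || n == 0);
    uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        a0 += data[i];
    /* Combine the partial sums once, at the end. */
    return (a0 + a1) + (a2 + a3);
}
```

Both functions return the same value for the same input. The point is that nothing in the second version says "run this 4-wide": a narrow core just overlaps less of it and a wider core overlaps more, without recompiling.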
"Classic" VLIW limits your potential parallelism to the ~3-wide bundles (in Itanium's case). Whoever makes the "next" VLIW CPU should allow a similar scaling over the years.
-----------
It was accidental: I doubt that anyone actually planned the x86 instruction set to be so effectively instruction-level parallel. It's something that was discovered over the years, and proven to be effective.
Yes: somehow more parallel than the explicitly parallel VLIW architecture. It's a bit of a hack, but if it works, why change things?