> All the experts I listened or read to, they told that instruction set doesn't matter and it is the insignificant thing. The part that matters is branch and data prediction, and caching. Also, even intel transforms an instruction into RISC like microinstructions internally.
That's commonly repeated, but it's a misunderstanding. Up until this point the difference was mostly that an x86 decoder took up more chip area, which, given Intel's historical lead in process tech, was no big deal to them.
However, now we're pushing chips to go wider than ever. Intel and AMD haven't been able to push past a 4-wide decoder. x86 instructions are variable length, so you can't know where one instruction starts until you've worked out the length of the one before it, and that chain of dependencies is just too long to make a wider decoder work. You'd have to slow the cycle time or add pipeline stages, to the point that net performance is worse. Meanwhile M1 decodes 8 instructions per cycle.
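To make the chained-dependency point concrete, here's a rough toy sketch of finding instruction boundaries. It's only an illustration: `toy_length` is a made-up encoding (low 2 bits of the first byte give the length), not real x86 or ARM64, and the function names are mine. Real decoders use tricks like predecode bits and speculative length guessing, so take this as the shape of the problem rather than how any actual chip does it.

```python
# Toy model only: the encodings here are hypothetical, not real x86 or ARM64.

def fixed_width_starts(num_bytes, width=4):
    """Fixed-width ISA: each start offset depends only on its slot index,
    so every slot could be handed to its own decoder at the same time."""
    return [i * width for i in range(num_bytes // width)]


def toy_length(first_byte):
    """Hypothetical encoding: low 2 bits of the first byte give a length of 1..4."""
    return (first_byte & 0b11) + 1


def variable_length_starts(stream, length_of):
    """Variable-length ISA: the start of instruction N+1 is unknown until the
    length of instruction N has been worked out, so this loop is inherently serial."""
    starts = []
    offset = 0
    while offset < len(stream):
        starts.append(offset)
        offset += length_of(stream[offset])
    return starts


fixed = fixed_width_starts(16)
variable = variable_length_starts(bytes([0, 1, 0xFF, 3, 0, 0, 0, 2, 0, 0]), toy_length)
print(fixed)
print(variable)
```

The list comprehension in the fixed-width case has no dependency between iterations, while the `while` loop carries `offset` from one iteration to the next. That loop-carried dependency is what gets more expensive to break the wider you try to go.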
This dovetails into what you're saying about stalls caused by prediction and caching. Once the stall is resolved, M1 can race ahead, assigning work into the shadow registers at potentially twice the peak rate.
You're being a bit hyperbolic about Windows backwards compatibility. Much of the enterprise software world is still running programs that were written against Windows XP just fine, and MS is not going to rock that boat any time soon.
The big difference with Apple's transitions is precisely the translation (note: not emulation). I've lived through 3 of their ISA changes now and they've all been nearly seamless. The other big difference is that Mac users have been OK with sunsetting the old apps ~5 years after the transition, something that's a total nonstarter in Windows land.
Rosetta 2 is so stinking fast I have not even had to think one whit about what's native vs translated.