1) Apple Silicon outperforms all laptop CPUs in the same power envelope on 1T on industry-standard tests: it's not predominantly due to "optimizing their software stack". SPECint, SPECfp, Geekbench, Cinebench, etc. all show major improvements.
2) x86 also heavily relies on micro-ops to greatly improve performance. This is not a "penalty" in any sense.
3) x86 is now six-wide, eight-wide, or nine-wide (with asterisks) for decode width on all major Intel & AMD cores. The myth of x86 being stuck on four-wide has been long disproven.
4) Large buffers, L1, L2, L3, caches, etc. are not exclusive to any CPU microarchitecture. Anyone can increase them—the question is, how much does your core benefit from larger cache features?
5) Ryzen AI Max 300 (Strix Halo) gets nowhere near Apple on 1T perf / W and still loses on 1T perf. Strix Halo uses slower CPUs versus the beastly 9950X below:
Fanless iPad M4 P-core SPEC2017 int, fp, geomean: 10.61, 15.58, 12.85 AMD 9950X (Zen5) SPEC2017 int, fp, geomean: 10.14, 15.18, 12.41 Intel 285K (Lion Cove) SPEC2017 int, fp, geomean: 9.81, 12.44, 11.05
Source: https://youtu.be/2jEdpCMD5E8?t=185, https://youtu.be/ymoiWv9BF7Q?t=670
The 9950X & 285K eat 20W+ per core for that 1T perf; the M4 uses ~7W. Apple has a node advantage, but no node on Earth gives you 50% less power.
There is no contest.