> It isn't just because it is RISC, it's Apple magic?
It’s both. We’ve known for decades that RISC was the “right” design, but x86 was so far ahead of everyone else that switching architectures was completely infeasible (even Intel themselves tried and failed with Itanium). It would have taken years to design a new CPU core that could match existing x86 designs, and breaking backwards compatibility is a non-starter in the Windows world. So we ended up with a 20-year-long status quo where ARM dominated the embedded world (due to its simplicity and efficiency) and x86 dominated the desktop world due to its market position.
However, with Apple, all the stars lined up perfectly for them to be able to pull off this transition in a way that no other company was able to accomplish.
- Apple sells both PCs and smartphones, and the smartphone market gave them a reason to justify spending 10 years and billions of dollars on a high-performance ARM core. The A series slowly evolved from a regular smartphone processor, into a high-end smartphone processor, and then into a desktop-class processor in a smartphone.
- Apple (co-)founded ARM, giving them a huge amount of control over the architecture. IIRC they had a ton of influence on the design of AArch64 and beat ARM’s own chips to market by a year.
- Intel’s troubles lately have given Apple a reason to look for an alternative source of processors.
- Apple’s vertical integration of hardware and software means they can transition the entire stack at once, and they don’t have to coordinate with OEMs.
- Apple does not have to worry about backwards compatibility very much compared to a Windows-based manufacturer. Apple has a history of successfully pulling off several architecture transitions, and all the software infrastructure was still in place to support another one. Mac users also tend to be less reliant on legacy or enterprise software.
> It seems weird that you can emulate other instruction sets with RISC underneath and get the performance they do.
As far as I understand it, the only major distinction between RISC and CISC is in the instruction decoder. CISC processors do not typically have any more advanced “hardware acceleration” or special-purpose instructions; the distinction between CISC and RISC is whether you support advanced addressing modes and prefix bytes that let you cram multiple hardware operations into a single software instruction.
For instance, on x86 you can write an instruction like `ADD [rax + 0x1234 + 8*rbx], rcx`. In one instruction you’ve performed a multi-step address calculation with two registers, read from memory, added a third register, and written the result back to memory. On a RISC, by contrast, you would have to express the individual steps as 4 or 5 separate instructions.
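To make that concrete, here is a rough Python sketch of the same work done both ways. It is a toy model, not real x86 or ARM semantics: the function names, the scratch registers `t0` and `t1`, and the dict-based memory are all made up for illustration, and 64-bit wraparound is ignored.

```python
def cisc_add(regs, mem):
    """One x86-style instruction: ADD [rax + 0x1234 + 8*rbx], rcx."""
    addr = regs["rax"] + 0x1234 + 8 * regs["rbx"]
    mem[addr] = mem[addr] + regs["rcx"]


def risc_sequence(regs, mem):
    """The same work as five load-store style steps (hypothetical scratch regs)."""
    regs["t0"] = regs["rax"] + (regs["rbx"] << 3)  # 1. base + scaled index
    regs["t0"] = regs["t0"] + 0x1234               # 2. add the displacement
    regs["t1"] = mem[regs["t0"]]                   # 3. load from memory
    regs["t1"] = regs["t1"] + regs["rcx"]          # 4. add the third register
    mem[regs["t0"]] = regs["t1"]                   # 5. store the result back
```

Both functions leave memory in the same state; the only difference is how many instructions the program had to spell out to get there.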
Crucially, you don’t have to do any more actual hardware operations to execute the 4 or 5 RISC as compared to the one CISC instruction. Modern high-performance processors convert the incoming instruction stream into RISC-like micro-ops anyway, so the only performance difference between the two is how much work the processor has to spend decoding instructions. x86 requires a very complex decoder that is difficult to parallelize, whereas ARM uses a much more modern instruction set (AArch64 was announced in 2011) that is designed to maximize decoder throughput.
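A quick way to see why decoding is harder to parallelize with variable-length instructions: with fixed 4-byte instructions (as in AArch64) every instruction start is known up front, while with variable-length ones you only learn where instruction N+1 starts after working out the length of instruction N. The sketch below uses a made-up encoding where the first byte of each instruction is its total length; real x86 length decoding (prefixes, ModRM and so on) is far messier, so treat this purely as an illustration of the dependency.

```python
def fixed_boundaries(code, width=4):
    """Fixed-width ISA: every start offset is known without looking at the bytes."""
    return list(range(0, len(code), width))


def variable_boundaries(code):
    """Toy variable-length ISA: byte 0 of each instruction holds its length.

    Each start depends on the previous instruction's length, so the scan
    is inherently sequential.
    """
    starts = []
    pc = 0
    while pc < len(code):
        length = code[pc]
        if length == 0:
            raise ValueError(f"invalid zero-length instruction at offset {pc}")
        starts.append(pc)
        pc += length
    return starts
```

In the fixed-width case a wide decoder can simply hand offsets 0, 4, 8, ... to separate decode units. In the variable-length case it has to guess or scan first.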
So this helps us understand why Apple can emulate x86 code so efficiently: the JIT/AOT translator is essentially just running the expensive x86 decode stage ahead of time and converting it to a RISC instruction stream that is easier for a processor to digest. You’re right, though, that native code can always be more tightly optimized since the compiler knows much more about the program than the JIT does and can produce code better tailored to the quirks and features of the target processor.
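The "decode ahead of time" idea can be sketched like this. It is not how any real binary translator is implemented; the two-instruction toy ISA, the `counter` dict, and all function names are assumptions for illustration. The point is only that an interpreter pays the decode cost on every execution, while a translator pays it once per instruction.

```python
def decode(text, counter):
    """Stand-in for the expensive decode step: parse 'OP dst, src' text."""
    counter["decodes"] += 1
    op, operands = text.split(maxsplit=1)
    dst, src = [part.strip() for part in operands.split(",")]
    return op, dst, src


def execute(decoded, regs):
    """Run one already-decoded toy instruction against the register dict."""
    op, dst, src = decoded
    if op == "ADD":
        regs[dst] += regs[src]
    elif op == "MOV":
        regs[dst] = regs[src]
    else:
        raise ValueError(f"unknown op {op}")


def interpret(program, regs, iterations, counter):
    """Decode every instruction again each time it runs."""
    for _ in range(iterations):
        for text in program:
            execute(decode(text, counter), regs)


def translate_then_run(program, regs, iterations, counter):
    """Decode the whole program once up front, then only execute."""
    translated = [decode(text, counter) for text in program]
    for _ in range(iterations):
        for decoded in translated:
            execute(decoded, regs)
```

Running a two-instruction loop body 100 times costs 200 decodes with `interpret` and 2 with `translate_then_run`, with identical register results.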
Every expert I've listened to or read says that the instruction set doesn't matter much. What matters is branch and data prediction, and caching. Also, even Intel translates instructions into RISC-like micro-instructions internally.
> Apple does not have to worry about backwards compatibility very much compared to a Windows-based manufacturer
Windows is literal shit at backwards compatibility too. Try running a program written for Windows 7 or earlier on Windows 10 and most of the time it won't work. Also, Windows can run on ARM too, and unlike the Mac, Windows on ARM didn't have emulation for years.
Neither ARM nor Itanium is RISC. RISC/CISC don't actually exist - CISC just means "x86" (variable-length instructions, memory operands, 2-operand instructions) and RISC means "MIPS or PowerPC" (load-store, fixed-length 3-operand instructions, weird hardware exposures like delay slots).
ARM is a load-store architecture and has a lot of registers so it's closer to MIPS but it has complex addressing modes and more instructions. Itanium is VLIW which is almost the opposite of how the M1 works.
Plus ARMv8 in the M1 is a total redesign so it's not exactly the same as older ARMs.
> Crucially, you don’t have to do any more actual hardware operations to execute the 4 or 5 RISC as compared to the one CISC instruction.
This isn't true because you can do a lot of that stuff in one step; just put an adder in the memory access unit. Some complex instructions really are worth putting in the ISA.
x86 uses this to its advantage; the µops can be very long and are not RISCy. RISC is actually harder to deal with here because it's easy to split up instructions into µops, but it's hard to fuse them together again. That's why ARM having condition codes and more complex memory operands is a win.
x86's variable length instructions also fit in memory better, which is good for performance, but they're worse on security because they're harder to parse.