> It isn't just because it is RISC, it's Apple magic?
It’s both. We’ve known for decades that RISC was the “right” design, but x86 was so far ahead of everyone else that switching architectures was completely infeasible (even Intel themselves tried and failed with Itanium). It would have taken years to design a new CPU core that could match existing x86 designs, and breaking backwards compatibility is a non-starter in the Windows world. So we ended up with a 20-year-long status quo where ARM dominated the embedded world (due to its simplicity and efficiency) and x86 dominated the desktop world due to its market position.
However, with Apple, all the stars lined up perfectly for them to be able to pull off this transition in a way that no other company was able to accomplish.
- Apple sells both PCs and smartphones, and the smartphone market gave them a reason to justify spending 10 years and billions of dollars on a high-performance ARM core. The A series slowly evolved from a regular smartphone processor, into a high-end smartphone processor, and then into a desktop-class processor in a smartphone.
- Apple co-founded ARM (together with Acorn and VLSI) back in 1990, giving them a huge amount of influence over the architecture. IIRC they had a ton of input into the design of AArch64 and beat ARM’s own 64-bit cores to market by about a year.
- Intel’s troubles in recent years have given Apple a reason to look for an alternative source of processors.
- Apple’s vertical integration of hardware and software means they can transition the entire stack at once, and they don’t have to coordinate with OEMs.
- Apple does not have to worry about backwards compatibility nearly as much as a Windows-based manufacturer. Apple has already pulled off several architecture transitions (68k to PowerPC, then PowerPC to x86), and all the software infrastructure was still in place to support another one. Mac users also tend to be less reliant on legacy or enterprise software.
> It seems weird that you can emulate other instruction sets with RISC underneath and get the performance they do.
As far as I understand it, the only major distinction between RISC and CISC is in the instruction decoder. CISC processors don’t typically have any more advanced “hardware acceleration” or special-purpose instructions; the real difference is whether the instruction set supports complex addressing modes and prefix bytes that let you cram multiple hardware operations into a single software instruction.
For instance, on x86 you can write an instruction like ‘ADD [rax + 0x1234 + 8*rbx], rcx’. In one instruction you’ve performed a multi-step address calculation with two registers, read from memory, added a third register, and written the result back to memory. On a RISC machine, by contrast, you would have to express those individual steps as 4 or 5 separate instructions.
Crucially, you don’t have to do any more actual hardware operations to execute the 4 or 5 RISC instructions as compared to the one CISC instruction. All modern processors convert the incoming instruction stream into RISC-like micro-ops internally anyway, so the only performance difference between the two is how much work the processor has to spend decoding instructions. x86 requires a very complex decoder that is difficult to parallelize (instructions are variable-length, so you can’t even find the boundaries without partially decoding them), whereas ARM uses a much more modern instruction set (AArch64 was designed in 2012) with fixed-width instructions designed to maximize decoder throughput.
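To make the decoder point concrete, here’s a toy Python sketch (invented for illustration; real hardware decoders are nothing like this) of why fixed-width instructions parallelize so well: with a fixed width, every instruction boundary is known up front, while variable-length encodings create a serial dependency between instructions.

```python
# Toy illustration: finding instruction boundaries in a byte stream.

def fixed_width_boundaries(code: bytes, width: int = 4) -> list[int]:
    # AArch64-style: every instruction is 4 bytes, so instruction N
    # starts at N * 4. Each decoder lane can jump straight to its own
    # instruction with no dependence on the others.
    return list(range(0, len(code), width))

def variable_length_boundaries(code: bytes, length_of) -> list[int]:
    # x86-style: an instruction's length depends on its prefix and
    # opcode bytes, so you can't know where instruction N starts until
    # you've (at least partially) decoded instructions 0..N-1.
    boundaries, pos = [], 0
    while pos < len(code):
        boundaries.append(pos)
        pos += length_of(code[pos:])
    return boundaries

# Hypothetical variable-length stream where the first byte of each
# instruction happens to encode its length (much simpler than x86).
stream = bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])
print(fixed_width_boundaries(bytes(12)))                    # [0, 4, 8]
print(variable_length_boundaries(stream, lambda b: b[0]))   # [0, 2, 5, 6]
```

Real x86 decoders use tricks (predecoded length marks in the cache, speculative decode at every byte offset) to break that serial chain, but it costs silicon and power that a fixed-width ISA simply doesn’t need.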
So this helps us understand why Apple can emulate x86 code so efficiently: the JIT/AOT translator is essentially just running the expensive x86 decode stage ahead of time and converting it into a RISC instruction stream that the processor can digest more easily. You’re right, though, that native code can always be more tightly optimized, since the compiler knows much more about the program than the JIT does and can produce code better tailored to the quirks and features of the target processor.
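Here’s a minimal sketch of that ahead-of-time idea in Python (all names invented; Rosetta 2 is vastly more sophisticated, and the output strings are illustrative pseudo-AArch64, not exact encodings): pay the expensive decode/translate cost once, cache the result, and every later execution only sees the simple RISC-like stream.

```python
# Toy ahead-of-time translation of one CISC-style instruction into
# RISC-style steps, mirroring the memory-operand ADD example above.

def translate(x86_insn: str) -> list[str]:
    if x86_insn == "add [rax + 0x1234 + 8*rbx], rcx":
        return [
            "add  x9, x0, x1, lsl #3",  # x9 = rax + rbx*8 (address calc)
            "add  x9, x9, #0x1234",     # add the displacement
            "ldr  x10, [x9]",           # read the current value from memory
            "add  x10, x10, x2",        # the one "real" addition
            "str  x10, [x9]",           # write the result back
        ]
    raise NotImplementedError(x86_insn)

# Translate once, up front (the "AOT" part)...
code_cache = {insn: translate(insn)
              for insn in ["add [rax + 0x1234 + 8*rbx], rcx"]}

# ...then every later "execution" just replays the cached stream,
# never touching the x86 bytes again.
for step in code_cache["add [rax + 0x1234 + 8*rbx], rcx"]:
    print(step)
```

Note that one x86 instruction became five RISC-style steps, but as discussed above, those five steps are roughly the same hardware operations an x86 core would have issued internally anyway.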