Say you have a simple function that is going to add 1 to a bunch of variables. In an ARM-like assembly code, this could be written as:
LDR r1, [r0, #0]
ADD r1, r1, #1
STR r1, [r0, #0]
LDR r1, [r0, #4]
ADD r1, r1, #1
STR r1, [r0, #4]
LDR r1, [r0, #8]
ADD r1, r1, #1
STR r1, [r0, #8]
Now, if your CPU can do out-of-order execution (OoOE), it can spot that register r1 is used for three independent load/add/store sequences, and can internally use three different physical registers for them, allowing the operations to proceed in parallel. But, equally, the compiler could have written the code as:
LDR r1, [r0, #0]
ADD r1, r1, #1
STR r1, [r0, #0]
LDR r2, [r0, #4]
ADD r2, r2, #1
STR r2, [r0, #4]
LDR r3, [r0, #8]
ADD r3, r3, #1
STR r3, [r0, #8]
Compilers and register renaming are fighting each other. In traditional compiler writing, you try to minimise register usage and would output the first listing. But if you have plenty of registers, you could output the second listing instead, and let the CPU execute in parallel without needing register renaming.

In other words, once you have enough 'real' (architectural) registers, does that remove the need for register renaming? Intel added it to their Pentiums to speed up existing x86 code, but does it still offer much benefit on newer ISAs that have 'enough' registers and properly tuned compilers?