undefined | Better HN

0 pointsadrian_b2mo ago0 comments

That feature does not exist in any AMD Zen, but only in certain Zen generations and randomly, i.e. not in successive generations. This optimization has been introduced then removed a couple of times. Therefore this is not an optimization on whose presence you can count in a processor.

I believe that it is not useful to group such an optimization with register renaming. The effect of register renaming is to replace a single register shared by multiple instructions with multiple registers, so that each instructions may use its own private register, without interfering with the other instructions.

On the other hand, the optimization mentioned by you is better viewed as an enhancement of the optimization mentioned by me, and which is implemented in all modern CPUs, i.e. that after a store instruction the stored value persists for some time in the store queue and the subsequent instructions can access it there instead of going to memory.

With this additional optimization, the stored values that are needed by subsequent instructions are retained in some temporary registers even after the store queue is drained to the memory as long as they are still needed.

Unlike with register renaming, here the purpose is not to multiply the memory locations that store a value so that they can be accessed independently. Here the purpose is to cache the value close to the execution units, to be available quickly, instead of taking it from the far away memory.

As mentioned at your link, the most frequent case when this optimization is efficient is when arguments are pushed in the stack before invoking a function and then the invoked function loads the arguments in registers. On the CPUs where this optimization is implemented the passing of arguments to the function bypasses the stack, becoming much faster.

However this calling convention is important mainly for legacy 32-bit applications, because the 64-bit programs pass most arguments inside registers, so they do not need this optimization. Therefore this optimization is more important for Windows, where it is more frequent to use ancient 32-bit executables, which have not been recompiled to 64-bit.

0 comments

gpderetta2mo ago

Yes, it is not in all Zen cpus.

I don't think it makes sense to distinguish it from renaming. It is effectively aliasing a memory location (or better, an offset off the stack pointer) with a physical register, effectively treating named stack offsets as additional architectural registers. AFAIK this is done on the renaming stage.

adrian_bOP2mo ago

The named stack offsets are treated as additional hidden registers, not as additional architectural registers.

You do not access them using architectural register numbers, as you would do with the renamed physical registers, but you access them with an indexed memory addressing mode.

The aliasing between a stack location and a hidden register is of the same nature as the aliasing between a stack location from its true address in the main memory and the location in the L1 cache memory where the the stack locations are normally cached in any other modern CPU.

This optimization present in some Zen CPUs just caches some locations from the stack even closer to the execution units of the CPU core than the L1 cache used for the same purpose in other CPUs, allowing those stack locations to be accessed as fast as the registers.

gpderetta2mo ago

The stack offset (or in general memory location address[1]) has a name (its unique address), exactly like an architectural register, how can it be an hidden register?

In any case, as far as I know the feature is known as Memory Renaming, and it was discussed in Accademia decades before it showed in actual consumer CPUs. It uses the renaming hardware and it behaves more like renaming (0 latency movs resolved at rename time, in the front end) than an actual cache (that involves an AGI unit and a load unit and it is resolved in the execution stages, in the OoO backend) .

[1] more precisely, the feature seems to use address expressions to name the stack slots, instead of actual addresses, although it can handle offset changes after push/pop/call/ret, probably thanks to the Stack Engine that canonicalizes the offsets at the decode stage.

j / k navigate · click thread line to collapse

0 pointsadrian_b2mo ago0 comments

0 comments

gpderetta2mo ago

Yes, it is not in all Zen cpus.

adrian_bOP2mo ago

The named stack offsets are treated as additional hidden registers, not as additional architectural registers.

You do not access them using architectural register numbers, as you would do with the renamed physical registers, but you access them with an indexed memory addressing mode.

gpderetta2mo ago

The stack offset (or in general memory location address[1]) has a name (its unique address), exactly like an architectural register, how can it be an hidden register?

j / k navigate · click thread line to collapse