undefined | Better HN

0 pointsyvdriess9y ago0 comments

> It's hard to tell from the slides, but it looks like Zen is a much wider architecture than Intel, with 10 execution ports

The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.

What is maybe more telling here is the 16-byte load/stores, Haswell is doing 32-byte at the same rate. It points to Zen abandoning FP bandwidth in both client and server. Perhaps they want to rely on GPGPU with the on-chip GPU to do compute workloads?

> The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.

Depends what they mean with Scheduler. If it means reservation stations for micro-ops, then that's already the case in other micro-architectures. If Scheduler means assigning micro-ops per port, than there can logically only be a single one.

0 comments

phire9y ago

> The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.

4+2+2, no need to combine all 4 ports, just the two multiplies or the two adds.

The text is speculation of the journalist. There It's possible that each port is actually 256 bits wide and fusing them is only needed for the 512bit AVX instructions that Intel don't even support yet.

Even if AMD are splitting the 256 bit fpus in half, that is still a huge win over average code, because 128bit SSE instructions are much more common than AVX instructions, and AMD can execute upto four of them per cycle.

Even Intel disable the upper half of their FPU most of the time to save power, AVX instructions get split into two 128bit micro-ops unless until a threshold is encountered and the upper half powers up.

> If Scheduler means assigning micro-ops per port, than there can logically only be a single one.

I assume that means one Re-order buffer per port. Bulldozer already had two Re-order buffer, one for float instructions and one for interger instructions, which proves multiple ROBs for different ports are possible. You just need to track dependencies across ROBs.

I'm guessing that tracking deprbdiencies across 7 schedulers is not much harder than tracking deprbdiencies across 2.

j / k navigate · click thread line to collapse