That's basically the point of Zen: Bulldozer was an architectural dead end.
Besides, it's not like Intel have massively innovated since Sandybridge. Ivy, Haswell, Broadwell and Skylake are little more than successive perfections of the Sandybridge architecture.
It's hard to tell from the slides, but it looks like Zen is a much wider architecture than Intel, with 10 execution ports (4 ALU, 2 AGU, 2 FP ADD, 2 FP MUL). Sandybridge had 6, Haswell and later have 8 execution ports. Bulldozer had 4 integer execution ports plus 2 float ports, which are shared between each pair of cores.
The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.
The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.
What is maybe more telling here is the 16-byte loads/stores; Haswell does 32-byte at the same rate. It points to Zen abandoning FP bandwidth in both client and server. Perhaps they want to rely on GPGPU with the on-chip GPU to do compute workloads?
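To put a rough number on the load/store point, here is a back-of-the-envelope sketch. The per-cycle op count is a made-up placeholder (the slides don't settle it); the only thing taken from the discussion is 16 bytes per op for Zen versus 32 for Haswell "at the same rate".

```python
def bytes_per_cycle(mem_ops_per_cycle: int, bytes_per_op: int) -> int:
    """Peak bytes moved per cycle, ignoring cache misses and bank conflicts."""
    return mem_ops_per_cycle * bytes_per_op

# ASSUMPTION: both designs issue the same number of memory ops per cycle.
# The value 2 is a placeholder, not a figure from the slides.
MEM_OPS_PER_CYCLE = 2

zen_bw = bytes_per_cycle(MEM_OPS_PER_CYCLE, 16)      # 16-byte loads/stores
haswell_bw = bytes_per_cycle(MEM_OPS_PER_CYCLE, 32)  # 32-byte loads/stores

# Whatever the real op count is, the ratio stays the same if it is equal on both sides.
ratio = haswell_bw / zen_bw
print(f"Zen: {zen_bw} B/cycle, Haswell: {haswell_bw} B/cycle, ratio: {ratio:.1f}x")
```

So under the "same rate" assumption, Haswell can feed its vector units twice as many bytes per cycle, which mostly matters for code that streams 256-bit operands from memory.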
> The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.
Depends what they mean by Scheduler. If it means reservation stations for micro-ops, then that's already the case in other micro-architectures. If Scheduler means assigning micro-ops per port, then there can logically only be a single one.
4+2+2, no need to combine all 4 ports, just the two multiplies or the two adds.
The text is speculation by the journalist. It's possible that each port is actually 256 bits wide and fusing them is only needed for the 512-bit AVX instructions that Intel don't even support yet.
Even if AMD are splitting the 256-bit FPUs in half, that is still a huge win on average code, because 128-bit SSE instructions are much more common than AVX instructions, and AMD can execute up to four of them per cycle.
Even Intel disable the upper half of their FPU most of the time to save power; AVX instructions get split into two 128-bit micro-ops until a threshold is reached and the upper half powers up.
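A quick sketch of the two readings of the slides, in case the arithmetic helps. This is a toy model, not anything from AMD: it assumes a wide op simply occupies as many ports as it needs to cover its width, and it ignores the add/multiply split between ports. The function name and both hypotheses are only for illustration.

```python
def vector_ops_per_cycle(num_ports: int, port_bits: int, op_bits: int) -> int:
    """Upper bound on vector ops per cycle when wide ops must fuse several ports."""
    # Ceiling division: how many ports one op of this width ties up.
    ports_per_op = max(1, -(-op_bits // port_bits))
    return num_ports // ports_per_op

# Hypothesis A (the journalist's reading): four FP ports, 128 bits each.
split_ports = {bits: vector_ops_per_cycle(4, 128, bits) for bits in (128, 256)}

# Hypothesis B (ports are really 256 bits wide; fusing only matters for 512-bit ops).
full_ports = {bits: vector_ops_per_cycle(4, 256, bits) for bits in (128, 256, 512)}

print("128-bit ports:", split_ports)
print("256-bit ports:", full_ports)
```

Either way the model gives four 128-bit SSE ops per cycle; the hypotheses only diverge on 256-bit AVX, where the split-port reading halves the rate.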
> If Scheduler means assigning micro-ops per port, then there can logically only be a single one.
I assume that means one reorder buffer per port. Bulldozer already had two reorder buffers, one for float instructions and one for integer instructions, which proves multiple ROBs for different ports are possible. You just need to track dependencies across ROBs.
I'm guessing that tracking dependencies across 7 schedulers is not much harder than tracking dependencies across 2.
The FP contention between the cores in a Bulldozer module makes all recent AMD chips perform objectively worse in most benchmarks than their peers from Intel.
Matching Intel's architecture isn't a goal in itself. Matching Intel's performance in real-world workloads is a good goal.
There are some heavily-threaded, integer-heavy workloads that Bulldozer and related parts are still incredibly competitive at, even compared to current-gen Intel parts. For the right workload, a Bulldozer-family processor can be a real screamer and they are priced incredibly aggressively. We should recognize, though, that the architecture is high performance only for these specific workloads.
Perhaps AMD should have pursued more innovative architectures. I am not saying that Intel's is perfect. But it is important to note that for current general purpose computing workloads, Intel's architecture is superior to Bulldozer.