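x86-32 doesn't have nearly enough architectural register names: only eight general-purpose ones, and ESP is tied up as the stack pointer. That causes totally unnecessary spills to memory, which the hardware was never good enough to hide, and it means the usual calling conventions have to pass arguments on the stack. That's a big part of why x86-64, with sixteen registers and arguments passed in registers, is usually faster than x86-32, even though 64-bit pointers take up twice as much cache space.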
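To put a rough number on the cache-space cost, here is a back-of-the-envelope sketch in Python. The node layout (one 32-bit integer plus two pointers), the 64-byte cache line, and the helper names are all assumptions for illustration; it models typical C struct padding rather than any specific compiler.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def struct_size(field_sizes):
    """Size of a C-style struct whose fields are each aligned to their own size."""
    offset = 0
    for size in field_sizes:
        offset = align_up(offset, size) + size
    # Pad the tail so arrays of the struct keep every field aligned.
    return align_up(offset, max(field_sizes))


CACHE_LINE = 64  # bytes; an assumed, typical x86 line size

# Hypothetical list node: int32 value, prev pointer, next pointer.
node32 = struct_size([4, 4, 4])  # 4-byte pointers
node64 = struct_size([4, 8, 8])  # 8-byte pointers

print(node32, node64)  # 12 24
print(CACHE_LINE / node32, CACHE_LINE / node64)  # nodes per cache line
```

Under these assumptions the same node doubles in size, so half as many fit in each cache line. Pointer-light data would see a much smaller effect.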
And some instructions are slow just because of the weird encoding. For example, in 32-bit or 64-bit code, 16-bit math needs an operand-size prefix (0x66), so it can end up slower than either 8-bit or 32-bit math.
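The encoding quirk behind the 16-bit case shows up in the bytes themselves. The sketch below hand-assembles an add of an immediate to the accumulator as it would appear in 32-bit code. The function name `encode_add_acc_imm` is made up for this example, and it covers only these three short forms.

```python
import struct

OPERAND_SIZE_PREFIX = 0x66  # switches a 32-bit-default operation to 16 bits


def encode_add_acc_imm(width, imm):
    """Encode ADD AL/AX/EAX, imm for 32-bit code (hypothetical helper)."""
    if width == 8:
        return bytes([0x04]) + struct.pack("<B", imm)  # ADD AL, imm8
    if width == 16:
        # Same opcode as the 32-bit form; only the prefix says "16-bit",
        # and the immediate shrinks from 4 bytes to 2.
        return bytes([OPERAND_SIZE_PREFIX, 0x05]) + struct.pack("<H", imm)
    if width == 32:
        return bytes([0x05]) + struct.pack("<I", imm)  # ADD EAX, imm32
    raise ValueError("width must be 8, 16, or 32")


for width in (8, 16, 32):
    print(width, encode_add_acc_imm(width, 0x12).hex(" "))
```

The 8-bit and 32-bit forms each get their own opcode, while the 16-bit form reuses the 32-bit opcode and relies on the prefix to change the immediate's length. Intel's optimization guidance describes a decoder penalty for exactly this pattern (a length-changing prefix); how much it costs depends on the core.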
Then there's eflags, but that's a minor complaint.