Yeah, I left everything else unchanged to avoid confusing the issue. But any reordering I could have done, the compiler could have done too. Your point about branches is fair though, as which renamed registers are 'active' after a branch can only be known at runtime.
Still, I wonder whether some of the features of modern CPUs could be dropped if it weren't for legacy code. On the other hand, Itanium tried to push the parallelism work onto the compiler, and look where that ended up!