It's going to be really hard to give up real world gains from branch prediction. Branch prediction can make a lot of real world (read "not the finest code in the world") run at reasonable speeds. Another common pattern to give up would be eliding (branch predicting away) nil reference checks.
> short of entirely preventing speculating code from being able to load things into the cache
Some new server processors allow us to partition cache (to prevent noisy neighbors) [1,2]. I don't have experience working with this technology but everything I read makes me believe this mechanism can works on a per process basis.
If that kind of complexity is already possible in CPU cache hierarchy I wonder if it's possible to implement per process cache encryption. New processors (EPYC) can already use different encryption keys for each VM, so it might be a matter of time till this is extended further.