Maybe one way would be to use a smaller, separate cache for speculative execution and then copy the value into the regular cache once the speculation is confirmed? This would add a one-cycle latency for the cache-to-cache transfer, but there might be better ways.
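To make the idea concrete, here is a rough toy model in Python. It is only a sketch under my own assumptions: the class name `SpeculativeCacheModel`, the `commit`/`squash` methods, and the 4-entry buffer size are all made up for illustration, it ignores timing entirely (including the extra transfer cycle), and it does not describe how any real CPU works.

```python
class SpeculativeCacheModel:
    """Toy model: speculative fills land in a small side buffer, not the main cache.

    All names and sizes here are hypothetical, chosen only for illustration.
    """

    def __init__(self, memory, spec_capacity=4):
        self.memory = memory              # backing store: address -> value
        self.main_cache = {}              # regular cache, visible after speculation
        self.spec_cache = {}              # small buffer used only while speculating
        self.spec_capacity = spec_capacity

    def load(self, addr, speculative):
        # A hit in the regular cache is served the same way in both modes.
        if addr in self.main_cache:
            return self.main_cache[addr]
        if speculative:
            if addr not in self.spec_cache:
                if len(self.spec_cache) >= self.spec_capacity:
                    # Evict the oldest entry (dicts keep insertion order).
                    self.spec_cache.pop(next(iter(self.spec_cache)))
                self.spec_cache[addr] = self.memory[addr]
            return self.spec_cache[addr]
        # Non-speculative miss: fill the regular cache directly.
        value = self.memory[addr]
        self.main_cache[addr] = value
        return value

    def commit(self):
        # Speculation confirmed: copy the buffered lines into the regular cache.
        self.main_cache.update(self.spec_cache)
        self.spec_cache.clear()

    def squash(self):
        # Speculation was wrong: drop the buffered lines, leaving no trace
        # in the regular cache.
        self.spec_cache.clear()


model = SpeculativeCacheModel({0x10: 1, 0x20: 2})
model.load(0x10, speculative=True)
model.squash()                            # mispredicted: 0x10 never reaches main_cache
model.load(0x20, speculative=True)
model.commit()                            # confirmed: 0x20 is copied into main_cache
```

The point of the split is that a squashed speculation leaves the regular cache unchanged, so a later access to that address still misses there.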