May be one way would be to use a smaller, separate cache for speculative execution and then copy that value to the regular cache once speculation is confirmed? This would add a one cycle latency for cache-to-cache transfer but there might be better ways.
This problem is already solved with speculative writes to main memory - a speculative store buffer keeps a sequence of memory operations which need to be done when the operation retires. These buffers are very power hungry, because every future speculative read must check every entry in the speculative store buffer to see if it is re-reading a previously written address. That many to many mapping leads to an exponential amount of checking logic.
The same could be done for cache reads/writes, but I have a feeling it would quickly get very complex, large, and power hungry.
More broadly, potential counter- measures limited to the memory cache are likely to be insufficient, since there are other ways that speculative execution can leak information. For example, timing ef- fects from memory bus contention, DRAM row address selection status, availability of virtual registers, ALU ac- tivity, and the state of the branch predictor itself need to be considered.
... also ...
Of course, speculative execution will also affect conventional side channels, such as power and EM
Historically I think it's been assumed that you can't extract much useful information from a modern speculating CPU via EM radiation, but these attacks constantly seem to be surprising people. Re-programming a wifi chip to monitor interference generated by the CPU to spy on speculation? It would have sounded like a pie in the sky fantasy ... yesterday.
DRAM and the memory bus is also affected by DMA operations running independently of the CPU.
Power consumption? There is no hardware available to measure that, let alone at the time resolution required. If you have to first attach a GHz bandwidth oscilloscope to the computer you might as well just reboot it or dump its RAM contents or whatever.
Forget about reprogramming a Wi-Fi chip. They operate on narrow channels in the 2.4GHz range and have fixed hardware for modulation. You would at least have to force the CPU to switch to the right frequency and then be lucky enough that it radiates a signal that demodulated to something sensible within the Wi-Fi hardware. This is physically impossible on current hardware.
Also, on a different note, we cannot sacrifice performance willy-nilly for the sake of a bit of potential security gains. A 30% performance loss on servers means that the counter move is to consume 30% more power to maintain current levels of operation in a data center. This energy needs to be generated, which means that someone is burning oil or gas for it with all the consequences. In essence, the current patches will result in an extra thousands or millions of tons of CO2 in the atmosphere. More efficient replacement hardware will eventually produced with extra environmental impact. We need to find ways to avoid that. Soon.