celrod 2y ago

Taking the example:

      cmpb $115, %cl
      sete %dl
      addl %edx, %eax

      cmpb $115, %cl
      jne _run_switches_jmptgt1
      mov $1, %dl
      _run_switches_jmptgt1:
      addl %edx, %eax

The argument for why `jne` might be faster is that in the former case, the CPU always executes a dependency chain of length 3: `cmpb` -> `sete` -> `addl`. Each of these instructions has to be computed one after the other, as `sete` depends on the result of `cmpb`, and `addl` depends on the result of `sete`.

With `jne`, the CPU might predict the branch is not taken, in which case, the dependency chain is `mov` -> `addl` (the `mov` of an immediate might be handled by register renaming?).

Or it might predict that the branch is taken, in which case the dependency chain is just `addl`.
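For anyone who wants to poke at this from C, here is a rough sketch of the two shapes. The function names are made up for illustration, and resetting `flag` to 0 each iteration is an assumption on my part, since the snippet above doesn't show where `%edx` gets zeroed. Note that a compiler is free to turn either function into either instruction sequence (or vectorize both), so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>

/* Branchless shape: the comparison result (0 or 1) feeds the add
   directly, roughly cmpb -> sete -> addl. */
size_t count_s_branchless(const char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        size_t flag = (buf[i] == 115);  /* 115 is ASCII 's' */
        count += flag;
    }
    return count;
}

/* Branchy shape: a conditional jump skips the store of 1,
   roughly cmpb -> jne -> mov -> addl. */
size_t count_s_branchy(const char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        size_t flag = 0;                /* assumed: zeroed every iteration */
        if (buf[i] == 115) {
            flag = 1;
        }
        count += flag;
    }
    return count;
}
```

Both return the same count for any input; any difference would only show up in timing, and only if the compiler actually keeps the two shapes distinct.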

I guess you're arguing that the CPU should handle `sete` the same way? That is, instead of treating `addl` as dependent on the result, predict what `sete` does and start executing `addl` before `sete` finishes, rewinding if that went wrong?

torstenvl 2y ago

Yeah, or at least I don't understand why that wouldn't be possible.

Microcode can set the EIP register based on its prediction of what the result of cmpb $115, %cl will be.

Why can't it set the EDX register based on its prediction of what the result of cmpb $115, %cl will be?

gpderetta 2y ago

In principle it is perfectly possible to speculatively execute cmov (and, vice versa, to change a jump-over-one-instruction into conditional execution).

But Intel historically didn't do it, as programs tend to use cmov when the condition is unpredictable, so there was little reason to optimize it.

After Spectre, I believe Intel has given an architectural guarantee that cmov is never speculated, so it can be used as part of speculation attack prevention.
