
pcwalton6y ago

You can kiss any semblance of reasonable performance goodbye if you eliminate "speculative execution". Pipelining is the most basic tool in the toolbox. Even microcontrollers do it.


umanwizard6y ago

To give ballpark numbers: modern Intel processors can retire a few instructions per cycle in tight loops (4 is the theoretical maximum; > 2 is realistic in a lot of high-performance code). A branch misprediction wastes 10-15 cycles.

So getting rid of speculation entirely, and stalling on every branch, would waste time equivalent to dozens of instructions at each branch. On typical code, which has a branch every few instructions, this could slow execution down severalfold.
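As a rough sketch of that arithmetic, assuming a toy model in which every branch stalls for the full misprediction penalty (the function names and sample parameters here are illustrative, not measurements):

```python
def wasted_instruction_slots(ipc, penalty_cycles):
    """Instructions that could have retired during one stall."""
    return ipc * penalty_cycles


def stall_slowdown(ipc, penalty_cycles, instrs_per_branch):
    """Estimate the slowdown factor if every branch stalls the pipeline.

    Toy model (an assumption, not a real CPU): useful work proceeds at
    `ipc` instructions per cycle, and each branch adds `penalty_cycles`
    of dead time on top of that.
    """
    # Cycles for one branch-delimited block when nothing stalls.
    base_cycles = instrs_per_branch / ipc
    # Cycles for the same block when its branch stalls the pipeline.
    stalled_cycles = base_cycles + penalty_cycles
    return stalled_cycles / base_cycles


# Example: 2 instructions/cycle, 12-cycle penalty, a branch every 5 instructions.
print(wasted_instruction_slots(2, 12))
print(stall_slowdown(2, 12, 5))
```

With those sample inputs, each stall costs 24 instruction slots, and the block takes 14.5 cycles instead of 2.5, which is where "dozens of instructions" and "several times" slower come from.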

ccozan6y ago

Can we actually compute without branching? Genuine question.

What architecture would do that, and how?

gmueckl6y ago

The simplified SIMD cores in early GPUs had to fake branching to some extent for their virtual threads: every branch condition in the shader code would be tested for each virtual thread, and that thread (really just a vector component) would be masked out for the instructions on the side of the branch that didn't apply. The GPU would run both sides of the branch, relying on the mask. It was workable, but very slow.
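A minimal sketch of that masking scheme, in plain Python rather than shader code. The function name, the example condition, and the lane values are all hypothetical; the point is only that both sides run for every lane and the mask picks the result arithmetically.

```python
def masked_branch(lanes):
    """Emulate `y = -x if x < 0 else x * 2` across SIMD-style lanes.

    No per-lane control flow is used to pick a side: both sides are
    computed for every lane and blended with a 0/1 mask.
    """
    # Step 1: evaluate the condition once per lane to build the mask.
    mask = [int(x < 0) for x in lanes]
    # Step 2: run the "then" side for all lanes, whatever the mask says.
    then_side = [-x for x in lanes]
    # Step 3: run the "else" side for all lanes as well.
    else_side = [x * 2 for x in lanes]
    # Step 4: blend. A lane with mask 1 keeps `then`, mask 0 keeps `else`.
    return [m * t + (1 - m) * e
            for m, t, e in zip(mask, then_side, else_side)]


print(masked_branch([-3, 0, 4]))
```

Every lane pays for both sides of the branch, which is why this approach gets slow as shaders grow more branches.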

pcwaltonOP6y ago

Old GPUs did that. It wasn't very pleasant to program with. :)

nwallin6y ago

You can compute with less than that. (all links are to the same thing)

https://github.com/xoreaxeaxeax/movfuscator

https://m.youtube.com/watch?v=R7EEoWg6Ekk

https://news.ycombinator.com/item?id=18991404


deathanatos6y ago

Pipelining isn't strictly the same thing as speculation, though, is it? If I have,

  add %rax, %rbx
  add %rcx, %rdx

I can pipeline those without needing to speculate on anything. If there is a dependency on a previous instruction, then we might have to speculate, but hopefully there is still some case for pipelining?
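To make the independence check concrete, here is a small sketch that models each instruction as the registers it reads and writes. The representation and the `independent` helper are assumptions for illustration; the model ignores the flags register, which `add` also writes.

```python
def independent(first, second):
    """Return True if two instructions share no register hazard.

    Each instruction is a (reads, writes) pair of register-name sets.
    Simplification: the flags register is deliberately left out.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    raw = writes1 & reads2   # second reads what first writes
    war = reads1 & writes2   # second overwrites what first reads
    waw = writes1 & writes2  # both write the same register
    return not (raw or war or waw)


# AT&T syntax: `add %rax, %rbx` reads rax and rbx, then writes rbx.
add_rax_rbx = ({"rax", "rbx"}, {"rbx"})
add_rcx_rdx = ({"rcx", "rdx"}, {"rdx"})
# Hypothetical dependent variant: `add %rbx, %rdx` needs the new rbx.
add_rbx_rdx = ({"rbx", "rdx"}, {"rdx"})

print(independent(add_rax_rbx, add_rcx_rdx))
print(independent(add_rax_rbx, add_rbx_rdx))
```

The first pair touches disjoint registers, so the two adds can overlap in the pipeline; the second pair has a read-after-write dependency on rbx.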

Have any of these bugs been completely based on speculation, or is it always speculating across privilege boundaries? (Although I feel like even the former isn't safe, e.g., if you're in some form of VM attempting to maintain privilege separations.)

cperciva6y ago

It's related. If you want decent performance with pipelining, you're going to want to speculate at least a bit -- assume that FP math doesn't trigger exceptions, assume that you predicted branches correctly, assume that memory accesses don't fault, etc.

Intel does more speculation than most, but beyond the tiniest embedded CPUs you won't find anything that doesn't do any.
