naasking 3y ago

> My experience working on this problem led me to the same conclusion as Mike Pall, which is that compilers do not do well with this pattern

Note that that message is from twelve years ago. A lot's changed since then, not just in compilers but in CPUs. Branch prediction is a lot better now.

haberman 3y ago

Mike's primary complaint is bad register allocation. It is very important to keep the most important state consistently in registers. In my experience, compilers still struggle to do good register allocation in big and branchy functions.

Even perfect branch prediction cannot solve the problem of unnecessary spills.

naasking (OP) 3y ago

Very true. I imagine that grouping instructions that use the same registers into their own functions would help with that (arithmetic expressions tend to generate sequences like this). Then you loop within this function while the next instruction is in the same group, and only return to the outer global instruction loop otherwise. If you design the bytecode carefully, you can probably do group checks with a simple bitmask.
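A minimal sketch of that grouping idea, under stated assumptions: the bytecode format (two bytes per instruction, top two opcode bits naming the group) and every name here (`GROUP_MASK`, `run_arith`, `vm_run`, the opcodes) are hypothetical and chosen only for illustration. It assumes every program ends with `OP_HALT`, so the inner loop's look-ahead stays in bounds. Whether this actually improves register allocation depends on the compiler and would need measuring.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: the top two bits of an opcode name its group. */
enum {
    GROUP_MASK  = 0xC0,
    GROUP_ARITH = 0x40,
    GROUP_CTRL  = 0x80,

    OP_ADDI = GROUP_ARITH | 0,  /* acc += operand */
    OP_MULI = GROUP_ARITH | 1,  /* acc *= operand */
    OP_SUBI = GROUP_ARITH | 2,  /* acc -= operand */
    OP_JNZ  = GROUP_CTRL  | 0,  /* if (acc != 0) pc = operand */
    OP_HALT = GROUP_CTRL  | 1   /* stop and return acc */
};

/* Small function for the arithmetic group: it keeps looping while the
 * next opcode is in the same group, then returns the new pc. */
static size_t run_arith(const uint8_t *code, size_t pc, int64_t *acc_io) {
    int64_t acc = *acc_io;  /* hot state lives in a local */
    do {
        uint8_t operand = code[pc + 1];
        switch (code[pc]) {
        case OP_ADDI: acc += operand; break;
        case OP_MULI: acc *= operand; break;
        case OP_SUBI: acc -= operand; break;
        default: break;  /* unknown arithmetic opcode: skipped */
        }
        pc += 2;
    } while ((code[pc] & GROUP_MASK) == GROUP_ARITH);  /* bitmask group check */
    *acc_io = acc;
    return pc;
}

/* Outer global instruction loop: handles everything outside the group. */
int64_t vm_run(const uint8_t *code) {
    int64_t acc = 0;
    size_t pc = 0;
    for (;;) {
        uint8_t op = code[pc];
        if ((op & GROUP_MASK) == GROUP_ARITH) {
            pc = run_arith(code, pc, &acc);
            continue;
        }
        switch (op) {
        case OP_JNZ:  pc = (acc != 0) ? code[pc + 1] : pc + 2; break;
        case OP_HALT: return acc;
        default:      return -1;  /* unknown opcode */
        }
    }
}
```

For instance, running `vm_run` on the byte sequence `OP_ADDI, 5, OP_MULI, 3, OP_SUBI, 1, OP_HALT, 0` executes all three arithmetic instructions inside a single `run_arith` call before control returns to the outer loop.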

10000truths 3y ago

Does providing a hint to the compiler using the register keyword address the issue sufficiently?

haberman 3y ago

No, most compilers ignore the register keyword, see: https://stackoverflow.com/a/10675111

JonChesterfield 3y ago

Nearly. You need `register`, and you also need to pass the variables into (potentially no-op) inline asm. `register int v asm("eax")` iirc, but it's been years since I did this.

The 'register' is indeed largely ignored, but it has the additional somewhat documented meaning of 'when this variable goes into inline asm, it needs to be in that register'. In between asm blocks it can be elsewhere - stack or whatever - but it still gives the regalloc a really clear guide to work from.
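A minimal sketch of that trick, assuming GCC or Clang and their explicit-register-variable extension. The function name `sum_pinned`, the macros `IN_REG` and `TOUCH`, and the choice of `r12` are illustrative assumptions; on other compilers or architectures the macros fall back to a plain local. The register is only guaranteed at the asm statement itself, and this shows the syntax rather than any measured speedup.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__x86_64__)
#  define IN_REG(name) __asm__(name)  /* explicit register variable */
#else
#  define IN_REG(name)                /* fallback: ordinary local */
#endif

#if defined(__GNUC__)
/* Empty asm that claims to read and write var, so var must sit in its
 * register at this point. It emits no instructions. */
#  define TOUCH(var) __asm__ volatile("" : "+r"(var))
#else
#  define TOUCH(var) ((void)0)
#endif

uint64_t sum_pinned(const uint32_t *xs, unsigned n) {
    /* r12 is an arbitrary choice for illustration (assumption). */
    register uint64_t acc IN_REG("r12") = 0;
    for (unsigned i = 0; i < n; i++) {
        acc += xs[i];
        TOUCH(acc);  /* acc is in r12 here; elsewhere it may live anywhere */
    }
    return acc;
}
```

Calling `sum_pinned` on an array holding 1, 2, 3, 4 with `n = 4` sums the elements as usual; the pinning changes only where `acc` lives at each `TOUCH`, not the result.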

fwsgonzo 3y ago

I read a research paper that showed that branch prediction issues are a non-issue with modern predictors (eg. TTAGE). It is of course true that register spills happen, but they're not bad enough to justify hand-written assembly. Especially when you simulate AOT-compiled code (eg. RISC-V and WASM), you will already be 3-10x faster than Lua. For my purposes of using this kind of emulator for scripting, it is already fine.

Throw instruction counting into the mix, and you can even be faster than LuaJIT, although I'm not sure how it manages to screw up the counting so badly. I wrote a little bit about it here: https://medium.com/@fwsgonzo/time-to-first-instruction-53a04...
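For readers unfamiliar with the term, here is a minimal sketch of instruction counting, assuming it means bounding execution by a budget of executed instructions. The one-byte opcodes (`CNT_INC`, `CNT_JMP0`, `CNT_HALT`) and the function `run_counted` are hypothetical and not taken from any real emulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-byte opcodes, for illustration only. */
enum { CNT_INC, CNT_JMP0, CNT_HALT };

/* Executes at most `budget` instructions and returns how many ran.
 * The counter is decremented once per dispatched instruction. */
uint64_t run_counted(const uint8_t *code, uint64_t budget, int64_t *acc) {
    uint64_t left = budget;
    size_t pc = 0;
    while (left > 0) {
        left--;
        switch (code[pc]) {
        case CNT_INC:  (*acc)++; pc++; break;  /* acc += 1 */
        case CNT_JMP0: pc = 0; break;          /* jump back to the start */
        case CNT_HALT: return budget - left;   /* normal stop */
        default:       return budget - left;   /* unknown opcode: stop */
        }
    }
    return budget;  /* budget exhausted before the program halted */
}
```

With this shape, an infinite loop such as `CNT_INC, CNT_JMP0` still returns to the host once the budget runs out, which is the property a scripting host typically wants.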
