https://github.com/leni536/fast_hilbert_curve
So far I have only implemented the index->XY calculation. It compiles to 36 instructions with no branches and takes up 86 bytes.
https://github.com/leni536/fast_hilbert_curve/wiki/How-effic...
I think I can apply the same tricks to the inverse (XY->index) function too.
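
For anyone who wants a baseline to compare against, here is a minimal sketch of the usual loop-based index->XY conversion. This is the textbook formulation, not the branch-free code in the repository. The names `hilbert_point` and `hilbert_d2xy` are made up for this example, and it assumes the grid side `n` is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type and function names, chosen for this sketch only. */
typedef struct {
    uint32_t x;
    uint32_t y;
} hilbert_point;

/* Map a Hilbert index d (0 <= d < n*n) to a cell of an n-by-n grid.
 * Assumption: n is a power of two. */
hilbert_point hilbert_d2xy(uint32_t n, uint32_t d)
{
    assert(n != 0 && (n & (n - 1)) == 0);

    hilbert_point p = {0, 0};
    uint32_t t = d;

    /* Consume two bits of the index per level, from the lowest up. */
    for (uint32_t s = 1; s < n; s *= 2) {
        uint32_t rx = 1u & (t >> 1);
        uint32_t ry = 1u & (t ^ rx);

        /* Rotate/flip the sub-square built so far. */
        if (ry == 0) {
            if (rx == 1) {
                p.x = s - 1 - p.x;
                p.y = s - 1 - p.y;
            }
            uint32_t tmp = p.x; /* swap x and y */
            p.x = p.y;
            p.y = tmp;
        }

        /* Move into the quadrant selected by this pair of bits. */
        p.x += s * rx;
        p.y += s * ry;
        t >>= 2;
    }
    return p;
}
```

With `n = 2`, indices 0 through 3 map to (0,0), (0,1), (1,1), (1,0). The loop and the two conditionals are the parts a branch-free version has to replace with straight-line bit manipulation.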