You can fit twenty-one 3-bit entries in a 64-bit int, with one bit to spare.
So naively you get:
(table[nlz / 21] >> (3 * (nlz % 21))) & 7
Which involves a division and a modulo. And that's assuming 63 entries in total; I'm not even trying to fit the 64th entry into those three leftover bits somehow, or to spread them across the three ints (again, I don't see how to do that without division).
Alternatively, each integer could contain one bit of the output, so:
(((table[0] >> nlz) & 1) << 2) + (((table[1] >> nlz) & 1) << 1) + ((table[2] >> nlz) & 1)
Which is five shifts, three masks, and two additions, so more work, on top of three lookups instead of one.
If you see another method I overlooked, please tell me.
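For reference, here is a minimal sketch of both layouts, written in Python only to check that the indexing is right, not to measure speed. The function names (pack_grouped, lookup_grouped, pack_bitplanes, lookup_bitplanes) are made up for this illustration, and it assumes nlz is already a valid index into the table.

```python
def pack_grouped(values):
    """Pack up to 63 three-bit values, 21 per 64-bit word (one bit per word unused)."""
    assert len(values) <= 63 and all(0 <= v < 8 for v in values)
    table = [0, 0, 0]
    for i, v in enumerate(values):
        table[i // 21] |= v << (3 * (i % 21))
    return table


def lookup_grouped(table, nlz):
    # One lookup, but a division, a modulo, and a multiply by 3 to find the entry.
    return (table[nlz // 21] >> (3 * (nlz % 21))) & 7


def pack_bitplanes(values):
    """Pack up to 64 three-bit values, one output bit per 64-bit word."""
    assert len(values) <= 64 and all(0 <= v < 8 for v in values)
    table = [0, 0, 0]
    for i, v in enumerate(values):
        table[0] |= ((v >> 2) & 1) << i  # high bit of each entry
        table[1] |= ((v >> 1) & 1) << i  # middle bit
        table[2] |= (v & 1) << i         # low bit
    return table


def lookup_bitplanes(table, nlz):
    # Three lookups, five shifts, three masks, two additions; no division.
    return ((((table[0] >> nlz) & 1) << 2)
            + (((table[1] >> nlz) & 1) << 1)
            + ((table[2] >> nlz) & 1))
```

Note that the one-bit-per-word layout covers all 64 indices, while the grouped layout stops at 63 entries.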