> The trick is to rotate the tag to bits 2-3-4 of the exponent instead of 1-2-3 and add an offset to the exponent to "shift" the range of captured values.
Maybe I misunderstand, but isn't that similar to what I just described: adding an offset to "rotate" the ranges of the exponent by one segment, so that the segment containing zero ends up on the high side? The main difference is that you stick to the upper four bits of the exponent, whereas I suggested using one of the upper three-bit sequences to mark bits 4-5-6-7 as tag bits, which probably has consequences for minimizing instructions, like you mentioned. (I mistakenly included the sign bit before and claimed this unboxes 15/16ths of all doubles; it's actually 7/8ths.)
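To make the 7/8ths figure concrete, here is a rough Python sketch of one possible reading of the scheme. The specific choices are my own assumptions, not anything confirmed in the thread: the offset rotates the 11-bit biased exponent down by one 256-value segment, and the rotated three-bit prefix `0b000` is the one reserved for non-float values. The names `OFFSET`, `RESERVED`, `rotated_prefix`, and `is_unboxed` are hypothetical.

```python
import math
import struct

EXP_BITS = 11                          # width of the IEEE 754 double exponent field
EXP_MASK = (1 << EXP_BITS) - 1         # 0x7FF
SEGMENT = 1 << (EXP_BITS - 3)          # 256 exponent values share each 3-bit prefix
OFFSET = -SEGMENT % (1 << EXP_BITS)    # assumption: rotate down by one segment (mod 2048)
RESERVED = 0b000                       # assumption: rotated prefix that marks "not a float"


def biased_exponent(x: float) -> int:
    """Return the raw 11-bit biased exponent of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & EXP_MASK


def rotated_prefix(exp: int) -> int:
    """Add the offset (wrapping at 11 bits) and keep the top three exponent bits."""
    return ((exp + OFFSET) & EXP_MASK) >> (EXP_BITS - 3)


def is_unboxed(x: float) -> bool:
    """A double stays unboxed unless its rotated prefix is the reserved one."""
    return rotated_prefix(biased_exponent(x)) != RESERVED


# Sign and mantissa are independent of the exponent, so counting exponent
# values is enough to get the fraction of all double bit patterns.
unboxed_fraction = sum(
    rotated_prefix(e) != RESERVED for e in range(1 << EXP_BITS)
) / (1 << EXP_BITS)

print(unboxed_fraction)
print([is_unboxed(v) for v in (0.0, 1.0, math.inf, math.nan, 2.0 ** -600)])
```

With these assumed constants, the segment containing zero rotates to prefix `0b111` (the high side), and the only doubles that fall into the reserved prefix are those with raw biased exponents 256 through 511, i.e. magnitudes of roughly 2^-767 up to 2^-511.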
> But I think the next step is a quality over quantity improvement: capture less floats but capture the right ones.
I suspect that ensuring NaN and Infinity are in there will be crucial to avoid performance cliffs in certain types of code. I have seen production code where initialized values are known to always be finite, so either of those two is then used to "tag" a float value as "missing" or something similar.
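The kind of pattern I mean looks roughly like this. It is a made-up illustration rather than the actual production code, and `MISSING` and `mean_present` are hypothetical names:

```python
import math

# Sentinel for "no value": safe only because valid readings are assumed to be finite.
MISSING = math.nan


def mean_present(readings):
    """Average the readings that are present, skipping the NaN sentinel."""
    present = [r for r in readings if not math.isnan(r)]
    return sum(present) / len(present) if present else MISSING


print(mean_present([1.0, MISSING, 3.0]))
```

If NaN ended up among the boxed values, every `MISSING` flowing through code like this would presumably take the slow path, even though the finite values around it would not.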
Anyway, looking forward to future results of your research!