https://www.usenix.org/legacy/events%2Fvee05%2Ffull_papers/p... [2005]
(Hey, I seem to remember that an Anton Ertl posts to comp.compilers.)
Spoiler: they claim that their sophisticated translation from stack to register code eliminated 47% of the instructions, yet the resulting register code is still around 25% larger than the stack byte code. So the size advantage goes to stack-based byte code, but it may not be as large as you might think.
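To put those two figures side by side, here is a back-of-the-envelope sketch. It assumes both percentages describe the same code, i.e. that the 47% reduction applies to the same instruction stream whose size grew by 25%. That is my simplification, not something the paper's numbers are guaranteed to support. The function name is made up for illustration.

```python
def relative_instruction_size(fraction_eliminated, size_ratio):
    """Average register-instruction size relative to the average stack-instruction size.

    fraction_eliminated: share of instructions removed by the translation (0.47 here).
    size_ratio: register code size divided by stack code size (1.25 here).
    Assumes both figures refer to the same instruction stream (a simplification).
    """
    remaining = 1.0 - fraction_eliminated  # fraction of instructions left
    return size_ratio / remaining


ratio = relative_instruction_size(0.47, 1.25)
print(f"{ratio:.2f}")  # prints 2.36
```

Under that assumption, each register instruction is on average a bit more than twice as large as a stack instruction, which is roughly what you would expect once every instruction has to carry explicit operand fields.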
So more or less in line with your findings or intuition?