You used the single-byte Z80 instruction RST 28h, followed by opcodes in the special interpreted language, with opcode 34h meaning exit back to Z80 code.
The interpreter opcodes were quite powerful, covering trigonometric and logarithmic functions, string parsing and memory operations. As the name "Calculator Stack" implies, the operands were stored on a stack in memory (persisting between invocations of the interpreter). A downside, however, was that it was very slow: there were visible pauses when doing certain arithmetic operations.
(More here: http://www.users.waitrose.com/~thunor/mmcoyzx81/chapter17.ht... )
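To make the "escape into bytecode, operands persist on a stack" idea concrete, here is a toy model in Python. It is only a sketch: the exit opcode 34h is taken from the comment above, but every other opcode number and all the names (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `run_calculator`) are made up for illustration and do not correspond to the real ROM.

```python
# Toy model of an "escape into bytecode" interpreter with a persistent stack.
# Only OP_EXIT (0x34) comes from the discussion; the other opcodes are hypothetical.
OP_PUSH = 0x01  # hypothetical: push the following byte as a number
OP_ADD = 0x02   # hypothetical: pop two values, push their sum
OP_MUL = 0x03   # hypothetical: pop two values, push their product
OP_EXIT = 0x34  # leave the interpreter and return to "native" code

calc_stack = []  # lives outside the interpreter, so it survives between calls


def run_calculator(code, pc=0):
    """Interpret bytes from code[pc]; return the index just past the exit opcode."""
    while True:
        op = code[pc]
        pc += 1
        if op == OP_EXIT:
            return pc
        if op == OP_PUSH:
            calc_stack.append(code[pc])
            pc += 1
        elif op == OP_ADD:
            b, a = calc_stack.pop(), calc_stack.pop()
            calc_stack.append(a + b)
        elif op == OP_MUL:
            b, a = calc_stack.pop(), calc_stack.pop()
            calc_stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op:#04x}")


# Two separate invocations sharing the same stack: (2 + 3) * 4
run_calculator(bytes([OP_PUSH, 2, OP_PUSH, 3, OP_EXIT]))
run_calculator(bytes([OP_ADD, OP_PUSH, 4, OP_MUL, OP_EXIT]))
print(calc_stack)  # [20]
```

The second call starts with an add, which only works because the two values pushed by the first call are still on the stack.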
A JMP to this thunk costs 3 cycles, and the JMP in the thunk costs 3 cycles, so that buys you nothing compared to the RTS. And the STx to set up the low byte takes up 3 cycles (zero page) or 4 cycles (elsewhere), which is the same or worse than the PHA. But because the high byte is always set up, you save the 5 cycles spent setting that up.
(If you're running from RAM, you don't even need the thunk.)
(Also: the opcode dispatch's EOR trick is space-efficient, but takes an extra cycle - and one fewer byte, I won't deny - compared to doing a TAY after fetching the byte, then a TYA:AND $F0 later. That sequence takes 6 cycles, whereas the LSR:EOR (R15L),Y sequence takes 7 or 8.)
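The thunk accounting above can be tallied as a small worked sum. This sketch assumes the RTS-based dispatch sets up the high byte with LDA #imm followed by PHA (which matches the 5 cycles quoted), and it leaves out the table fetch of the low byte because both variants need it. The cycle counts are the standard 6502 figures; the sequence names are just labels for this example.

```python
# Standard 6502 cycle counts for the instructions being compared.
CYCLES = {
    "LDA #imm": 2,
    "PHA": 3,
    "RTS": 6,
    "JMP abs": 3,
    "STA zp": 3,
    "STA abs": 4,
}


def total(sequence):
    """Sum the cycle cost of a list of instruction names."""
    return sum(CYCLES[name] for name in sequence)


# Push high byte, push low byte, then RTS to the handler.
rts_dispatch = ["LDA #imm", "PHA", "PHA", "RTS"]
# Store low byte into the thunk's operand, JMP to the thunk, JMP in the thunk.
thunk_zero_page = ["STA zp", "JMP abs", "JMP abs"]
thunk_elsewhere = ["STA abs", "JMP abs", "JMP abs"]

print(total(rts_dispatch))                            # 14
print(total(rts_dispatch) - total(thunk_zero_page))   # 5
print(total(rts_dispatch) - total(thunk_elsewhere))   # 4
```

So the saving is the full 5 cycles only when the thunk sits in zero page; elsewhere the slower store gives one cycle back.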
LDA OPTBL-2,Y
STA OPADDR
JMP (OPADDR)
The contents of OPADDR+1 are initialized once on entry into the interpreter, or perhaps statically. Another thing would be self-modifying code (if we can forgo ROM-ming this, which Woz couldn't): the interpreter mutates the operand of an absolute JMP instruction to set up the address. That instruction then simply follows; there is no need to branch to it. Same as your thunk, but placed inline.
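A rough Python model of that LDA/STA/JMP (OPADDR) sequence, to show which byte changes per opcode and which stays put. The addresses, the table contents and the function names are all hypothetical, and the table index is simplified (no -2 offset, no Y scaling).

```python
# Model of: LDA OPTBL,Y / STA OPADDR / JMP (OPADDR)
# All addresses and table contents below are hypothetical.
memory = bytearray(0x10000)

OPADDR = 0x00F0        # two-byte pointer; its location is assumed for this sketch
HANDLER_PAGE = 0x03    # high byte shared by every handler
OPTBL = [0x00, 0x10, 0x25]  # low byte of each handler's start address


def enter_interpreter():
    """Done once: fix the high byte of the jump pointer."""
    memory[OPADDR + 1] = HANDLER_PAGE


def dispatch(index):
    """Per opcode: store only the low byte, then read the pointer as JMP (OPADDR) would."""
    memory[OPADDR] = OPTBL[index]
    return memory[OPADDR] | (memory[OPADDR + 1] << 8)


enter_interpreter()
print(hex(dispatch(1)))  # 0x310
print(hex(dispatch(2)))  # 0x325
```

The self-modifying variant is the same idea with the pointer living inside the operand bytes of a JMP instruction instead of in a separate location.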
Ah, the first machine language program I wrote was on the 6502 and used self-modifying code to march through the graphics buffer. Indexed addressing modes were the next chapter in the Rodnay Zaks book.
I doubt I ever used JMP indirect. For this sort of thing running from RAM, I'd typically use self-modifying code and a JMP absolute, which is where the idea of having a little thunk came from.
(Page == 256-byte block.) Because the op dispatch table stores only the low-order byte of each opcode routine's address; the high-order byte is fixed in the interpreter. I think the last function could spill past the end of the page; its starting address just has to be in the page.
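The page constraint can be stated as a small check: every routine's start address must share one high byte, while its end address is free to land in the next page. The routine addresses and lengths below are invented for the example, as are the function names.

```python
def page_of(address):
    """A page is a 256-byte block, so the page number is the high byte."""
    return address >> 8


def starts_share_a_page(routines):
    """routines: list of (start_address, length_in_bytes) pairs."""
    return len({page_of(start) for start, _ in routines}) == 1


# Hypothetical layout: the last routine starts at $03F8 and runs into page $04.
ok_layout = [(0x0300, 0x20), (0x0380, 0x30), (0x03F8, 0x20)]
# Hypothetical layout: one routine *starts* in page $04, so a low-byte table cannot reach it.
bad_layout = [(0x03F8, 0x08), (0x0400, 0x10)]

last_start, last_len = ok_layout[-1]
print(starts_share_a_page(ok_layout))              # True
print(hex(page_of(last_start + last_len - 1)))     # 0x4 (it spills, and that is fine)
print(starts_share_a_page(bad_layout))             # False
```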