jmp $fbfd
?Nitpick 2: why ldx #0 txa rts? I would think lda #0 rts is shorter and faster
Back to my question: if it can’t, the claim “an assembler programmer couldn't do better” isn’t correct.
I think an assembler programmer for the 6502 would consider doing a jmp at the end, even if it makes the function return an incorrect, possibly even unpredictable value. If that value isn’t used, why spend time setting it?
A assembly programmer also would:
- check whether the routine at 0xFBFD accidentally guarantees to set A to zero, or returns with the X or Y register set to zero, and shamelessly exploit that.
- check whether the code at 0xFBFD preserves the value of the accumulator (unlikely, I would guess, but if it does, the two consecutive ‘l’s need only one LDA#)
- consider replacing the code to output the space inside “hello world” by a call to FBF4 (move cursor right). That has the same effect if there already is a space there when the code is called.
- call 0xFBF0 to output a printable character, not 0xFBFD (reading https://6502disassembly.com/a2-rom/APPLE2.ROM.html, I notice that is faster for letters and punctuation)
On a 6502, that’s how you get your code fit into memory and make it faster. To write good code for a 6502, you can’t be concerned about calling conventions or having individually testable functions.
Regarding this specific unrolled loop, I would expect a 6502 programmer would just write the obvious loop, because they're clearly optimizing for space rather than speed when calling the ROM routine. They'll be content with the string printing taking about as long as it takes, which clearly isn't too long, as they wouldn't have done it that way otherwise. And the loop "overhead" won't be meaningful. (Looks like it'll be something like 7 cycles per character? I'm not familiar with the Apple II. Looks like $fbfd preserves X though.)
I just wanted to use 6502 code (so many seem to be able to read it!) with C side by side. x86 would have worked as well. Where the fastest answer would also be the same construct, assuming the dependency on an external routine.
You know what, I'm gonna nitpick that nitpick: void main() is fully allowed on a freestanding target, which is still standard C.
Given the C standards historically generous interpretation of undefined behaviour and other miscellany, I think it's a reasonable interpretation of the standard to pretend that a target that allows something other than int main(...) is freestanding rather than hosted, and therefore fully conforming.