retrac 2y ago

Your optimizing compiler today will actually optimize. LLVM was recently ported to the 6502 (yes, really) [1]. An example:

    void outchar (char c) { 
        c = c | 0x80;
        asm volatile ("jsr $fbfd\n" : : "a" (c): "a");
    }

    void outstr (char* str) {
        while (*str != 0) 
            outchar(*str++);
    }

    void main () {
        outstr("Hello, world!\n");
    }

That is compiled to this:

    lda #$c8       ; ASCII H | 0x80
    jsr $fbfd
    lda #$e5       ; ASCII e | 0x80
    jsr $fbfd
    ...

Unrolled loop, over a function applied to a constant string at compile time. An assembler programmer couldn't do better. It is the fastest way to output that string so long as you rely on the ROM routine at $fbfd. (Apple II, for the curious.) Such an optimizing transform is unremarkable today. But stuff like that was cutting edge in the 90s.
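A host-side sketch of the same transform in plain C, for anyone who wants to check the constants without a 6502 toolchain. `rom_cout`, `rom_log`, `rom_len` and `outstr_unrolled_prefix` are hypothetical names made up for this illustration. `rom_cout` only records bytes and stands in for the ROM routine at $fbfd, and the "unrolled" function is written by hand to mirror the first two instructions of the listing above. It is not compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ROM routine at $fbfd: it just records
   each byte it receives so the result can be inspected on the host. */
static unsigned char rom_log[32];
static size_t rom_len = 0;

static void rom_cout(unsigned char a) {
    assert(rom_len < sizeof rom_log);
    rom_log[rom_len++] = a;
}

/* Same logic as outchar in the post: set the high bit, then call out. */
static void outchar(char c) {
    rom_cout((unsigned char)((unsigned char)c | 0x80));
}

/* The loop as written in the post. */
static void outstr(const char *str) {
    while (*str != 0)
        outchar(*str++);
}

/* Hand-written equivalent of what is left after inlining and unrolling
   for the first two characters: no loop, no loads from the string. */
static void outstr_unrolled_prefix(void) {
    rom_cout(0xC8);   /* 'H' (0x48) | 0x80 */
    rom_cout(0xE5);   /* 'e' (0x65) | 0x80 */
}
```

Calling `outstr("He")` and `outstr_unrolled_prefix()` should leave the same two bytes in `rom_log`, which is the whole point: the string and the OR are folded into immediates.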

[1] https://llvm-mos.org/wiki/Welcome

MatthiasPortzel 2y ago

I understand your point, but LLVM-MOS is a bad example. You gain LLVM’s language optimizations, as you point out. But LLVM’s assumed architecture is so different from the 6502 that lowering the code to assembly introduces many superfluous instructions. (As an example, the 6502 has one general purpose register, but LLVM works best with many registers. So LLVM-MOS creates 16 virtual registers in the first page of memory and then generates instructions to move them into the main register as they are used.) It’s of course possible to further optimize this, but the LLVM-MOS project isn’t that mature yet. So assembly programmers can still very much do better.

chongli 2y ago

> So LLVM-MOS creates 16 virtual registers in the first page of memory and then generates instructions to move them into the main register as they are used.

Isn’t this actually good practice on the 6502? The processor treats the first page of memory (called the zero page) differently. Instructions that address the zero page are shorter because they leave out the most significant byte. Addressing any other page requires that extra byte for the MSB.

Furthermore, instructions which accept a zero page address typically complete one cycle faster than absolute addressed instructions, and typically only one cycle slower than immediate addressed instructions.

So if you can keep as many of your memory accesses within the zero page as possible, your code will run a lot faster. It would seem to me that treating the zero page as a table of virtual registers is a great way to do that because you can bring all your register colouring machinery to bear on the problem.
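To put rough numbers on that, here is a small C table of `lda` in the three addressing modes mentioned, using the documented NMOS 6502 opcodes and base timings. The struct and the helper names (`lda_form`, `cycles_saved`, `bytes_saved`, `print_lda_forms`) are made up for this sketch, and the operand values are arbitrary.

```c
#include <stdio.h>

/* One encoding of LDA: example syntax, opcode, size and base cycle count. */
struct lda_form {
    const char *syntax;
    unsigned char opcode;
    int bytes;
    int cycles;
};

static const struct lda_form lda_forms[] = {
    { "lda #$12",  0xA9, 2, 2 },  /* immediate */
    { "lda $12",   0xA5, 2, 3 },  /* zero page */
    { "lda $1234", 0xAD, 3, 4 },  /* absolute  */
};

/* Cycles saved by doing n loads from the zero page instead of absolute. */
static int cycles_saved(int n) {
    return n * (lda_forms[2].cycles - lda_forms[1].cycles);
}

/* Bytes saved by the same substitution (the dropped high address byte). */
static int bytes_saved(int n) {
    return n * (lda_forms[2].bytes - lda_forms[1].bytes);
}

/* Print the table, one encoding per line. */
static void print_lda_forms(void) {
    for (size_t i = 0; i < sizeof lda_forms / sizeof lda_forms[0]; i++)
        printf("%-10s  opcode $%02X  %d bytes  %d cycles\n",
               lda_forms[i].syntax, (unsigned)lda_forms[i].opcode,
               lda_forms[i].bytes, lda_forms[i].cycles);
}
```

One byte and one cycle per access does not sound like much, but it applies to every load and store the register allocator manages to keep in the zero page.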

tredre3 2y ago

I understand your point, but the beginning of the zero page is almost always used as virtual registers by regular hand-rolled 6502 applications. So it's pretty normal for LLVM to do the same; it's not an example of LLVM doing something weird.

pjmlp 2y ago

Not really that much of a wonder, other than that the 6502 sucks for C.

"An overview of the PL.8 compiler", circa 1976

https://dl.acm.org/doi/abs/10.1145/989393.989400

Someone 2y ago

Does it end that code with a

  jmp $fbfd

?

mananaysiempre 2y ago

Doubtful, given the JSR comes from an inline asm. You’d need to code the call in C (with an appropriate calling convention, which I don’t know if this port defines) for Clang to be able to optimize a tail-position call into a jump—which it is capable of doing, generally speaking.

retrac (OP) 2y ago

No. The compiler knows that trick for its own code :) not sure about introspecting into the assembly (I think LLVM doesn't do that). But either way, standard C returns int of 0 from main on success. So: ldx #0 txa rts

Someone 2y ago

Nitpick: this isn’t standard C (it uses void main, not int main)

Nitpick 2: why ldx #0 txa rts? I would think lda #0 rts is shorter and faster

Back to my question: if it can’t, the claim “an assembler programmer couldn't do better” isn’t correct.

I think an assembler programmer for the 6502 would consider doing a jmp at the end, even if it makes the function return an incorrect, possibly even unpredictable value. If that value isn’t used, why spend time setting it?

An assembly programmer would also:

- check whether the routine at 0xFBFD accidentally guarantees to set A to zero, or returns with the X or Y register set to zero, and shamelessly exploit that.

- check whether the code at 0xFBFD preserves the value of the accumulator (unlikely, I would guess, but if it does, the two consecutive ‘l’s need only one LDA#)

- consider replacing the code to output the space inside “hello world” with a call to 0xFBF4 (move cursor right). That has the same effect if there already is a space there when the code is called.

- call 0xFBF0 to output a printable character, not 0xFBFD (reading https://6502disassembly.com/a2-rom/APPLE2.ROM.html, I notice that is faster for letters and punctuation)

On a 6502, that’s how you get your code to fit into memory and make it faster. To write good code for a 6502, you can’t be concerned about calling conventions or having individually testable functions.
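For scale, a C sketch that adds up the three endings discussed in this subthread, using the documented NMOS 6502 instruction sizes and base cycle counts. The struct, the array names and the helpers are made up for this illustration. The totals only cover the instructions listed and leave out the time spent inside the ROM routine, which is the same in all three cases.

```c
#include <stddef.h>

/* One instruction: text, size in bytes, base cycle count. */
struct insn {
    const char *text;
    int bytes;
    int cycles;
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Last call followed by the return sequence quoted above. */
static const struct insn ret_via_x[] = {
    { "jsr $fbfd", 3, 6 }, { "ldx #0", 2, 2 }, { "txa", 1, 2 }, { "rts", 1, 6 },
};

/* Last call followed by the shorter return suggested in the nitpick. */
static const struct insn ret_via_a[] = {
    { "jsr $fbfd", 3, 6 }, { "lda #0", 2, 2 }, { "rts", 1, 6 },
};

/* Tail jump: the ROM routine's own rts returns to our caller. */
static const struct insn tail_jump[] = {
    { "jmp $fbfd", 3, 3 },
};

/* Sum of instruction sizes in a sequence. */
static int total_bytes(const struct insn *seq, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += seq[i].bytes;
    return sum;
}

/* Sum of base cycle counts in a sequence. */
static int total_cycles(const struct insn *seq, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += seq[i].cycles;
    return sum;
}
```

With these figures the three endings come to 7 bytes and 16 cycles, 6 bytes and 14 cycles, and 3 bytes and 3 cycles respectively. The tail jump leaves the return value as whatever the ROM routine happens to leave in the registers.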

retrocryptid 2y ago

Well... I mean... if you want inverse video.
