Rust on the MOS 6502: Beyond Fibonacci

(gergo.erdi.hu)

165 points · gergoerdi · 4y ago · 39 comments

vaxman4y ago

PS: Holy crap! For the first time in my 40+ year career I have clicked-thru from a semi-relevant article about Rust on a micro p̵r̵o̵c̵e̵s̵s̵o̵r̵ controller to a reference about the...[RCA] COSMAC VIP (in the form of this dude's effort to get CHIP-8 running on LLVM-MOS). Do you have any idea how many lawns I had to mow to buy one of those? It was a big disappointment (over my ELF and SuperELF) too! ROFL

[ https://youtu.be/fLVN05Jl6wA ]

zwirbl4y ago

I guess when mentioning Rust on Nordic controllers one should also mention these excellent projects:

https://github.com/embassy-rs/embassy https://github.com/embassy-rs/nrf-softdevice

Together with https://github.com/nrf-rs/nrf-hal these enable almost everything one can do on these controllers from pure Rust (the softdevice is a blob with a C SDK that's wrapped in Rust, though).

royjacobs4y ago

That is so cool. I saw some posts about LLVM-MOS a while ago, but at that point I thought it would be just another in a fairly long list of attempts to try and get LLVM to output 6502 instructions.

I never expected it to come together this well! Especially considering that the author of the article mentions there were so many issues with LLVM-AVR, you'd expect them to exist in LLVM-MOS as well. Apparently not! I guess the code quality will only improve from here on out; the loop at the bottom of the article does seem like it is not as optimal as it could be :)

mysterymath4y ago

Up until just a few weeks ago, 100% of the codegen work we've put into LLVM-MOS has been to get it feature-complete and rock-solid. It's awesome to see that that work has paid off!

We're just now starting to really optimize the compiler; there's definitely a long road ahead of us, but our preliminary investigations suggest that we'll be able to get the thing to emit really quite good 6502 assembly.

Right now, it emits near-garbage in a large number of common cases, as seen in the article. This is mostly due to technical debt intentionally accrued while getting the thing working, though; we did stuff like use the default LLVM lowering for comparisons, which is ridiculously trash on the 6502. But there are really only a couple of major technical hurdles left to overcome; everything else is just painstakingly teaching LLVM what the best 6502 assembly patterns are for various situations.

zozbot2344y ago

> We're just now starting to really optimize the compiler; there's definitely a long road ahead of us, but our preliminary investigations suggest that we'll be able to get the thing to emit really quite good 6502 assembly.

Is there any part of this optimization work that might be upstreamed to LLVM itself and benefit other architectures? Or is this stuff purely 6502-specific?

cmrdporcupine4y ago

I haven't looked at this closely, but 6502 really doesn't lend itself to C compilation. Three registers, only one of which works with the ALU, awkward immovable stack, etc.

The 65816 is a better target (moveable direct page and stack and some wider registers), but also awkward with its register mode switching.

gergoerdiOP4y ago

From what I understand, LLVM-MOS treats large parts of the zero page as virtual ("imaginary") registers, so you have no shortage of those (https://llvm-mos.org/wiki/Imaginary_registers). Then, sufficiently advanced compiler technology improves the stack situation (https://llvm-mos.org/wiki/C_calling_convention).

dhosek4y ago

I remember being in high school, reading K&R and trying to figure out how I could get a C compiler running on an Apple ][. Never did, but it was a useful intellectual enterprise.

My second (and last) assembly language after 6502 was 370, which replaces the "awkward immovable stack" of the 6502 with no hardware stack at all. Applications are completely responsible for maintaining their own call stack.

Someone4y ago

Not only C, but any language that thinks there are things other than global state.

If all your functions are void foo(void) and you don’t use local variables (or your language doesn’t support recursion, in which case all locals can be given a fixed address), targeting the 6502 is fine. It also helps if you avoid floating point, use 8-bit variables where possible, etc.

Not supporting recursion also means you can statically compute the maximum stack depth. That way, you can avoid linking code that would overflow the stack.
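
The static stack-depth computation mentioned above can be sketched as a walk over the call graph. This is only an illustration under stated assumptions: `Func`, `max_stack_depth`, and the frame sizes are hypothetical names and numbers, not anything taken from LLVM-MOS or a real linker. The walk returns `None` as soon as it finds recursion, because then no static bound exists.

```rust
use std::collections::HashMap;

/// One node of a hypothetical call graph: the function's own frame size
/// in bytes and the names of the functions it may call.
pub struct Func {
    pub frame_bytes: u32,
    pub callees: Vec<&'static str>,
}

/// Worst-case stack usage when execution starts at `name`.
/// Returns `None` if the graph is recursive or a callee is unknown,
/// since no static bound can be computed in those cases.
pub fn max_stack_depth(
    graph: &HashMap<&'static str, Func>,
    name: &'static str,
) -> Option<u32> {
    let mut on_path = Vec::new();
    let mut memo = HashMap::new();
    visit(graph, name, &mut on_path, &mut memo)
}

fn visit(
    graph: &HashMap<&'static str, Func>,
    name: &'static str,
    on_path: &mut Vec<&'static str>,
    memo: &mut HashMap<&'static str, u32>,
) -> Option<u32> {
    // Reuse the result if this function was already analysed.
    if let Some(&depth) = memo.get(name) {
        return Some(depth);
    }
    // Seeing a function that is already on the current call path
    // means the program is (mutually) recursive.
    if on_path.contains(&name) {
        return None;
    }
    let func = graph.get(name)?;
    on_path.push(name);
    let mut deepest_callee = 0;
    for &callee in &func.callees {
        let depth = visit(graph, callee, on_path, memo)?;
        deepest_callee = deepest_callee.max(depth);
    }
    on_path.pop();
    // Own frame plus the most expensive chain of callees.
    let total = func.frame_bytes + deepest_callee;
    memo.insert(name, total);
    Some(total)
}
```

A build step could compare the result for the entry point against the available stack (256 bytes on the 6502's hardware stack) and refuse to link when the bound is exceeded or missing.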

leeter4y ago

I wonder if the CSG-65CE02 wasn't an attempt to make C easier for the C6x/C128 line. Unfortunately it never saw the light of day except as a serial controller, and it isn't available today.

https://en.wikipedia.org/wiki/CSG_65CE02

royjacobs4y ago

They actually address some of that on their project page, see: https://llvm-mos.org/wiki/Findings

emrk4y ago

Author of the mentioned post on the 6502.org forum here. In the meantime I worked a bit on implementing a proper Rust target triple for the 6502 (mos-unknown-none); the code is here: https://github.com/mrk-its/rust/tree/mos_target

The standard cargo tool can then be used to build a 6502 executable directly. Some examples: https://github.com/mrk-its/a800-rust-test or https://github.com/mrk-its/llvm-mos-ferris-demo

gergoerdiOP4y ago

That's cool! I wanted to avoid having to build Rust and/or LLVM from source myself, hence the somewhat awkward "tell Cargo we're on default target, let Clang sort it out at link time" setup.

codedokode4y ago

I am not sure if it is a good idea to compile code targeted to modern processors to 8-bit CPUs like 6502. For example:

Languages like C (or Rust) allocate variables on the stack because it is cheap on modern CPUs, but 8-bit CPUs don't have addressing modes to access them easily. (By the way, some modern CPUs like ARM also cannot add a register to a variable on the stack.)

The solution is not to use the stack for variables and to use zero-page locations instead. As there are only 256 zero-page bytes, the same locations should be reused for variables in different functions. This cannot be used with recursive functions, but such code is inefficient anyway, so it is better not to use them at all and use loops instead.

Another thing is the heap and closures (which allocate variables on the heap). Instead of the heap, code for 8-bit CPUs should use static allocation.

The article contains an example of 6502 code compiled from Rust, and this code is inefficient. It uses too many locations for variables (rc6-rc39) and it wastes time saving and restoring those locations in the prologue/epilogue.

No wonder that programs run slowly. It would be much better to compile CHIP-8 directly to 6502 assembly.
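
To illustrate the static-allocation point in Rust terms, here is a minimal sketch of a fixed-capacity stack whose storage lives inline, so it never touches a heap. `FixedStack` is a hypothetical type written for this example, not something from the article or from LLVM-MOS; it uses only `core` features, so it should also be usable in a `no_std` build.

```rust
/// Fixed-capacity stack of 16-bit values stored inline.
/// The capacity `N` is part of the type, so the compiler knows the
/// exact size up front and no allocator is required.
pub struct FixedStack<const N: usize> {
    items: [u16; N],
    len: usize,
}

impl<const N: usize> FixedStack<N> {
    /// `const fn` so the stack can be created in a constant context.
    pub const fn new() -> Self {
        Self { items: [0; N], len: 0 }
    }

    /// Pushes a value, handing it back as `Err` when the stack is full.
    pub fn push(&mut self, value: u16) -> Result<(), u16> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = value;
        self.len += 1;
        Ok(())
    }

    /// Pops the most recently pushed value, if any.
    pub fn pop(&mut self) -> Option<u16> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        Some(self.items[self.len])
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn is_empty(&self) -> bool {
        self.len == 0
    }
}
```

A CHIP-8 interpreter, for example, could hold its return addresses in a `FixedStack<16>` field, trading a hard capacity limit for a memory footprint that is known at link time.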

mysterymath4y ago

Most of the suboptimality in the article isn't due to the issues you've raised, but rather due to us just starting to optimize LLVM-MOS.

First, I have utterly no idea why there are so many calls to memset; it looks like it's unrolling a loop or something... poorly. It also doesn't seem to be reusing registers when setting up the calls; that's also bad and should be fixed.

Second, if you take a look at the actual structure of the prologue and epilogue, you might notice that it's copying zero page to an absolute memory region called __clear_screen_sstk. This is because LLVM-MOS ran a whole-program analysis on the program and proved that at most one activation of that function could occur at any given time. Thus, its "stack frame" was automatically allocated statically as a global array, not relative to a moving stack pointer.

The reason that the prologue and epilogue spend so much time copying in and out of the zero page is just that we haven't taught LLVM-MOS how to access the stack directly, but there's no technical obstacle to doing so. Once that's done, the whole body of the function would operate on __clear_screen_sstk directly, and the prologue and epilogue would disappear completely.

Of course, from the first point, you shouldn't need any stack locations to do the body of this routine; there's a big ball of yarn here, but pulling on any of a number of threads would unravel it.

antirez4y ago

Strange exercise, because Rust and the original 6502 programming mood are totally different: the latter is a world of cleverness and the most obscure side effects in order to squeeze out the last clock cycle. But anything with "hack value" I will respect.

person224y ago

I don't think you can get past that the 6502 was meant to be programmed in assembly. Some of the tricks needed to optimally use memory just don't lend themselves to higher level languages. I started with a lot of BASIC and then moved to assembler because it was the easiest path.

rob744y ago

Er... the article doesn't make it clear, but I guess we're talking about cross-compilation here? So it's not "Rust" (or, as he writes later, LLVM) running on the 6502, just the code generated by the Rust compiler.

Still cool though!

bluejekyll4y ago

Don’t most people generally mean the target binary from the compiler and not the compiler itself when someone says “see * running on this architecture”?

I can see for some dynamic languages there being a distinction between the two, but for compiled binaries, generally Rust on X, it doesn’t seem important if rustc also runs on X (especially when discussing micro-controllers since one would rarely run a full compiler on the chip itself).

fmakunbound4y ago

> Don’t most people

And the rest are Forth users happily running interactive, extensible compilers with built-in assemblers, block IO, screen editors in a multiuser, multitasking environment.

rob744y ago

Well, when someone says "see Doom running on this architecture", they usually do mean that Doom is running on the architecture. So "Rust for the MOS 6502" or something like that would have been better. But yeah, maybe I'm too nitpicky and unfair to a non-native speaker...

ww5204y ago

So WASM on 6502 next?

fallat4y ago

It looks like so much Rust code to generate the simplest of 6502 code. No thanks.

gergoerdiOP4y ago

Did you look at chirp8-engine, or only chirp8-c64? The value add is not in the parts that interface with the C64 internals; probably using C for that would make for nicer code. But I wanted to push as much into Rust as I could in the short amount of time I spent on this.

The real advantage of using Rust is in the actual program logic. E.g. the instructions are decoded into an algebraic datatype (in https://github.com/gergoerdi/chirp8-engine/blob/7623353a8bf0...) and then that is consumed in the virtual CPU (https://github.com/gergoerdi/chirp8-engine/blob/7623353a8bf0...). Rust's case-of-case optimization takes care of avoiding the intermediate data representation at runtime.
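
The decode-then-execute structure described above can be sketched as follows. This is a small illustration only: `Instr`, `decode`, and `Cpu` are hypothetical names chosen here, not the actual chirp8-engine types, and only five CHIP-8 opcodes are covered.

```rust
/// A handful of CHIP-8 instructions as an algebraic datatype.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Instr {
    ClearScreen,                  // 00E0
    Jump(u16),                    // 1NNN
    LoadImm { x: u8, value: u8 }, // 6XNN
    AddImm { x: u8, value: u8 },  // 7XNN
    LoadIndex(u16),               // ANNN
}

/// Decodes one big-endian opcode given as its high and low byte.
/// Returns `None` for opcodes this sketch does not cover.
pub fn decode(hi: u8, lo: u8) -> Option<Instr> {
    let addr = (u16::from(hi & 0x0F) << 8) | u16::from(lo);
    let x = hi & 0x0F;
    match (hi >> 4, lo) {
        (0x0, 0xE0) if hi == 0x00 => Some(Instr::ClearScreen),
        (0x1, _) => Some(Instr::Jump(addr)),
        (0x6, _) => Some(Instr::LoadImm { x, value: lo }),
        (0x7, _) => Some(Instr::AddImm { x, value: lo }),
        (0xA, _) => Some(Instr::LoadIndex(addr)),
        _ => None,
    }
}

/// Minimal virtual CPU state: 16 registers, index register, program counter.
pub struct Cpu {
    pub v: [u8; 16],
    pub i: u16,
    pub pc: u16,
}

impl Cpu {
    pub fn new() -> Self {
        // CHIP-8 programs are conventionally loaded at 0x200.
        Cpu { v: [0; 16], i: 0, pc: 0x200 }
    }

    /// Executes one already-decoded instruction.
    pub fn step(&mut self, instr: Instr) {
        // Advance past the 2-byte opcode first; a jump overrides this.
        self.pc = self.pc.wrapping_add(2);
        match instr {
            Instr::ClearScreen => {} // no display in this sketch
            Instr::Jump(addr) => self.pc = addr,
            Instr::LoadImm { x, value } => self.v[usize::from(x)] = value,
            Instr::AddImm { x, value } => {
                // 7XNN wraps on overflow and does not touch VF.
                let reg = &mut self.v[usize::from(x)];
                *reg = reg.wrapping_add(value);
            }
            Instr::LoadIndex(addr) => self.i = addr,
        }
    }
}
```

Because `step` matches exhaustively on `Instr`, adding a new variant without handling it in the CPU is a compile error, which is the kind of safety net that hand-written 6502 assembly does not provide.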

boomlinde4y ago

No thanks indeed, but I completely agree with this sentiment from the article:

> It is worth pointing out that the amazing thing about chirp8-c64 is not how well it works, but that it works at all.

