PS: Holy crap! For the first time in my 40+ year career I have clicked-thru from a semi-relevant article about Rust on a micro p̵r̵o̵c̵e̵s̵s̵o̵r̵ controller to a reference about the...[RCA] COSMAC VIP (in the form of this dude's effort to get CHIP-8 running on LLVM-MOS). Do you have any idea how many lawns I had to mow to buy one of those? It was a big disappointment (over my ELF and SuperELF) too! ROFL
https://github.com/embassy-rs/embassy https://github.com/embassy-rs/nrf-softdevice
Together with https://github.com/nrf-rs/nrf-hal these enable most everything one can do on these controllers from pure Rust (the softdevice is a blob with a C SDK that's wrapped in Rust, though).
I never expected it to come together this well! Especially since the author of the article mentions there were so many issues with LLVM-AVR; you'd expect them to exist in LLVM-MOS as well. Apparently not! I guess the code quality will only improve from here on out. The loop at the bottom of the article does look less optimal than it could be :)
We're just now starting to really optimize the compiler; there's definitely a long road ahead of us, but our preliminary investigations suggest that we'll be able to get the thing to emit really quite good 6502 assembly.
Right now, it emits near-garbage in a large number of common cases, as seen in the article. This is mostly due to technical debt intentionally accrued while getting the thing working, though; we did stuff like use the default LLVM lowering for comparisons, which is ridiculously trash on the 6502. But there are really only a couple of major technical hurdles left to overcome; everything else is just painstakingly teaching LLVM what the best 6502 assembly patterns are for various situations.
Is there any part of this optimization work that might be upstreamed to LLVM itself and benefit other architectures? Or is this stuff purely 6502-specific?
The 65816 is a better target (moveable direct page and stack and some wider registers), but also awkward with its register mode switching.
My second (and last) assembly language after 6502 was 370, which replaces the "awkward immovable stack" of the 6502 with no hardware stack at all. Applications are completely responsible for maintaining their own call stack.
If all your functions are void foo(void) and you don’t use local variables (or your language doesn’t support recursion, in which case all locals can be given a fixed address), targeting 6502 is fine. It also helps if you avoid floating point, use 8-bit variables where possible, etc.
Not supporting recursion also means you can statically compute maximum stack depth. That way, you can avoid linking code that would overflow the stack.
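To make that concrete, here is a rough sketch of the computation, not anything LLVM-MOS actually ships. The `FuncInfo` type, the frame sizes, and the function names are all made up for illustration; the only real assumption is that you can get each function's frame size and its list of callees. With no recursion the call graph is acyclic, so the worst-case depth is just the function's own frame plus the deepest callee:

```rust
use std::collections::HashMap;

/// Hypothetical per-function record: its own frame size in bytes
/// and the names of the functions it calls directly.
pub struct FuncInfo {
    pub frame_bytes: u32,
    pub callees: Vec<&'static str>,
}

/// Worst-case stack depth in bytes when execution starts at `entry`.
/// Returns `None` if recursion is reachable or a callee is missing
/// from the graph, since no static bound exists in those cases.
pub fn max_stack_depth(
    graph: &HashMap<&'static str, FuncInfo>,
    entry: &'static str,
) -> Option<u32> {
    let mut memo = HashMap::new();
    let mut in_progress = Vec::new();
    depth(graph, entry, &mut memo, &mut in_progress)
}

fn depth(
    graph: &HashMap<&'static str, FuncInfo>,
    name: &'static str,
    memo: &mut HashMap<&'static str, u32>,
    in_progress: &mut Vec<&'static str>,
) -> Option<u32> {
    // Reuse the result if this function was already analysed.
    if let Some(&d) = memo.get(name) {
        return Some(d);
    }
    // Seeing a function that is still being analysed means a call cycle.
    if in_progress.contains(&name) {
        return None;
    }
    let info = graph.get(name)?;
    in_progress.push(name);
    let mut deepest: u32 = 0;
    for &callee in &info.callees {
        deepest = deepest.max(depth(graph, callee, memo, in_progress)?);
    }
    in_progress.pop();
    let total = info.frame_bytes + deepest;
    memo.insert(name, total);
    Some(total)
}
```

A linker or build script could compare the result against the available stack (256 bytes for the 6502 hardware stack) and refuse to produce a binary that cannot fit.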
Then the standard cargo tool can be used to build a 6502 executable directly. Some examples: https://github.com/mrk-its/a800-rust-test or https://github.com/mrk-its/llvm-mos-ferris-demo
Languages like C (or Rust) allocate variables on the stack because it is cheap on modern CPUs, but 8-bit CPUs don't have addressing modes to access them easily. (By the way, some modern CPUs like ARM also cannot add a register to a variable on the stack.)
The solution is not to use the stack for variables and instead use zero-page locations. As there are only 256 zero-page bytes, the same locations should be reused for variables in different functions. This cannot be used with recursive functions, but such code is inefficient anyway, so it is better not to use them at all and use loops instead.
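One way to do that reuse, sketched under my own assumptions (the `Func` type and the function names are hypothetical, and the call graph must be acyclic and complete, otherwise this loops or panics): give every function a fixed base address just above the locals of its deepest caller. Functions that are never active at the same time then end up sharing the same bytes.

```rust
use std::collections::HashMap;

/// Hypothetical per-function record: bytes of locals it needs
/// and the functions it calls directly.
pub struct Func {
    pub local_bytes: u16,
    pub callees: Vec<&'static str>,
}

/// Assigns each function reachable from `entry` a fixed base offset for
/// its locals. Assumes an acyclic call graph (no recursion) in which
/// every callee has an entry in `funcs`.
pub fn assign_bases(
    funcs: &HashMap<&'static str, Func>,
    entry: &'static str,
) -> HashMap<&'static str, u16> {
    let mut bases = HashMap::new();
    place(funcs, entry, 0, &mut bases);
    bases
}

fn place(
    funcs: &HashMap<&'static str, Func>,
    name: &'static str,
    base: u16,
    bases: &mut HashMap<&'static str, u16>,
) {
    // Keep the highest base any caller demands, so the locals never
    // overlap those of a function further up the same call chain.
    if let Some(&old) = bases.get(name) {
        if old >= base {
            return;
        }
    }
    bases.insert(name, base);
    let f = &funcs[name];
    for &callee in &f.callees {
        place(funcs, callee, base + f.local_bytes, bases);
    }
}

/// Total bytes the layout occupies; must be at most 256 to fit in zero page.
pub fn bytes_needed(
    funcs: &HashMap<&'static str, Func>,
    bases: &HashMap<&'static str, u16>,
) -> u16 {
    bases
        .iter()
        .map(|(name, &base)| base + funcs[*name].local_bytes)
        .max()
        .unwrap_or(0)
}
```

With a `main` (4 bytes) that calls `draw` (6 bytes) and `input` (2 bytes), both callees start at offset 4 and overlap, because they can never be live together.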
Another thing is the heap, and closures (which allocate variables on the heap). Instead of a heap, code for 8-bit CPUs should use static allocation.
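As a minimal sketch of what static allocation can look like in Rust (the `FixedBuf` name is mine, not from any crate): a buffer whose capacity is fixed at compile time, so it needs no allocator and reports overflow to the caller instead of growing.

```rust
/// Hypothetical fixed-capacity byte buffer. The capacity `N` is part of
/// the type, so the storage size is known at compile time.
pub struct FixedBuf<const N: usize> {
    data: [u8; N],
    len: usize,
}

impl<const N: usize> FixedBuf<N> {
    /// `const fn`, so a buffer can also initialise a `static`.
    pub const fn new() -> Self {
        Self { data: [0; N], len: 0 }
    }

    /// Appends a byte, or hands it back as `Err` when the buffer is full.
    pub fn push(&mut self, byte: u8) -> Result<(), u8> {
        if self.len == N {
            return Err(byte);
        }
        self.data[self.len] = byte;
        self.len += 1;
        Ok(())
    }

    /// The bytes pushed so far.
    pub fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}
```

The trade-off is that you have to pick the worst-case size up front, which is usually acceptable on a machine with 64 KB of address space.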
The article contains an example of 6502 code compiled from Rust, and this code is inefficient. It uses too many locations for variables (rc6-rc39) and it wastes time saving and restoring those locations in the prologue and epilogue.
No wonder that programs run slowly. It would be much better to compile CHIP-8 directly to 6502 assembly.
First, I have utterly no idea why there are so many calls to memset; it looks like it's unrolling a loop or something... poorly. It also doesn't seem to be reusing registers when setting up the calls; that's also bad and should be fixed.
Second, if you take a look at the actual structure of the prologue and epilogue, you might notice that it's copying zero page to an absolute memory region called __clear_screen_sstk. This is because LLVM-MOS ran a whole-program analysis on the program and proved that at most one activation of that function could occur at any given time. Thus, its "stack frame" was automatically allocated statically as a global array, not relative to a moving stack pointer.
The reason that the prologue and epilogue spend so much time copying in and out of the zero page is just that we haven't taught LLVM-MOS how to access the stack directly, but there's no technical obstacle to doing so. Once that's done, the whole body of the function would operate on __clear_screen_sstk directly, and the prologue and epilogue would disappear completely.
Of course, from the first point, you shouldn't need any stack locations to do the body of this routine; there's a big ball of yarn here, but pulling on any of a number of threads would unravel it.
Still cool though!
I can see for some dynamic languages there being a distinction between the two, but for compiled binaries, generally Rust on X, it doesn’t seem important if rustc also runs on X (especially when discussing micro-controllers since one would rarely run a full compiler on the chip itself).
And the rest are Forth users happily running interactive, extensible compilers with built-in assemblers, block IO, and screen editors in a multiuser, multitasking environment.
The real advantage of using Rust is in the actual program logic. E.g. the instructions are decoded into an algebraic datatype (in https://github.com/gergoerdi/chirp8-engine/blob/7623353a8bf0...) and then that is consumed in the virtual CPU (https://github.com/gergoerdi/chirp8-engine/blob/7623353a8bf0...). Rust's case-of-case optimization takes care of avoiding the intermediate data representation at runtime.
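For anyone who hasn't seen the pattern, here is a stripped-down sketch of the decode-then-execute split. This is not the chirp8-engine code; the `Op`, `Cpu`, `decode` and `step` names are my own, and it covers only a handful of CHIP-8 opcodes (00E0, 1NNN, 6XNN, 7XNN, ANNN) with the display left out.

```rust
/// A few CHIP-8 instructions as an algebraic datatype (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    ClearScreen,     // 00E0
    Jump(u16),       // 1NNN
    LoadImm(u8, u8), // 6XNN: VX = NN
    AddImm(u8, u8),  // 7XNN: VX += NN, no carry flag
    LoadI(u16),      // ANNN: I = NNN
    Unknown(u16),    // anything this sketch does not handle
}

/// Decodes one 16-bit instruction word into an `Op`.
pub fn decode(word: u16) -> Op {
    let x = ((word >> 8) & 0x0F) as u8;
    let nn = (word & 0x00FF) as u8;
    let nnn = word & 0x0FFF;
    match word >> 12 {
        0x0 if word == 0x00E0 => Op::ClearScreen,
        0x1 => Op::Jump(nnn),
        0x6 => Op::LoadImm(x, nn),
        0x7 => Op::AddImm(x, nn),
        0xA => Op::LoadI(nnn),
        _ => Op::Unknown(word),
    }
}

/// Minimal virtual CPU state: 16 registers, the index register, and the PC.
pub struct Cpu {
    pub v: [u8; 16],
    pub i: u16,
    pub pc: u16,
}

/// Executes one already-decoded instruction.
pub fn step(cpu: &mut Cpu, op: Op) {
    // Instructions are two bytes wide; jumps overwrite the PC afterwards.
    cpu.pc = cpu.pc.wrapping_add(2);
    match op {
        Op::ClearScreen => {} // display handling omitted in this sketch
        Op::Jump(addr) => cpu.pc = addr,
        Op::LoadImm(x, nn) => cpu.v[x as usize] = nn,
        Op::AddImm(x, nn) => cpu.v[x as usize] = cpu.v[x as usize].wrapping_add(nn),
        Op::LoadI(addr) => cpu.i = addr,
        Op::Unknown(_) => {}
    }
}
```

Calling `step(&mut cpu, decode(word))` keeps the bit-twiddling in one place and the semantics in another. Whether the intermediate `Op` value is actually optimized away depends on inlining, so treat that as something to check in the generated assembly.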
> It is worth pointing out that the amazing thing about chirp8-c64 is not how well it works, but that it works at all.