Interesting project, so I'll comment.
Have you benchmarked it against CPython? apython is still a bytecode interpreter, like CPython. And both end up as machine code: CPython is compiled from C, and apython is assembled. So we wouldn't expect a huge speedup.
A skilled assembly language programmer might be able to write faster code than a C compiler generates from CPython's source. Is that you? I see the code was written with Claude. Does Claude write faster assembly code than the C compiler? Or did you tune the assembly code by hand, or somehow train Claude to write fast code for this application?
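
To be concrete about the kind of benchmark I mean, here is a minimal sketch. The two workloads are ones I made up because they spend nearly all their time in the interpreter loop (calls, loops, integer arithmetic), and I'm assuming apython supports `time.perf_counter`; if it doesn't, timing the whole process from the shell would do instead.

```python
import time

def fib(n):
    # Naive recursion: dominated by function-call and bytecode-dispatch overhead.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def loop_sum(n):
    # Tight loop with integer arithmetic.
    total = 0
    for i in range(n):
        total += i * i
    return total

def best_of(func, arg, repeats=5):
    # Smallest wall-clock time over several runs, to damp out noise.
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best

if __name__ == "__main__":
    # Assumption: the interpreter under test provides time.perf_counter.
    print("fib(25):", best_of(fib, 25))
    print("loop_sum(1000000):", best_of(loop_sum, 1000000))
```

Save it under any name, run the same file once with CPython and once with apython on the same machine, and compare the two sets of numbers. Two tiny workloads won't tell the whole story, but they would show whether the interpreter loop itself is any faster.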