https://compilers.iecc.com/crenshaw/
and its x86 port: https://github.com/lotabout/Let-s-build-a-compiler
As interesting as Lisp-family languages are, I still think it's better to use something with more traditional syntax and start with parsing, because that both reaches a much wider audience and gives a very early introduction to thinking recursively; the latter is particularly important for understanding the process in general. A simple expression evaluator, which you can later turn into a JIT and then a compiler, is always a good first exercise.
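For anyone who hasn't seen one, here is a minimal sketch of that kind of expression evaluator, written as a recursive-descent parser in Python. The grammar (integers, `+ - * /`, parentheses), the integer-only division, and every name in it (`Parser`, `evaluate`, and so on) are my own assumptions for illustration, not taken from Crenshaw's series.

```python
import re

# A token is either a run of digits or any single non-space character.
TOKEN = re.compile(r"\s*(\d+|\S)")


def tokenize(text):
    return TOKEN.findall(text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.next() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.next() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division, to keep things simple
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.next()
        if tok == "(":
            value = self.expr()  # the recursion: a factor can contain a whole expr
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Each grammar rule becomes one method, and `factor` calling back into `expr` is where the recursive thinking shows up. Turning this into a compiler mostly means replacing the arithmetic in each method with code that emits instructions.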
Not everything needs to be on a so-you-have-never-programmed-before level. This series explicitly assumes "some knowledge of native-code build processes, Lisp, C, and x86 assembly language". People who know Lisp should have had an introduction to thinking recursively already.
Parsing would be a useless distraction for someone interested in writing a Scheme compiler in Scheme.
In terms of creating a "learn compilers from scratch" resource, Crenshaw's approach is definitely better. The trade-off would be that it'd take longer to get past the "writing a recursive descent/LR parser" phase, and it might never get to higher-level language features at all depending on the input language you go with.
http://www.oilshell.org/site.html
And yes one of the main things I had to integrate myself was syntax highlighting (via pygments).
FWIW, I think proportional fonts for text and fixed width for code is more readable.
A few people asked me about the tools, and I dumped them here (but they are not supported, may not be runnable):
https://github.com/oilshell/blog-code/tree/master/tools-snap...
[0]: https://github.com/getzola/after-dark
[1]: https://github.com/egoist/hack
https://github.com/nathell/lithium
It's dormant – I was stuck on implementing environments around step 7 of 24 – but someday I will return to it and make progress.
Generate bytecode, but in a form that maps easily onto macros for a macro assembler, so that we only needed to write those macros for each target platform.
From a performance point of view it was quite bad, but we got complete AOT static binaries out of it anyway.
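A rough sketch of the idea, in Python rather than a real macro assembler: the compiler emits stack bytecode, and a per-target table plays the role of the macro definitions. The bytecode format, the opcode names, and the 32-bit x86 expansions (NASM-like syntax) are all hypothetical, chosen only to illustrate the approach.

```python
# Hypothetical stack bytecode: each instruction is (opcode, operand or None).
# This sequence computes (2 + 3) * 4.
BYTECODE = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
]

# One "macro" table per target platform. Porting to a new target means
# writing a new table like this one and nothing else.
X86_MACROS = {
    "PUSH": ["push dword {arg}"],
    "ADD": ["pop ebx", "pop eax", "add eax, ebx", "push eax"],
    "MUL": ["pop ebx", "pop eax", "imul eax, ebx", "push eax"],
}


def expand(bytecode, macros):
    """Expand each bytecode instruction into the target's assembly lines."""
    lines = []
    for opcode, arg in bytecode:
        for template in macros[opcode]:
            lines.append(template.format(arg=arg))
    return lines


if __name__ == "__main__":
    print("\n".join(expand(BYTECODE, X86_MACROS)))
```

Every instruction is expanded in isolation and all values travel through the machine stack, so nothing is kept in registers across instructions. That fits with the output being slow, while the scheme itself stays trivial to port.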
"My first 15 compilers" https://news.ycombinator.com/item?id=20408011
It's a work in progress, but very well made. I think Bob (who is writing the book) is a great educator.
It just helps fully understand how you go from words in a file to actually doing computations and how purely abstract ideas like a 'class' are implemented.
To be fair, I studied Physics and not CS so I didn't have the opportunity to study Compilers at University.
Lots of CS people haven't either. My university moved compiler theory to the Masters program.
Perhaps not always languages, but such experiences are vital to our industry!
Besides, if you're looking for a high-performance Scheme that fully utilizes all available system resources, there are definitely better options available. :)
As you probably want to combine these values together into some structures representing the various language constructs, an additional byte to represent the type of the structure, and thus the type of its elements, would also be needed. Then you could do away with the extra bits representing the type.
I just think this is premature optimization, and it makes things unnecessarily complex, especially for your readers who might want to learn something from it.
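To make the two options concrete, here is a small Python sketch. The specific layout (2 low tag bits for fixnums, an 8-bit tag for characters) is an assumption on my part, in the style of common Scheme implementations, and the `Record` class and type constants are made up for illustration.

```python
from dataclasses import dataclass

# Option 1: type bits stored inside each machine word (assumed layout).
FIXNUM_SHIFT = 2
FIXNUM_MASK = 0b11
FIXNUM_TAG = 0b00
CHAR_SHIFT = 8
CHAR_TAG = 0b00001111


def encode_fixnum(n):
    return (n << FIXNUM_SHIFT) | FIXNUM_TAG


def is_fixnum(word):
    return (word & FIXNUM_MASK) == FIXNUM_TAG


def decode_fixnum(word):
    return word >> FIXNUM_SHIFT  # arithmetic shift keeps the sign


def encode_char(c):
    return (ord(c) << CHAR_SHIFT) | CHAR_TAG


# Option 2: untagged payloads, with one type byte on the enclosing structure.
TYPE_INT_PAIR = 1  # hypothetical type code: both elements are plain integers


@dataclass
class Record:
    type_byte: int  # describes the structure and the type of its elements
    elements: tuple  # raw, untagged values


def sum_int_pair(record):
    if record.type_byte != TYPE_INT_PAIR:
        raise TypeError("not an integer pair")
    return record.elements[0] + record.elements[1]
```

With option 1 every arithmetic operation has to shift or mask the tag bits. With option 2 the values stay plain, but anything that is not inside a structure carries no type information of its own.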