I imagined going a slightly different route.
A minimal Forth can be written in assembly and in itself. It suffices to add a serial-port console, a primitive FAT filesystem to access SPI flash, and maybe even an interface to USB mass storage.
Forth is not very easy to audit, but likely still easier than raw assembly.
One can write a C compiler right on top of that, sufficient to compile TCC.
Alternatively, a simple Lisp can be written on top of the Forth, which is much simpler than writing one in assembly. Using the Lisp, a much more understandable and auditable C compiler can be written.
Much of the Forth, all of the Lisp, and much of the C compiler (except code generation) would be portable and reusable across multiple architectures, without the need to audit them fully every time.
The fun part here is (potentially) not using QEMU or cross-compilers at all, and running everything on sufficiently powerful target hardware, for the extra paranoid.
When you have a build dependency on one of multiple stage 0 compilers, the problem of a cycle basically disappears. You need a C++ compiler to build a C++ compiler these days, but you have your choice of two C++ compilers, so the probability you wake up one day without a working C++ compiler is quite low. And the mostly theoretical trusting-trust problem basically disappears as soon as a second stage 0 compiler is available, so the marginal benefit of a third or fourth or nth stage 0 compiler is basically nil.
And yet, the vast majority of bootstrappable build efforts are basically focused on "how do I go from, say, a hex editor to a working C compiler," which is one of the least useful efforts imaginable. You can sort of see this in their project list: they tout their efforts to get to gcc from "stage0", but when they start talking about Java, it instead becomes "here's how to build a 20-year-old version of Java, but, uh, most of this stuff is unmaintained so good luck?" And the JVM languages are in a state of "uhhh... we don't know how to break these cycles, any ideas?"
If you have old school TTL, EPROMs, RAM, and time, you could build a CPU whose every part you can test, and trust. You could even work your way up to floppy disks and an analog CRT display.
Once you want to ramp up the speed and complexity, things get dicey. I have ideas that would help, but nothing provably secure.
[1] https://www.teamten.com/lawrence/writings/coding-machines/
Which C++ compiler was used to build GCC 4.8?
like, say you are building code, and all the below functions are compilers, and * denotes an evil compiler. Every link in the chain is a compiler building another compiler, until the last node which builds the code.
A() -> B() -> Evil*() -> D() -> E(code) -> binary
how in the world would the evil compiler in this situation inject something malicious into the final binary?
https://dl.acm.org/doi/pdf/10.1145/358198.358210
Russ Cox obtained the actual code for Thompson’s compiler backdoor and presented it here:
Essentially, the evil compiler can insert its evil parts into the compiler it outputs. Even worse, it can insert the self-replicating code itself, so every compiler it builds propagates the backdoor in turn.
You can follow this logic down a chain as long as you like.