Sounds more like SPIR-V, an intermediate representation for optimizing against custom hardware or their current setup. I'm guessing sama wants to copy DeepSeek's strategy, not build a language.
One thing I've thought about for coding with LLMs is that running source code through the same BPE/whatever natural-language tokenizer seems suboptimal compared to training on the AST the compiler generates after parsing the source.
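To make the difference concrete, here's a rough sketch using Python's stdlib `ast` module: a subword tokenizer sees source as flat text chunks, while the parser recovers the nesting and operator structure for free (the example string and chunking are just illustrative, not any particular model's tokenization):

```python
import ast

src = "total = price * quantity"

# What a text tokenizer sees: arbitrary subword pieces, no syntax.
# Something like ["total", " =", " price", " *", " quantity"]

# What the compiler's parser already recovers:
tree = ast.parse(src)
stmt = tree.body[0]
# The assignment node groups the target and the whole expression,
# and the BinOp node makes the operator and operands explicit.
print(type(stmt).__name__)           # Assign
print(type(stmt.value).__name__)     # BinOp
print(type(stmt.value.op).__name__)  # Mult
```

An AST-trained model would get operator precedence and scoping as structure rather than having to infer them from token co-occurrence.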