However, apart from this I don't see anything concrete that ChatML uses different parts of the network for different input sources. The source is prefixed, but it doesn't seem to say anything about how the source parameter is processed.
Also, with all due respect, but your finding that ChatML does not work seems to be mainly this:
>> Note that ChatML makes explicit to the model the source of each piece of text, and particularly shows the boundary between human and AI text. This gives an _opportunity_ to mitigate and _eventually_ solve injections, as the model can tell which instructions come from the developer, the user, or its own input.
> Emphasis mine. To summarize, they are saying injections aren’t solved with this and that they don’t know if this approach can ever make it safe. I also assume Bing already uses this format, although I cannot confirm. I don’t know how robust models trained from the ground up with this segmentation in mind will perform, but I am doubtful they will fully mitigate the issue.
Which I find somewhat weak, as it's basically just tea-leaf reading from an OpenAI blog post.
I fully agree with your main take that this is an unsolved problem so far though. Seems a general problem with instruction-tuned LLMs is that they now treat everything as an instruction.