> Now, instead of asking the model to respond in JSON we use XML tags to separate the start and end of the contemplation phase and the final answer
I suspect the author wisely avoided function-calling/JSON since it doesn't guarantee sequence. This, along with a few other frailties, makes me almost always use XML-like markup for my LLM API calls.
Markup languages like XML and HTML lend themselves beautifully to this task. They are stream-friendly, semantically rich, and leniently parseable (HTML was designed in part for fallible humans to write and for browsers to render incrementally), and by nature of being "markup" they complement the autoregressive way LLMs generate text. One assumes as well that tonnes of prose appears in the HTML found in training corpora, far less in JSON, which is mostly used for transactional data and RPC-like things; that must surely bias JSON completions toward more robotic formulations.

FWIW I ended up creating a library (github.com/padolsey/xmllm) to help me get structured data from LLMs using XML (through the forgiving eyes of an HTML parser), so that I never have to rely on model-specific tool/function-calling abstractions. Even tiny models like Qwen2.5 and Ministral 3B have pretty superb (x|ht)ml compliance, much less so with JSON.
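To make the "forgiving eyes of an HTML parser" point concrete, here is a minimal sketch (not the xmllm library itself) using Python's stdlib `html.parser`, which tolerates streamed, chunked markup. The tag names and the `TagCollector` class are illustrative, and it only handles flat, top-level tags:

```python
# Minimal sketch: extracting tagged sections from a streamed LLM response
# with Python's lenient stdlib HTMLParser. Handles top-level tags only;
# tag names (<thinking>, <answer>) are illustrative.
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Accumulates the text inside each top-level tag into a dict."""
    def __init__(self):
        super().__init__()
        self.current = None
        self.sections = {}

    def handle_starttag(self, tag, attrs):
        self.current = tag
        self.sections.setdefault(tag, "")

    def handle_endtag(self, tag):
        if self.current == tag:
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.sections[self.current] += data

parser = TagCollector()
# Feed the response in arbitrary chunks, as a streaming API would deliver
# them -- tags split mid-name are handled fine.
for chunk in ["<thinking>Check units.</thinki", "ng><answer>42 k", "m</answer>"]:
    parser.feed(chunk)

print(parser.sections["answer"])  # -> 42 km
```

The same parser also recovers gracefully when a small model emits slightly malformed markup, which is much harder to do with strict JSON parsing.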
I always found the "structured output" feature thing odd. LLMs will follow any structure present in the prompt, so with a handful of few-shot examples you have always been able to make them do anything: return JSON, Python function calls, or any syntax you come up with that works well for your task (e.g. streaming, separating thinking tokens from output, etc.).
All of that comes with just a little overhead for the developer: giving the examples, writing the parsing logic, and having a strategy for the rare cases where parsing fails (retry, default value, prompt adaptation).
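The three pieces of overhead above (examples, parsing, failure strategy) fit in a few lines. A minimal sketch, where `call_model`, the prompt template, and the retry/default numbers are all hypothetical stand-ins for whatever your stack uses:

```python
# Sketch of the developer overhead described above: a few-shot prompt,
# a parser, and a retry-then-default strategy. `call_model` is a
# stand-in for any LLM API call.
import re

FEW_SHOT_PROMPT = """Answer inside <answer> tags.

Q: What is 2 + 2?
A: <answer>4</answer>

Q: {question}
A:"""

def parse_answer(text):
    """Pull the content of the first <answer> tag, or None on failure."""
    m = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return m.group(1).strip() if m else None

def ask(question, call_model, retries=2, default="unknown"):
    prompt = FEW_SHOT_PROMPT.format(question=question)
    for _ in range(retries + 1):
        answer = parse_answer(call_model(prompt))
        if answer is not None:
            return answer
    return default  # fall back rather than crash

# Stub model: ignores the format once, then complies on the retry.
replies = iter(["Sure! The answer is 5.", "<answer>5</answer>"])
result = ask("What is 2 + 3?", lambda p: next(replies))
print(result)  # -> 5
```

Because nothing here depends on a vendor's tool-calling API, swapping `call_model` for a different provider (or a local model) changes no other line.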
Once you embrace this approach, switching models is trivial: you can evaluate heavily RLHF'd models like OpenAI's against open models, base models, and tiny models. It's very easy to save a lot of money and achieve incredible speeds.