Why XML tags are so fundamental to Claude (opens in new tab)

(glthr.com)

228 pointsglth2mo ago153 comments

153 comments

The thesis here seems to be that delimiters provide important context for Claude, and for that putpose we should use XML.

The article even references English's built-in delimiter, the quotation mark, which is reprented as a token for Claude, part of its training data.

So are we sure the lesson isn't simply to leverage delimiters, such as quotation marks, in prompts, period? The article doesn't identify any way in which XML is superior to quotation marks in scenarios requiring the type of disambiguation quotation marks provide.

Rather, the example XML tags shown seem to be serving as a shorthand for notating sections of the prompt ("treat this part of the prompt in this particular way"). That's useful, but seems to be addressing concerns that are separate from those contemplated by the author.

sheept2mo ago

XML is a bit more special/first class to Claude because it uses XML for tool calling:

    <antml:invoke name="Read">                                                    
      <antml:parameter name="file_path">/path/to/file</antml:parameter>             
      <antml:parameter name="offset">100</antml:parameter>                          
      <antml:parameter name="limit">50</antml:parameter>                            
    </antml:invoke>

I'm sure Claude can handle any delimiter and pseudo markup you throw at it, but one benefit of XML delimiters over quotation marks is that you repeat the delimiter name at the end, which I'd imagine might help if its contents are long (it certainly helps humans).

bear3r2mo ago

the antml: namespace prefix is doing extra work here too -- even if user input contains invoke tags, they won't collide with tool calls because the namespace differs. not just xml for structure but namespaced xml for isolation.

xeyownt2mo ago

Cannot believe it's efficient. XML is the most verbose and inefficient of communicating anything. The only benefit of XML was to give lifetime work to an army of engineers. The next news will be "Why DTD is so fundamental to Claude".

imtringued2mo ago

The point isn't to be efficient. If you train an LLM on code with an example execution trace written in the comments, the LLM gains a better understanding due to the additional context in the data. LLMs don't have a real world model. For them, the token space is the real world. All the information needs to be present in the training data and XML makes it easy because it is verbose and explicit about everything.

wolttam2mo ago

When you're tokenizing it does not matter really what you use (how you translate that token to-from a text string), the main thing is the overall number of tokens. XML is particularly amenable to tokenization because it is trivial to represent entire tags as a single token (or a pair of tokens, one for the open tag, one for the close).

It gets a bit muddier with attributes, but you can still capture the core semantics of the tag with a single token. The model will learn that tag's attributes through training on usages of the tag.

RandomBK2mo ago

How well do we understand the tokenization for Claude? I'd posit that the exact human-representation of this markup is likely irrelevant if it's all being converted into a single token.

PeterStuer2mo ago

"<" ">" and "/>" are indeed single tokens.

mpyne2mo ago

I had a lot of success with Claude just by telling it I would throw a relevant snippet in a <tag></tag> pair. That's not even XML, nor has it been really needed in my experience.

Even simple --- separators is usually enough to get good results, it just needs to be reasonably clear which items are distinct from each other.

jinushaun2mo ago

Except quotation marks look like regular text. I regularly use quotes in prompts for, ya know, quotes.

wolttam2mo ago

The GP isn't suggesting to literally use quotes as the delimiter when prompting LLMs. They're pointing out that we humans already use delimiters in our natural language (quotation marks to delimit quotes). They're suggesting that delimiters of any kind may be helpful in the context of LLM prompting, which to me makes intuitive sense. That Claude is using XML is merely a convention.

1 more reply

krackers2mo ago

All system prompts are already wrapped in specific role markers (each LLM has its own unique format), so I'm sure every lab is familiar with the concept of delimters, in-band vs out-of-band signalling and such.

It'd not clear why within any section XML markers would do better than something like markdown, other than claude being explicitly post-trained with XML prompts as opposed to markdown. One hypothesis could be that since a large portion of the training corpus is websites, XML is more natural to use since it's "learned" the structure of XML better than markdown. Another could be that explicit start/end tags make identifying matching delimiters easier than JSON (which requires counting matching brackets) or markdown (where the end of a section is implicitly defined by the presence of a new header element).

thethimble2mo ago

Perhaps named closing tags like `</section>` are a factor?

arbirk2mo ago

If this is true, the why does Claude Code's own system prompt not use this style? https://github.com/Piebald-AI/claude-code-system-prompts/tre...

its-summertime2mo ago

https://github.com/Piebald-AI/claude-code-system-prompts/blo... They seem to use XML-esque tags here in the first prompt I looked at

arbirk2mo ago

Yes, but that is for a specific JSON format. The instructions are in md

Lerc2mo ago

I am unconvinced.

To me it seems like handling symbols that start and end sequences that could contain further start and end symbols is a difficult case.

Humans can't do this very well either, we use visual aids such as indentation, synax hilighting or resort to just plain counting of levels.

Obviously it's easy to throw parameters and training at the problem, you can easily synthetically generate all the XML training data you want.

I can't help but think that training data should have a metadata token per content token. A way to encode the known information about each token that is not represented in the literal text.

Especially tagging tokens explicitly as fiction, code, code from a known working project, something generated by itself, something provided by the user.

While it might be fighting the bitter lesson, I think for explicitly structured data there should be benefits. I'd even go as far to suggest the metadata could handle nesting if it contained dimensions that performed rope operations to keep track of the depth.

If you had such a metadata stream per token there's also the possibility of fine tuning instruction models to only follow instructions with a 'said by user' metadata, and then at inference time filter out that particular metadata signal from all other inputs.

It seems like that would make prompt injection much harder.

scotty792mo ago

Transformers look like perfect tech for keeping track of how deep and inside of what we are at the moment.

thesz2mo ago

Transformers are able to recognize balanced brackets grammar at 97% success rate: https://openreview.net/pdf?id=kaILSVAspn

This is 3% or infinitely far away from the perfect tech.

The perfect tech is the stack.

krackers2mo ago

This is very interesting since there is another notable paper which shows LLMs can recognize and generate CFGs

https://arxiv.org/abs/2305.13673

and of course a^n b^n is also classic CFG, so it's not clear why one paper had positive results while the other hand negative.

1 more reply

cyanydeez2mo ago

Basically, the only way you're separting user input from model meta-input is using some kind of character that'll never show up in the output of either users or LLMs.

While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.

Lerc2mo ago

Not at all. You have a set of embeddings for the literal token, and a set for the metadata. At inference time all input gets the literal embedding, the metadata embedding can receive provenance data or nothing at all. You have a vector for user query in the metadata space. The inference engine dissallows any metadata that is not user input to be close to the user query vector.

Imagine a model finteuned to only obey instructions in a Scots accent, but all non user input was converted into text first then read out in a Benoit Blanc speech model. I'm thinking something like that only less amusing.

dragonwriter2mo ago

Actually, all you need is an interface that lets you manipulate the token sequence instead of the text sequence along with a map of the special tokens for the model (most [all?] models have special tokens with defined meanings used in training and inference that are not mapped from character sequences, and native harnesses [the backend APIs of hosted models that only provide a text interface and not a token-level one] leverage them to structure input to the model after tokenization of the various pieces that come to the harnesses API from whatever frontend is in use.)

zahlman2mo ago

Couldn't you just insert tokens that don't correspond to any possible input, after the tokenization is performed? Unicode is bounded, but token IDs not so much.

krackers2mo ago

This already happens, user vs system prompts are delimited in this manner, and most good frontends will treat any user input as "needing to be escaped" so you can never "prompt inject" your way into emitting a system role token.

The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.

1 more reply

RadiozRadioz2mo ago

> a contrast between Claude’s modern approach [...] XML, a technology dating back to 1998

Are we really at the point where some people see XML as a spooky old technology? The phrasing dotted around this article makes me feel that way. I find this quite strange.

coldtea2mo ago

XML has been "spooky old technology" for over a decade now. It's heyday was something like 2002.

Nobody dares advertise the XML capabilities of their product (which back then everybody did), nobody considers it either hot new thing (like back then) or mature - just obsolete enterprise shit.

It's about as popular now as J2EE, except to people that think "10 years ago" means 1999.

rhdunn2mo ago

XML is used a lot in standards and publishing industries -- JATS, EPUB, ODF, DOCX/XLSX/..., DocBook, etc. are all XML based/use XML.

miki1232112mo ago

And I think this makes sense.

XML is really great for text documents with embeds and markup, either semantic (this part of the paper is an abstract) or visual (this part of the document should be 14-point and aligned right). You can do this in JSON, but it's a pain.

JSON is great for representing data. If you have some data structures and two machines trying to exchange them, JSON is great for that.

TOML / yaml / hcl / JSON with comments are great at config. If you have a human writing something that a machine is supposed to understand, you don't want turning completeness and you don't want to deal with the pain of having your own DSL, those are great.

michaelbarton2mo ago

Without being facetious, isn’t HTML a dialect of XML and very widely used?

5 more replies

coldtea2mo ago

Yes, there's a handful of niches. Still 1/1000th the momentum it had, or adoption it was expected to get, and nobody under 40 even considers it for new stuff.

1 more reply

graemep2mo ago

Also in finance. XBRL and FIXML although I do not know how widely used the latter is.

vlovich1232mo ago

For me, even when it was first released, I considered obsolete enterprise shit. That view has not diminished as the sorry state of performance and security in that space has just reaffirmed that perception.

girvo2mo ago

I kind of miss SOAP. Ahead of its time? Probably not, but I built some cool things on top of it

vbezhenar2mo ago

Right now I'm writing adapter so people could call one SOAP service using simpler interfaces. That involves implementing WS-Security with non-standard algorithms, that also involves dealing with things like XML escaped into a string and embedded inside another XML.

Let's say I hope for the day I'll miss SOAP. Right now I have too much of it.

pfraze2mo ago

atproto's lexicon-based rpc is pretty soap-like

cyanydeez2mo ago

20 years old means 1980!

fc417fc8022mo ago

It's not the hot new thing but when has hype ever mattered for getting shit done? I don't think anyone who considers it obsolete has an informed opinion on the matter.

Typically a more primitive (sorry, minimal) format such as JSON is sufficient in which case there's no excuse to overcomplicate things. But sometimes JSON isn't sufficient and people start inventing half baked solutions such as JSON-LD for what is already a solved problem with a mature tech stack.

XSLT remains an elegant and underused solution. Guile even includes built in XML facilities named SXML.

coldtea2mo ago

>It's not the hot new thing but when has hype ever mattered for getting shit done?

People who wanted to "get shit done" had much better alternatives. XML grew out of hype, corporate management forcing it, and bundling to all kinds of third party products and formats just so they can tick the "have this hot new format support" box.

1 more reply

thayne2mo ago

> It's not the hot new thing but when has hype ever mattered for getting shit done?

But it used to be. And so it was used for a lot of things where it wasn't a great fit. XML works fairly well as a markup format, but for a lot of things, something like json models the data better.

> which case there's no excuse to overcomplicate things.

And that's a problem with xml. It's too complicated. Even if the basic model of xml is a good fit for your data, most of the time you don't need to worry about namespaces and entity definitions, and DTDs, but those are still part of most implementations and can expose more attack surface for vulnerabilities (especially entity definitions). And the APIs of libraries are generally fairly complicated.

1 more reply

mycall2mo ago

It makes me wonder how well an LLM like Opus can generate XSLT which was always the hard part when writing by hand.

1 more reply

eduction2mo ago

Obsolete enterprise shit I guess includes podcasting. Impressive for the enterprise.

I’d be very curious what lasting open formats JSON has been used to build.

coldtea2mo ago

That the podcast feed format is XML based is an insignificant detail - and a remnant of the past, nobody cares about.

People upload their podcasts to a platform like Apple Music or Spotify or Substack and co, or to some backend connected to their Wordpress/Ghost/etc) and it spits the RSS behind the scenes, with nobody giving a shit about the XML part.

Might as well declare USSR a huge IT success because people still play Tetris.

himata41132mo ago

didn't know html was spooky tech, TIL. /s

coldtea2mo ago

HTML predates XML by 5 years.

What's more, the web standards bodies even abandoned a short-lived XML-hype-era plan to make a new version of HTML based on XML in 2009.

That from this touted to the heavens format a handful of uses remain (some companies still using SOAP, the MS Office monster schemas, RSS, EPUB, and so on) is the very opposite of the adoption it was supposed to have. For those that missed the 90s/early 00s, XML was a hugely hyped format, with enormous corporate adoption between 1999–2005, which deflated totally.

Did you also learned those things too today?

1 more reply

oytis2mo ago

XML is still around, but I don't think many people would choose it as a serialization format today for something new.

dathanb822mo ago

The use of XML as a data serialization format was always a bad choice. It was designed as a document _markup_ language (it’s in the name), which is exactly the way it’s being used for Claude, and is actually a good use case.

WD-422mo ago

XML is back, everyone is rediscovering the terminal. Soon we’ll discover that object oriented programming is good again.

Kerrick2mo ago

Unambiguously, though, it is. There's so much trash imperative code in its training data that LLMs tend to vomit out garbage. But if you anchor it with OOP, the quality tends to be higher.

hbarka2mo ago

If you think XML is old tech, wait until you hear of EDI, still powering Walmart and Amazon logistics. XML came in like a wrecking ball with its self-documenting promise designed to replace that cryptic pesky payload called EDI. XML promised to solve world hunger. It spawned SOAP, XML over RPC, DOM, DTD, the heyday was beautiful and Microsoft was leading the charge. C# was also right around this time. Consulting firms were bloomed charged with delivering the asynchronous revolution, the loosely coupled messaging promises of XML. I think it succeeded and it’s now quietly in the halls of warehouse having a beer or two with its older cousin the Electronic Data Interchange aka EDI.

AptSeagull2mo ago

EDI is a PITA, but we're trying to solve it Surpass. The underlying architecture is key, there's variability in every element, segment and the overarching golden rule: the issuer gets to define their own interpretation of the standard.

sevennull2mo ago

Haha, EDI is such a pita. very efficient for machines I suspect - the first time they tried to take over.

actionfromafar2mo ago

EDI is XML now.

treebeard9012mo ago

It all brings back nightmares from migrating the older style EDI for healthcare data for what was HL7 XML at the time. XML is widely used still for all kinds of stuff. On some level if JSON was allowed to evolve the same way, eventually you would just wind up with something like XML.

1 more reply

paulddraper2mo ago

Imagine the worst data format you can think of.

Then spend the next week making it even more convoluted.

That data format is still better than EDI.

bryanrasmussen2mo ago

I'm not sure if this is a compliment or insult to my powers of invention.

Bratmon2mo ago

XML is as old now as the PDP-11 was when XML came out.

siva72mo ago

I tried following the best practice to use XML tags and the difference was not observable. I honestly believe Anthropic forgot to remove that part of the documentation from Sonnet 3.x days and now people are still writing blogs about this secret sauce

shams932mo ago

It has a number of security issues which have not been fixed which could be used for really interesting exploitation.

phlakaton2mo ago

I don't think anybody's proposing to throw recursive entity definitions at Claude. Just a little light informally-defined angle-bracket markup.

toastal2mo ago

XML works great for XMPP. KDL is compatible with it too.

What gets me is going from this structured data to Markdown which doesn’t even have enough features & syntax that the LLMs try to invent or co-opt things like the blockquote for not quoting sources.

theowaway2134562mo ago

The evidence suggests that XML was never that popular though for the general audience, you have to admit.

For Web markup, as an industry we tried XHTML (HTML that was strictly XML) for a while, and that didn't stick, and now we have HTML5 which is much more lenient as it doesn't even require closing tags in some cases.

For data exchange, people vastly prefer JSON as an exchange format for its simplicity, or protobuf and friends for their efficiency.

As a configuration format, it has been vastly overtaken by YAML, TOML, and INI, due to their content-forward syntax.

Having said all this I know there are some popular tools that use XML like ClickHouse, Apple's launchd, ROS, etc. but these are relatively niche compared to (e.g.) HTML

icermann2mo ago

MS Office and Open-/LibreOffice are using zipped xml files (e.g. .docx, .xlsx and .odt). Svg vector graphics is xml, the x in ajax stands for xml (although replaced by json by now). SOAP (probably counts as the predecessor of REST) is xml-based.

XML was definitely popular in the "well used" sense. How popular it was in the "well liked" sense can maybe be up for debate, but it was the best tool for the job at the time for alot of use cases.

intrasight2mo ago

Yup. Kids these days...

strongpigeon2mo ago

This seems like an actual good use for XML. Using it as a serialization format always rubbed me the wrong way (it’s super verbose, the named closing tag are unnecessary grammar-wise, the attribute-or-child question etc.) But to markup and structure LLM prompts and response it feels better than markdown (which doesn’t stream that well)

Jcampuzano22mo ago

But should this extend to anything that could end up in Claudes context? Should we be using xml even in skills for instance, or commands, custom subagents etc.

And then do we end up over indexing on Claude and maybe this ends up hurting other models for those using multiple tools.

I just dislike how much of AI is people saying "do this thing for better results" with no definitive proof but alas it comes with the non determinism.

At least this one has the stamp of approval by Claude codes team itself.

michaelcampbell2mo ago

Total tangent, but what vagary of HTML (or the Brave Browser, which I'm using here) causes words to be split in very odd places? The "inspect" devtools certainly didn't show anything unusual to me. (Edit: Chrome, MS Edge, and Firefox do the same thing. I also notice they're all links; wonder if that has something to do with it.)

https://i.imgur.com/HGa0i3m.png

werdnapk2mo ago

CSS on the <a> tags:

word-break: break-all;

1 more reply

knallfrosch2mo ago

It's an error in the site's CSS. CSS has way better methods, like splitting words correctly depending on the language and hyphenating it.

Although I can never remember the correct incantation, should be easy for LLMs.

fancy_pantser2mo ago

CSS word-break property

rosstex2mo ago

Ask Claude?

apwheele2mo ago

I think XML is good to know for prompting (similar to how <think></think> was popular for outputs, you can do that for other sections). But I have had much better experience just writing JSON and using line breaks, colons, etc. to demarcate sections.

E.g. instead of

    <examples>
      <ex1>
        <input>....</input>
        <output>.....</output>
      </ex1>
      <ex2>....</ex2>
      ...
    </examples>
    <instructions>....</instructions>
    <input>{actual input}</input>

Just doing something like:

    ...instructions...
    input: ....
    output: {..json here}
    ...maybe further instructions...
    input: {actual input}

Use case document processing/extraction (both with Haiku and OpenAI models), the latter example works much better than the XML.

N of 1 anecdote anyway for one use case.

galaxyLogic2mo ago

XML helps because it a) Lets you to describe structures b) Make a clear context-change which make it clear you are not "talking in XML" you are "talking about XML".

I assume you are right too, JSON is a less verbose format which allows you to express any structure you can express in XML, and should be as easy for AI to parse. Although that probably depends on the training data too.

I recently asked AI why .md files are so prevalent with agentic AI and the answer is ... because .md files also express structure, like headers and lists.

Again, depends on what the AI has been trained on.

I would go with JSON, or some version of it which would also allow comments.

irthomasthomas2mo ago

The main thing i use xml tags for is seperating content from instructions. Say I am doing prompt engineering, so that the content being operated on is itself a prompt then I wrap it with

<NO_OP_DRAFT> draft prompt </NO_OP_DRAFT>

instructions for modifying draft prompt

If I don't do this, a significant number of times it responds to the instructions in the draft.

marxisttemp2mo ago

XML is much more readable than JSON, especially if your data has characters that are meaningful JSON syntax

galaxyLogic2mo ago

I think readability is in the eye of the reader. JSON is less verbose, no ending tags everywhere, which I think makes it more readable than XML.

But I'd be happy to hear about studies that show evidence for XML being more readable, than JSON.

ezfe2mo ago

I disagree that XML is more readable in general, but for the purpose of tagging blocks of text as <important>important</important> in freeform writing, JSON is basically useless

what2mo ago

>But I'd be happy to hear about studies that show evidence for XML being more readable, than JSON.

But I’d be happy to hear about studies that show evidence for JSON being readable, than XML.

ekjhgkejhgk2mo ago

Could you clarify, do those tags need to be tags which exist and we need to lear about them and how to use them? Or we can put inside them whatever we want and just by virtue of being tags, Claude understands them in a special way?

ezfe2mo ago

They probably don’t need to be specific values. The model is fine tuned to see the tags as signals and then interprets them

galaxyLogic2mo ago

If it walks like a duck ... AI thinks it is something like a duck.

apwheele2mo ago

All the major foundation models will understand them implicitly, so it was popular to use <think>, but you could also use <reason> or <thinkhard> and the model would still go through the same process.

cyanydeez2mo ago

<ponderforamoment>HTML is a large subsection of their training data, so they're used to seeing a somewhat semantic worldview</ponderforamoment>

imglorp2mo ago

A very minor porcelain on some of the agent input UX could present this structure for you. Instead of a single chat window, have four: task, context, constraints, output format.

And while we're at it, instead of wall-of-text, I also feel like outputs could be structured at least into thinking and content, maybe other sections.

kvirani2mo ago

You're on to something here. Can we go more meta and define these dynamically such that users can customize multiple output streams?

alansaber2mo ago

Sounds like as 1. XML is the cleanest/best quality training data (especially compared to PDF/HTML) 2. It follows that a user providing semantic tags in XML format can get best training alignment (hence best results). Shame they haven't quantified this assertion here.

lsc47192mo ago

Makes sense

hkbuilds2mo ago

This matches my experience building AI-powered analysis tools. Structured output from LLMs is dramatically more reliable when you give the model clear delimiters to work with.

One thing I've found: even with XML tags, you still need to validate and parse defensively. Models will occasionally nest tags wrong, omit closing tags, or hallucinate new tag names. Having a fallback parser that extracts content even from malformed XML has saved me more than once.

The real win is that XML tags give you a natural way to do few-shot prompting with structure. You can show the model exactly what shape the output should take, and it follows remarkably well.

docjay2mo ago

“It works great aside from the multiple failure modes.” ;)

That’s the sign that your prompt isn’t aligned and you’ve introduced perplexity. If you look carefully at the responses you’ll usually be able to see the off-by-one errors before they’re apparent with full on hallucinations. It’ll be things like going from having quotes around filenames to not having them, or switching to single quote, or outputting literal “\n”, or “<br>”, etc. Those are your warning signs to stop before it runs a destructive command because of a “typo.”

My system prompt is just a list of 10 functions with no usage explanations or examples, 304 tokens total, and it’ll go all the way to the 200k limit and never get them wrong. That took ~1,000 iterations of name, position, punctuation, etc., for Opus 4.6 (~200 for Opus 4.5 until they nerfed it February 12th). Once you get it right though it’s truly a different experience.

muzani2mo ago

In the spirit of Hacker News, a good way to learn about these tags is prompt injection and jailbreaking Claude.

I'd post a link, but unfortunately many are highly NSFW. Just search for "Claude jailbreak" on reddit or something.

You'll start to see how Claude really thinks. They'll put things in <ethic_reminders>, <cyber_warning> or <ip_reminder>. You could actually even snip these off in an API, overwrite them, or if your prompt-fu is good, convince Claude that these tags are prompt injections. It's also interesting noting how jailbreaking is easier on thinking mode because the jailbreaking prompts will gaslight Claude into thinking that these tags are attacks.

There's a lot of speculation in this thread, but go and have a spar with Claude instead.

TheJoeMan2mo ago

That first image, “Structure Prompts with XML”, just screams AI-written. The bullet lists don’t line up, the numbering starts at (2), random bolding. Why would anyone trust hallucinated documentation for prompting? At least with AI-generated software documentation, the context is the code itself, being regurgitated into bulleted english. But for instructions on using the LLM itself, it seems pretty lazy to not hand-type the preferred usage and human-learned tips.

rafram2mo ago

No, it’s two screenshots from Anthropic documentation, stitched together: https://platform.claude.com/docs/en/build-with-claude/prompt...

The post even links to that page, although there’s a typo in the link.

TheJoeMan2mo ago

I'm sorry for not elaborating. My original complaint is with Anthropic! The 7-figure Anthropic engineers couldn't be bothered to write down how to use their tool. And there is no way for the tool to already have latent knowledge about how to use itself since that wouldn't have been part of the internet/books/github training material.

rafram2mo ago

Thanks, that makes sense!

glthOP2mo ago

Author here: I have just fixed the typo. Thank you.

And yes, these are screenshots from Anthropic’s documentation.

dmd2mo ago

They're not even stitched together ; there's just no padding between the two images.

croes2mo ago

You just hallucinated the content is AI generated.

michaelcampbell2mo ago

"This is AI" is the new "This is 'shopped, I can tell by the pixels."

tingletech2mo ago

I can tell by the em dashes

Calavar2mo ago

It looks like a screenshot from the Claude desktop app, so I don't think the author is trying to disguise the AI origin of the marerial

TheJoeMan2mo ago

I'm sorry for not elaborating. My original complaint is with Anthropic! The article is about how Anthropic's published "tips" are incorrect, but I am saying of course it's flawed because there is no way for the AI to already have latent knowledge about how to use itself since that wouldn't have been part of the internet/books/github training material.

doctorpangloss2mo ago

There must be an OpenClaw YouTube video helping people post to hacker news, or something, because the front page is overrun with AI slop like this article, that makes no sense anyway. The author literally has no idea what any of this stuff means.

ryanschneider2mo ago

Wait am I in the minority talking to Claude in markdown? I just assumed everyone does that, or at least all developers. It seems to work really well.

cyanydeez2mo ago

I do that in openwebui for code indents like ```

TutleCpt2mo ago

I think this article is 100% relevant to you today. Anthropic put out a training video, a number of months ago saying that XML should be highly encouraged for prompts. See https://m.youtube.com/watch?v=ysPbXH0LpIE

lmeyerov2mo ago

My intuition is it comes down to error-correcting codes. We're dealing with lossy systems that get off track, so including parity bits helps.

Ex: <message>...</message> helps keep track. Even better? <message78>...</message78>. That's ugly xml, but great for LLMs. Likewise, using standard ontologies for identifiers (ex: we'll do OCSF, AT&CK, & CIM for splunk/kusto in louie.ai), even if they're not formally XML.

For all these things... these intuitions need backing by evals in practice, and part of why I begrudgingly flipped from JSON to XML

TacticalCoder2mo ago

It'd be hilarious if XML schemas and validators were to make a comeback [1] to... interface with AI models.

[1] well of course XML is still heavily used in stuff like interfacing with automated wire transfers with big banks (at least in Europe) and all the digital payments directives etc. But XML is not widely used by the "cool" stuff.

esafak2mo ago

This sounds like something for harnesses, not end users. Are they really expecting us to format prompts as XML??

twoodfin2mo ago

This isn’t surprising: XML’s core purpose was to simplify SGML for a wider breadth of applications on the web.

HTML also descended from SGML, and it’s hard to imagine a more deeply grooved structure in these models, given their training data.

So if you want to annotate text with semantics in a way models will understand…

tingletech2mo ago

XML and HTML are SGMLs

ChrisSD2mo ago

HTML diverged from SGML pretty early on. Various standards over the years have attempted to specify it as an application of SGML but in practice almost nobody properly conformed to those standards. HTML5 gave up the pretence entirely.

wolttam2mo ago

Anthropic’s tool calling was exposed as XML tags at the beginning, before they introduced the JSON API. I expect they’re still templating those tool calls into XML before passing to the model’s context

pocketarc2mo ago

Yeah like I remember prior to reasoning models, their guidance was to use <think> tags to give models space for reasoning prior to an answer (incidentally, also the reason I didn't quite understand the fuss with reasoning models at first). It's always been XML with Anthropic.

wolttam2mo ago

Exactly the same story here. I still use a tool that just asks them to use <think> instead of enabling native reasoning support, which has worked well back to Sonnet 3.0 (their first model with 'native' reasoning support was Sonnet 3.7)

scotty792mo ago

Can you sniff it out with Wireshark?

wolttam2mo ago

They don't expose the raw context over the wire, it's all pre/post processed at their API endpoints.

spacecadet2mo ago

This has been the way for a long time, exploiting XML tags was a means of exfiltrating data or reversing a model for a while as well. Some platforms are still vulnerable to this.

Zebfross2mo ago

I thought the goal was minimal instruction to let Claude determine the best way to solve the problem. Not adding this to my workflow anytime soon.

TheLNL2mo ago

It is not for the end user, it is more for things like wrappers and automation scripts.

Nobody expects the end user to prompt the AI using a structured language like xml

ixxie2mo ago

How about other frontier models, and smaller models?

CactusBlue2mo ago

I think the main advantage of the XML here is that the model is expected to have a matching end tag that is balanced, which reduces the likelihood of malformed outputs.

kleyd2mo ago

The main benefit of using XML here seems to be that it forces clearer thinking and formulation from the user.

ashirviskas2mo ago

Author does not know what they're talking about.

> In other words, XML tags have not only a special place at inference level but also during training

Their cited source has 0 proof of that. It's just like python/C/html in training. Doesn't mean it's special. And no, you don't need to format your prompts as python code just because of that.

> In truth, it does not matter that these tags are XML. Other models use ad hoc delimiters (as explained in a previous article; example: <|begin_of_text|> and <|end_of_text|>) and Claude could have done the same. What matters is what these tags represent.

Those strings are just representations of special tokens in models for EOS. What does it have to do with anything this article pretends to know about?

Please don't post such intellectual trash on here :')

Claude analysis of the article:

The author is making an interesting philosophical argument — that XML tags in Claude function as metalinguistic delimiters analogous to quotation marks in natural language, formulaic speech markers in Homer, or recognition sequences in DNA.

The core thesis is about first-order vs. second-order expression boundaries, which is a legitimate linguistic/information-theory concept. But to your actual question — do they understand what tokens are?

No, not in the technical sense you're pointing at. The article conflates two very different things:

1. Tokenizer-level special tokens — things like <|begin_of_text|>, <|end_of_text|>, <|start_header_id|> etc. These are literal entries in the vocabulary with dedicated token IDs. They're not "learned" through training in the same way — they're hardcoded into the tokenizer and have special roles in the attention mechanism during training. They exist at a fundamentally different layer than XML tags in prompt text.

2. XML tags as structured text within the input — these are just regular tokens (<, instructions, >) that Claude learned to attend to during RLHF/training because Anthropic's training data and system prompts heavily use them. They're effective because of training distribution, not because they occupy some special place in the tokenizer.

The author notices that other models use <|begin_of_text|> style delimiters and says Claude "could have done the same" but chose XML instead. That's a category error. Claude also has special tokens at the tokenizer level — XML tags in prompts are a completely separate mechanism operating at a different abstraction layer.

The philosophical observation about delimiter necessity in communication systems is fine on its own. But grafting it onto a misunderstanding of how tokenization and model architecture actually work weakens the argument. They're essentially pattern-matching on surface-level similarities (both use angle brackets!) without understanding the underlying mechanics.

OutOfHere2mo ago

If an LLM were to struggle to closely follow instructions that weren't wrapped in XML, I would strongly consider it a sign of a poor model reflecting poor model training.

Eric_WVGG2mo ago

bemused by how competently designed this is, compared to enshittified blogs and whatnot

To be realistic, this design needs more weirdly sexual etsy garbage, “one weird tip,” and “punch the monkey”

prima-facie2mo ago

Amazing how an entire profession that until yesterday would pride itself on precision, clarity (in thought and in writing), efficiency, and formality, has now descended into complete quackery.

OutOfHere2mo ago

I can understand the benefit from XML if there is a at least a three-level variable structure to share with the LLM. If there is strong consistency in a repeated three or more level structure, then JSON ought to be sufficient. If there is just a one or two level structure, it feels like unnecessary quackery, possibly reflective of a poorly trained model if the structure is a genuine necessity.

cyanydeez2mo ago

Are you talking about the office of the president of the united states?

This vague posting is kind dumb.

prima-facie2mo ago

It's a simple observation. I'm not here to win internet points. I've never before seen so much cargo-culting and mystic belief among engineers.

j / k navigate · click thread line to collapse

153 comments

kid642mo ago

The thesis here seems to be that delimiters provide important context for Claude, and for that putpose we should use XML.

The article even references English's built-in delimiter, the quotation mark, which is reprented as a token for Claude, part of its training data.

sheept2mo ago

XML is a bit more special/first class to Claude because it uses XML for tool calling:

    <antml:invoke name="Read">                                                    
      <antml:parameter name="file_path">/path/to/file</antml:parameter>             
      <antml:parameter name="offset">100</antml:parameter>                          
      <antml:parameter name="limit">50</antml:parameter>                            
    </antml:invoke>

bear3r2mo ago

xeyownt2mo ago

imtringued2mo ago

wolttam2mo ago

It gets a bit muddier with attributes, but you can still capture the core semantics of the tag with a single token. The model will learn that tag's attributes through training on usages of the tag.

RandomBK2mo ago

How well do we understand the tokenization for Claude? I'd posit that the exact human-representation of this markup is likely irrelevant if it's all being converted into a single token.

PeterStuer2mo ago

"<" ">" and "/>" are indeed single tokens.

mpyne2mo ago

I had a lot of success with Claude just by telling it I would throw a relevant snippet in a <tag></tag> pair. That's not even XML, nor has it been really needed in my experience.

Even simple --- separators is usually enough to get good results, it just needs to be reasonably clear which items are distinct from each other.

jinushaun2mo ago

Except quotation marks look like regular text. I regularly use quotes in prompts for, ya know, quotes.

wolttam2mo ago

1 more reply

krackers2mo ago

thethimble2mo ago

Perhaps named closing tags like `</section>` are a factor?

arbirk2mo ago

If this is true, the why does Claude Code's own system prompt not use this style? https://github.com/Piebald-AI/claude-code-system-prompts/tre...

its-summertime2mo ago

https://github.com/Piebald-AI/claude-code-system-prompts/blo... They seem to use XML-esque tags here in the first prompt I looked at

arbirk2mo ago

Yes, but that is for a specific JSON format. The instructions are in md

Lerc2mo ago

I am unconvinced.

To me it seems like handling symbols that start and end sequences that could contain further start and end symbols is a difficult case.

Humans can't do this very well either, we use visual aids such as indentation, synax hilighting or resort to just plain counting of levels.

Obviously it's easy to throw parameters and training at the problem, you can easily synthetically generate all the XML training data you want.

I can't help but think that training data should have a metadata token per content token. A way to encode the known information about each token that is not represented in the literal text.

Especially tagging tokens explicitly as fiction, code, code from a known working project, something generated by itself, something provided by the user.

It seems like that would make prompt injection much harder.

scotty792mo ago

Transformers look like perfect tech for keeping track of how deep and inside of what we are at the moment.

thesz2mo ago

Transformers are able to recognize balanced brackets grammar at 97% success rate: https://openreview.net/pdf?id=kaILSVAspn

This is 3% or infinitely far away from the perfect tech.

The perfect tech is the stack.

krackers2mo ago

This is very interesting since there is another notable paper which shows LLMs can recognize and generate CFGs

https://arxiv.org/abs/2305.13673

and of course a^n b^n is also classic CFG, so it's not clear why one paper had positive results while the other hand negative.

1 more reply

cyanydeez2mo ago

Basically, the only way you're separting user input from model meta-input is using some kind of character that'll never show up in the output of either users or LLMs.

While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.

Lerc2mo ago

dragonwriter2mo ago

zahlman2mo ago

Couldn't you just insert tokens that don't correspond to any possible input, after the tokenization is performed? Unicode is bounded, but token IDs not so much.

krackers2mo ago

The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.

1 more reply

RadiozRadioz2mo ago

> a contrast between Claude’s modern approach [...] XML, a technology dating back to 1998

Are we really at the point where some people see XML as a spooky old technology? The phrasing dotted around this article makes me feel that way. I find this quite strange.

coldtea2mo ago

XML has been "spooky old technology" for over a decade now. It's heyday was something like 2002.

Nobody dares advertise the XML capabilities of their product (which back then everybody did), nobody considers it either hot new thing (like back then) or mature - just obsolete enterprise shit.

It's about as popular now as J2EE, except to people that think "10 years ago" means 1999.

rhdunn2mo ago

XML is used a lot in standards and publishing industries -- JATS, EPUB, ODF, DOCX/XLSX/..., DocBook, etc. are all XML based/use XML.

miki1232112mo ago

And I think this makes sense.

JSON is great for representing data. If you have some data structures and two machines trying to exchange them, JSON is great for that.

michaelbarton2mo ago

Without being facetious, isn’t HTML a dialect of XML and very widely used?

5 more replies

coldtea2mo ago

Yes, there's a handful of niches. Still 1/1000th the momentum it had, or adoption it was expected to get, and nobody under 40 even considers it for new stuff.

1 more reply

graemep2mo ago

Also in finance. XBRL and FIXML although I do not know how widely used the latter is.

vlovich1232mo ago

girvo2mo ago

I kind of miss SOAP. Ahead of its time? Probably not, but I built some cool things on top of it

vbezhenar2mo ago

Let's say I hope for the day I'll miss SOAP. Right now I have too much of it.

pfraze2mo ago

atproto's lexicon-based rpc is pretty soap-like

cyanydeez2mo ago

20 years old means 1980!

fc417fc8022mo ago

It's not the hot new thing but when has hype ever mattered for getting shit done? I don't think anyone who considers it obsolete has an informed opinion on the matter.

XSLT remains an elegant and underused solution. Guile even includes built in XML facilities named SXML.

coldtea2mo ago

>It's not the hot new thing but when has hype ever mattered for getting shit done?

1 more reply

thayne2mo ago

> It's not the hot new thing but when has hype ever mattered for getting shit done?

But it used to be. And so it was used for a lot of things where it wasn't a great fit. XML works fairly well as a markup format, but for a lot of things, something like json models the data better.

> which case there's no excuse to overcomplicate things.

1 more reply

mycall2mo ago

It makes me wonder how well an LLM like Opus can generate XSLT which was always the hard part when writing by hand.

1 more reply

eduction2mo ago

Obsolete enterprise shit I guess includes podcasting. Impressive for the enterprise.

I’d be very curious what lasting open formats JSON has been used to build.

coldtea2mo ago

That the podcast feed format is XML based is an insignificant detail - and a remnant of the past, nobody cares about.

Might as well declare USSR a huge IT success because people still play Tetris.

himata41132mo ago

didn't know html was spooky tech, TIL. /s

coldtea2mo ago

HTML predates XML by 5 years.

What's more, the web standards bodies even abandoned a short-lived XML-hype-era plan to make a new version of HTML based on XML in 2009.

Did you also learned those things too today?

1 more reply

oytis2mo ago

XML is still around, but I don't think many people would choose it as a serialization format today for something new.

dathanb822mo ago

WD-422mo ago

XML is back, everyone is rediscovering the terminal. Soon we’ll discover that object oriented programming is good again.

Kerrick2mo ago

Unambiguously, though, it is. There's so much trash imperative code in its training data that LLMs tend to vomit out garbage. But if you anchor it with OOP, the quality tends to be higher.

hbarka2mo ago

AptSeagull2mo ago

sevennull2mo ago

Haha, EDI is such a pita. very efficient for machines I suspect - the first time they tried to take over.

actionfromafar2mo ago

EDI is XML now.

treebeard9012mo ago

1 more reply

paulddraper2mo ago

Imagine the worst data format you can think of.

Then spend the next week making it even more convoluted.

That data format is still better than EDI.

bryanrasmussen2mo ago

I'm not sure if this is a compliment or insult to my powers of invention.

Bratmon2mo ago

XML is as old now as the PDP-11 was when XML came out.

siva72mo ago

shams932mo ago

It has a number of security issues which have not been fixed which could be used for really interesting exploitation.

phlakaton2mo ago

I don't think anybody's proposing to throw recursive entity definitions at Claude. Just a little light informally-defined angle-bracket markup.

toastal2mo ago

XML works great for XMPP. KDL is compatible with it too.

theowaway2134562mo ago

The evidence suggests that XML was never that popular though for the general audience, you have to admit.

For data exchange, people vastly prefer JSON as an exchange format for its simplicity, or protobuf and friends for their efficiency.

As a configuration format, it has been vastly overtaken by YAML, TOML, and INI, due to their content-forward syntax.

Having said all this I know there are some popular tools that use XML like ClickHouse, Apple's launchd, ROS, etc. but these are relatively niche compared to (e.g.) HTML

icermann2mo ago

XML was definitely popular in the "well used" sense. How popular it was in the "well liked" sense can maybe be up for debate, but it was the best tool for the job at the time for alot of use cases.

intrasight2mo ago

Yup. Kids these days...

strongpigeon2mo ago

Jcampuzano22mo ago

But should this extend to anything that could end up in Claudes context? Should we be using xml even in skills for instance, or commands, custom subagents etc.

And then do we end up over indexing on Claude and maybe this ends up hurting other models for those using multiple tools.

I just dislike how much of AI is people saying "do this thing for better results" with no definitive proof but alas it comes with the non determinism.

At least this one has the stamp of approval by Claude codes team itself.

michaelcampbell2mo ago

https://i.imgur.com/HGa0i3m.png

werdnapk2mo ago

CSS on the <a> tags:

word-break: break-all;

1 more reply

knallfrosch2mo ago

It's an error in the site's CSS. CSS has way better methods, like splitting words correctly depending on the language and hyphenating it.

Although I can never remember the correct incantation, should be easy for LLMs.

fancy_pantser2mo ago

CSS word-break property

rosstex2mo ago

Ask Claude?

apwheele2mo ago

E.g. instead of

    <examples>
      <ex1>
        <input>....</input>
        <output>.....</output>
      </ex1>
      <ex2>....</ex2>
      ...
    </examples>
    <instructions>....</instructions>
    <input>{actual input}</input>

Just doing something like:

    ...instructions...
    input: ....
    output: {..json here}
    ...maybe further instructions...
    input: {actual input}

Use case document processing/extraction (both with Haiku and OpenAI models), the latter example works much better than the XML.

N of 1 anecdote anyway for one use case.

galaxyLogic2mo ago

XML helps because it a) Lets you to describe structures b) Make a clear context-change which make it clear you are not "talking in XML" you are "talking about XML".

I recently asked AI why .md files are so prevalent with agentic AI and the answer is ... because .md files also express structure, like headers and lists.

Again, depends on what the AI has been trained on.

I would go with JSON, or some version of it which would also allow comments.

irthomasthomas2mo ago

The main thing i use xml tags for is seperating content from instructions. Say I am doing prompt engineering, so that the content being operated on is itself a prompt then I wrap it with

<NO_OP_DRAFT> draft prompt </NO_OP_DRAFT>

instructions for modifying draft prompt

If I don't do this, a significant number of times it responds to the instructions in the draft.

marxisttemp2mo ago

XML is much more readable than JSON, especially if your data has characters that are meaningful JSON syntax

galaxyLogic2mo ago

I think readability is in the eye of the reader. JSON is less verbose, no ending tags everywhere, which I think makes it more readable than XML.

But I'd be happy to hear about studies that show evidence for XML being more readable, than JSON.

ezfe2mo ago

I disagree that XML is more readable in general, but for the purpose of tagging blocks of text as <important>important</important> in freeform writing, JSON is basically useless

what2mo ago

>But I'd be happy to hear about studies that show evidence for XML being more readable, than JSON.

But I’d be happy to hear about studies that show evidence for JSON being readable, than XML.

ekjhgkejhgk2mo ago

ezfe2mo ago

They probably don’t need to be specific values. The model is fine tuned to see the tags as signals and then interprets them

galaxyLogic2mo ago

If it walks like a duck ... AI thinks it is something like a duck.

apwheele2mo ago

cyanydeez2mo ago

<ponderforamoment>HTML is a large subsection of their training data, so they're used to seeing a somewhat semantic worldview</ponderforamoment>

imglorp2mo ago

A very minor porcelain on some of the agent input UX could present this structure for you. Instead of a single chat window, have four: task, context, constraints, output format.

And while we're at it, instead of wall-of-text, I also feel like outputs could be structured at least into thinking and content, maybe other sections.

kvirani2mo ago

You're on to something here. Can we go more meta and define these dynamically such that users can customize multiple output streams?

alansaber2mo ago

lsc47192mo ago

Makes sense

hkbuilds2mo ago

This matches my experience building AI-powered analysis tools. Structured output from LLMs is dramatically more reliable when you give the model clear delimiters to work with.

The real win is that XML tags give you a natural way to do few-shot prompting with structure. You can show the model exactly what shape the output should take, and it follows remarkably well.

docjay2mo ago

“It works great aside from the multiple failure modes.” ;)

muzani2mo ago

In the spirit of Hacker News, a good way to learn about these tags is prompt injection and jailbreaking Claude.

I'd post a link, but unfortunately many are highly NSFW. Just search for "Claude jailbreak" on reddit or something.

There's a lot of speculation in this thread, but go and have a spar with Claude instead.

TheJoeMan2mo ago

rafram2mo ago

No, it’s two screenshots from Anthropic documentation, stitched together: https://platform.claude.com/docs/en/build-with-claude/prompt...

The post even links to that page, although there’s a typo in the link.

TheJoeMan2mo ago

rafram2mo ago

Thanks, that makes sense!

glthOP2mo ago

Author here: I have just fixed the typo. Thank you.

And yes, these are screenshots from Anthropic’s documentation.

dmd2mo ago

They're not even stitched together ; there's just no padding between the two images.

croes2mo ago

You just hallucinated the content is AI generated.

michaelcampbell2mo ago

"This is AI" is the new "This is 'shopped, I can tell by the pixels."

tingletech2mo ago

I can tell by the em dashes

Calavar2mo ago

It looks like a screenshot from the Claude desktop app, so I don't think the author is trying to disguise the AI origin of the marerial

TheJoeMan2mo ago

doctorpangloss2mo ago

ryanschneider2mo ago

Wait am I in the minority talking to Claude in markdown? I just assumed everyone does that, or at least all developers. It seems to work really well.

cyanydeez2mo ago

I do that in openwebui for code indents like ```

TutleCpt2mo ago

lmeyerov2mo ago

My intuition is it comes down to error-correcting codes. We're dealing with lossy systems that get off track, so including parity bits helps.

For all these things... these intuitions need backing by evals in practice, and part of why I begrudgingly flipped from JSON to XML

TacticalCoder2mo ago

It'd be hilarious if XML schemas and validators were to make a comeback [1] to... interface with AI models.

esafak2mo ago

This sounds like something for harnesses, not end users. Are they really expecting us to format prompts as XML??

twoodfin2mo ago

This isn’t surprising: XML’s core purpose was to simplify SGML for a wider breadth of applications on the web.

HTML also descended from SGML, and it’s hard to imagine a more deeply grooved structure in these models, given their training data.

So if you want to annotate text with semantics in a way models will understand…

tingletech2mo ago

XML and HTML are SGMLs

ChrisSD2mo ago

wolttam2mo ago

pocketarc2mo ago

wolttam2mo ago

scotty792mo ago

Can you sniff it out with Wireshark?

wolttam2mo ago

They don't expose the raw context over the wire, it's all pre/post processed at their API endpoints.

spacecadet2mo ago

This has been the way for a long time, exploiting XML tags was a means of exfiltrating data or reversing a model for a while as well. Some platforms are still vulnerable to this.

Zebfross2mo ago

I thought the goal was minimal instruction to let Claude determine the best way to solve the problem. Not adding this to my workflow anytime soon.

TheLNL2mo ago

It is not for the end user, it is more for things like wrappers and automation scripts.

Nobody expects the end user to prompt the AI using a structured language like xml

ixxie2mo ago

How about other frontier models, and smaller models?

CactusBlue2mo ago

I think the main advantage of the XML here is that the model is expected to have a matching end tag that is balanced, which reduces the likelihood of malformed outputs.

kleyd2mo ago

The main benefit of using XML here seems to be that it forces clearer thinking and formulation from the user.

ashirviskas2mo ago

Author does not know what they're talking about.

> In other words, XML tags have not only a special place at inference level but also during training

Their cited source has 0 proof of that. It's just like python/C/html in training. Doesn't mean it's special. And no, you don't need to format your prompts as python code just because of that.

Those strings are just representations of special tokens in models for EOS. What does it have to do with anything this article pretends to know about?

Please don't post such intellectual trash on here :')

Claude analysis of the article:

No, not in the technical sense you're pointing at. The article conflates two very different things:

OutOfHere2mo ago

If an LLM were to struggle to closely follow instructions that weren't wrapped in XML, I would strongly consider it a sign of a poor model reflecting poor model training.

Eric_WVGG2mo ago

bemused by how competently designed this is, compared to enshittified blogs and whatnot

To be realistic, this design needs more weirdly sexual etsy garbage, “one weird tip,” and “punch the monkey”

prima-facie2mo ago

Amazing how an entire profession that until yesterday would pride itself on precision, clarity (in thought and in writing), efficiency, and formality, has now descended into complete quackery.

OutOfHere2mo ago

cyanydeez2mo ago

Are you talking about the office of the president of the united states?

This vague posting is kind dumb.

prima-facie2mo ago

It's a simple observation. I'm not here to win internet points. I've never before seen so much cargo-culting and mystic belief among engineers.

j / k navigate · click thread line to collapse