It's much simpler to sign the entire message, unparsed, and it's immune to these issues.
We went through a decade of debate before deciding that "encrypt then mac" is the only right way to do things. That knowledge hasn't trickled down to other domains.
To very lossily summarize: always authenticate before looking at the message.
It's a handy rule of thumb when you're making choices like how to validate a message.
https://moxie.org/2011/12/13/the-cryptographic-doom-principl....
But the issue is: those standards are out there, they're in use, some people will probably adopt them in new projects, and you have to interoperate with them.
So yeah, don't use those standards when you can, but sometimes you have to.
Kill it with fire. This stuff's broken. We know better than to do things this way now. Just no. You sign binary blobs. If the signature check fails, your binary blob is garbage and never gets parsed. End of story.
(Mental note: never deploy SAML anywhere)
Aside: I've seen a credit card processor implement nonsense like this, where I had to parse XML with regular expressions to extract the to-be-signed segment, because it was never going to round trip through a typical XML parser. But then again, this was only about the 25th batshit insane and likely insecure thing they were doing, just like every other banking related company, so shrug.
Could you explain more about this? I thought the whole point of ASN.1 DER was to have only one canonical representation for a given structured value, and that the signing was done as-is on the sequence of bytes directly. It definitely doesn't have the same problems as XML and other text-based formats.
People need to stop using SAML. This needs to be a priority. A little background, for those who haven't had the displeasure of working with it:
When a user wants to log into an application (the "Service Provider"), and is required to SSO against an "Identity Provider", the Identity Provider basically generates an XML document with information about the user, then signs that document using a thing known as an XML Digital Signature, or XMLDSIG.
When you think of "signing" a document, normally you would serialize that document out to bytes, apply your signature scheme over the bytes, then send along both the bytes and the signature. But for reasons which are irrelevant to modern implementations, XMLDSIG prefers to stuff the signature metadata back inside the XML document that was just signed. Obviously this invalidates the signature, so you also inject some metadata instructing receivers on how to put the document back how it was. There are several algorithms available for this. Then you ship around that XML document. This basically means that when the Service Provider receives one of these documents, it needs to:
1. Parse the XML document (which cannot yet be trusted)
2. Find the signature inside the document
3. Find the metadata about what algorithm(s) to use to restore the document
4. Run the document through whatever transforms are described in that metadata (keep in mind that up to this point the document might well have been supplied by an attacker)
5. Serialize the transformed document back out to bytes, being careful not to touch any whitespace, etc
6. Verify the signature over the re-serialized document
If all of this succeeds and was implemented perfectly, you can trust the output of step 5 (ideally you should re-parse it; a common failure mode is trusting the original input instead, so be careful about that).

Obviously this is a crazy approach to one of the most security-critical parts of an application on the internet, and it breaks all the time.
Unfortunately people persist in using this fundamentally broken protocol, so huge thank you to the team at Mattermost for their research in this area.
Adding better support for namespaces and providing APIs compatible with dsig doesn't remove the underlying vulnerabilities.
There's something called SCIM, "System for Cross-domain Identity Management", that does this, and which you can use together with OpenID Connect (OIDC).
SCIM can automatically deactivate a user account if the person leaves the organization or moves to a different department, and can automatically add or remove them from various user groups.

But with SAML, managers/admins still need to micromanage the user accounts, e.g. place the user in the correct group if they get a new job role. SAML only syncs user accounts upon login, from what I've understood. (So if the user stays logged in, then, with SAML, their account permissions can get out of date?)
SCIM: https://docs.microsoft.com/en-us/azure/active-directory/app-...
Azure AD uses this, and Okta, OneLogin, Github and some others too I suppose.
If anyone has tried SCIM it'd be interesting to hear what you think about it? (I've just read about it)
Many big companies run on SAML, and expect to auth with vendors over SAML. That's why russell_h's comment is probably futile; it's the enterprises with the big SaaS budgets that keep SAML relevant, and they don't care if HN doesn't like it.
Maybe in about a decade SAML will be less important to enterprises? SAML 2.0 is only about 15 years old.
It defines an authentication protocol on top of OAuth2, and is a different beast from the older OpenID standards.
E.g. we did not stop using TLS when TLS 1.0 proved to have problems; we updated the cryptography and kept using the logic.
But the problem described in the post wasn't the encryption. It was the logic. Specifically the order that things are done in. Parsing something before verifying it can be dangerous.
Out of curiosity, what are those reasons?
When in 1996-98 W3C/The SGML Extended Review Board subset XML from SGML to define a generic markup convention for use with the expected wealth of upcoming vocabularies on the web, the issue of name collisions between elements (and attributes) from different vocabularies was deemed significant. Of course, in hindsight, with only SVG and MathML (and rarely HTML 5 in XHTML serialization) left on the web and having been incorporated as foreign elements directly into HTML, this seems overkill (even though there are actually collisions between eg. the title element in SVG vs HTML).
There's an alternative (and saner IMHO) approach for dealing with XML namespaces in ISO/IEC 19757-9 [2] by just presenting a canonical (ie. always the same) namespace prefix as part of an element name by a parser API to an app, guided by processing instructions for binding canonical namespace prefixes to namespace URLs, which might also help enterprise-y XML with lots of XML Schema use. Of course, this doesn't help with roundtripping xmlns-bindings (eg. with their exact ordering, possible redundancy, temporary/insignificant namespace prefixes, re-binding in document fragments etc.) through DOM representations, which seems the problem here.
[1]: https://blog.jclark.com/2010/01/xml-namespaces.html
[2]: https://www.iso.org/obp/ui/#iso:std:iso-iec:19757:-9:ed-1:v1...
Well, that is not something you want to see in a public disclosure.
An XML library doesn't have to support cryptographic security - it just has to perform XML en/decoding effectively. How can it be a mistake for a project to rely on part of the standard library?
>By Mattermost’s estimates this new API will not be a reasonable solution for most use cases currently affected by the vulnerabilities. Parsing and resolving namespaces is an essential requirement for correctly implementing SAML, and even considering only a limited set of real-world SAML messages without strict namespacing requirements would be unlikely to allow for a secure implementation.
A large part of this stems from how complicated XML can get - if it were only elements and attributes it might have been fine. Namespaces made it a bit more complicated. Processing Instructions made it hideous.
The most recent "fun" I had was that on a Citrix NetScaler, if you enable a certain n-Factor workflow, it sends a SAML request to the IdP that Microsoft products only reject as "invalid XML".
From what I can gather, the XML being sent is perfectly valid. The issue must be something hideously subtle, like the whitespace or UTF-8 encoding being slightly different in a way that upsets the Microsoft SAML implementations but not any others.
Have a look at some SAML XML examples online: https://www.samltool.com/generic_sso_res.php
They're hideous not because they're XML, but because they're bad XML! The SAML standard defines its own "namespace attributes" separately but on top of the XML namespaces!
Similarly, instead of the straightforward way to encode the data:

  <tag prop="attr">value</tag>

they abstract one level up unnecessarily:

  <element name="tag">
    <attribute name="prop">attr</attribute>
    <content>value</content>
  </element>
This is the same mistake people make in database schema design, where they'll have a table with columns called "Key", "ColumnName", and "ColumnValue".

>Security. A security issue in the specification or implementation may come to light whose resolution requires breaking compatibility. We reserve the right to address such security issues.

Whether the Go team decides that this issue is worth a breaking change is another question entirely.
I expect that a similar problem will be found in many other libraries, if the XML was publicized. XML namespaces made a critical... "mistake" is probably too strong, but "design choice that deviated too far from people's mental model" is about right... that has prevented them from being anywhere near as useful or safe as they could be. In an XML document using XML namespaces, "ns1:tagname" may not equal "ns1:tagname", and "ns1:tagname" can be equal to "ns2:tagname". This breaks people's mental models of how XML works, and correspondingly, breaks people's code that manipulates XML.
(I actually used the Go XML library as an SVG validator in the ~1.8 timeframe and had to fork it to fix namespaces well enough to serve in that role. I didn't know about how to exploit it in a specific XML protocol, but I've known about the issues for a while. "Why didn't you upstream it then?" Well, as this security bulletin implies, the data structures in encoding/xml are fundamentally wrong for namespaced XML to be round-tripped, and there is no backwards-compatible solution to the problem, so it was obvious to me without even trying that it would be rejected. This has also been discussed on a number of tickets subsequently over the years, so that XML namespace handling is weak in the standard library is not news to the Go developers. Note also that it's "round-tripping" that is the problem; if you parse & consume you can write correct code. It's the sending it back out that can be problematic.)
Namespaces fundamentally rewrite the nature of XML tag and attribute names. No longer are they just strings; now they are tuples of the form (namespace URL, tag name)... and namespace URL is NOT the prefix that shows up before the colon! The prefix is an abbreviation of an earlier tag declaration. So in the XML
<tag xmlns="https://sample.com/1" xmlns:example1="https://blah.org/1">
<example1:tag xmlns:example2="https://blah.org/2">
<example2:tag xmlns:example1="https://anewsite.com/xmlns">
<example1:tag />
</example2:tag>
</example1:tag>
</tag>
not a SINGLE ONE of those "tag"s is the same! They are, respectively, actually (https://sample.com/1, tag), (https://blah.org/1, tag), (https://blah.org/2, tag), and (https://anewsite.com/xmlns, tag). There's a ton of code, and indeed, even quite a few standards, that will get that wrong. (Note the redefinition of 'example1' in there; that is perfectly legal.)

Even more excitingly, in

  <tag xmlns="https://sample.com/1" xmlns:example1="https://sample.com/1">
    <example1:tag/>
    <example2:tag xmlns:example2="https://sample.com/1" />
  </tag>

all three tags ARE the exact same tag and should be treated as such, despite the different "tag names" appearing.

Reserializing these can be exciting, because A: your XML library, in principle, ought to be presenting you the (XMLNS, tagname) tuple with the abbreviation stripped away, to discourage you from paying too much attention to the abbreviation, but B: humans in general and a lot of code expect the namespace abbreviations to stay the same in a round trip, and may even standardize on what the abbreviations should be. There's a LOT of code out there in the world looking for 'p' or 'xhtml:p' as the tag name and not ("http://www.w3.org/1999/xhtml", "p").
In general, to maintain round-trip equality, you have to either A: maintain a table of the abbreviations you see, when they were introduced, and which was used, or B: just use the (XMLNS, tagname) and ensure while outputting that the relevant namespaces have always been declared. I generally go for option B, as it's easier to get correct, and I pair it with a table of the most common namespaces for what I'm working in, so that, for example, XHTML gets a hard-coded "xhtml:" prefix. It is very easy, if you try to implement A, to screw it up in a way that can corrupt the namespaces on some input.
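A minimal sketch of option B with a hard-coded prefix table (the table contents and function names here are illustrative, and this only renders a start tag, not a full serializer):

```go
package main

import "fmt"

// preferredPrefix hard-codes prefixes for well-known namespaces,
// so output never depends on what the input document called them.
var preferredPrefix = map[string]string{
	"http://www.w3.org/1999/xhtml": "xhtml",
	"http://www.w3.org/2000/svg":   "svg",
}

// startTag renders a start tag for the (namespace, local) pair,
// always declaring the namespace on the element itself.
func startTag(space, local string) string {
	if p, ok := preferredPrefix[space]; ok {
		return fmt.Sprintf(`<%s:%s xmlns:%s=%q>`, p, local, p, space)
	}
	// No preferred prefix: fall back to a default-namespace declaration.
	return fmt.Sprintf(`<%s xmlns=%q>`, local, space)
}

func main() {
	fmt.Println(startTag("http://www.w3.org/1999/xhtml", "p"))
	// <xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">
}
```

A real implementation would track which declarations are already in scope to avoid re-declaring on every element, which is the "visually unappealing" pathology mentioned below.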
(Option B has its own pathologies. Consider:
<tag xmlns:sample="https://example.com/1">
<sample:tag1 />
<sample:tag2 />
</tag>
It's really easy to write code that will drop the xmlns specification on all of the children of "tag", since it didn't use it there; and if your code throws away where the XMLNS was declared and just looks at whether the NS is currently declared, it'll see a new declaration of the "sample" namespace on every usage. Technically correct if the downstream code handles namespaces correctly (big if!), but visually unappealing.)

Not defending Go here, except inasmuch as it's such a common error to make that I have a hard time naming libraries and standards that get namespaces completely correct, for as simple as they are in principle. (I think SVG and XHTML have it right. XMPP is very, very close, but still has a few places where the "stream" tag is placed in different namespaces and you're just supposed to know to handle it the same in all the namespaces it appears in... which most people do only because it doesn't occur to them that technically these are separate tags, so it all kinda works out in the end. libxml2 is correct, but I've seen a lot of things that build on top of it, and they almost all screw up namespaces.)
That right there is why I like Clark's notation (despite its unholy verbosity), which I learned of because that's how ElementTree manipulates namespaces: in Clark's notation, the document is conceptually
<{https://sample.com/1}tag>
<{https://blah.org/1}tag>
<{https://blah.org/2}tag>
<{https://anewsite.com/xmlns}tag />
</{https://blah.org/2}tag>
</{https://blah.org/1}tag>
</{https://sample.com/1}tag>
Which is unambiguous. But as you note, it adds challenges for round-trip equality (in fact ElementTree doesn't maintain that: it simply discards the namespace prefixes on parsing, which I have seen outright break supposedly-XML parsers that were really hard-coded for specific namespace prefixes).

lxml does round-trip prefixes (though it still doesn't round-trip documents) by including a namespace map on each element.
And if you try to combine namespaces with DTDs (which is just an explosive mix to start with, and I think is just recommended to never do) you get other problems, because you're no longer allowed to add arbitrary namespace declarations in the middle, so anything that round-trips prefixes but might ever add redundant declarations of them won't reliably produce something that DTD-validates, and if you're transforming into a DTD from something that might have used other namespaces, you have to make sure to remove all the extra declarations, and…
Note that most of this is still “well-defined”, it's just awkwardly hairy. This is not to be taken as an excuse to implement the standard badly or incorrectly if you're going to handle it at all.
(I originally wrote this comment with a more fully worked-out example, but after viewing it in context I realized it was way too long to be an only-partly-on-topic comment on this thread, so I'll probably move it to a post elsewhere and submit it later.)
Because they usually have an incorrect mental model. Blaming namespaces for name ambiguity would be the same as blaming the code "x = a + b" because "a" and "b" could be defined differently.

Namespace prefixes are absolutely irrelevant; they only exist for your convenience.
There's a similar problem with XML entity references, which have been happily breaking enterprise security for over a decade, because nobody has a good mental model of how entities in XML documents actually behave.
It seems fair at this point to blame the standard.
This is false. As soon as you need XML canonicalization you very much need those prefixes exactly as they were present in the original document.
But it gets worse with XSLT, which uses the prefixes in XPath expressions inside attributes. If the prefixes are changed, those values also need to be updated to use the new prefix, which requires complete knowledge of the format. This is because one cannot programmatically detect something like attributes that use custom data types referencing the prefixes in scope, yet XSLT's XPath expressions show that the W3C considers it legal to create such custom formats.
Though they mention something called an "xml directive". I don't think such a thing exists.
I would again emphasize that encoding/xml, to my knowledge, only has problems with this particular roundtripping use case. It can consume non-namespaced XML correctly, and handle namespaced XML as long as you don't plan on re-emitting XML.
What would probably end up happening is a new package appearing on github.com, forked off of encoding/xml, for this use case. (If you're looking for a project that might attain some use, this is a likely candidate.) Unlike something like Python, where the core packages are often C-based and thus you can expect better performance from the built-in "set" than somebody's pure-Python "set" implementation, encoding/xml is just a pile of pure Go code whose only advantage is that it ships with the compiler. Anyone can replace it without incurring any other disadvantage whenever they like.
(I looked a few versions ago, FWIW; encoding/xml has deviated so much from what I forked that my fork is essentially dead and no longer releasable without basically starting over from scratch. Plus I built it with the idea that it should be a minimal modification (so I could port it forward, which turned out to not work, but it's still how it was built)... if I was truly forking I'd have done some more extensive changes to it to support namespaces in general, rather than for my particular case.)
Anyhow, upshot, the Go project as a whole is not stuck... it is specifically encoding/xml as the standard, built-in library that is stuck. It's not like Go is completely incapable of handling XML correctly from first principles for some reason or anything.
While they do have the problems described, XML namespaces are what allow for abstraction and composition of documents from disparate systems.
I'm biased in this regard, but I view SPIFFE's inclusion of JWT Tokens as an authentication method as fundamentally flawed - By allowing bearer tokens, you are no longer verifying identity, but passing identity around. JWT has also been susceptible in the past[2] to the same kinds of attacks here - Poorly defined verification semantics.
I suspect that buried in the semantics around SPIFFE's SPIRE Server and Agent are a number of vulnerabilities or other ways that trust doesn't mean quite what you think it means. I'd love for someone with interest to take a look. Besides the obvious downsides fundamental to Istio's MITM proxy architecture, I think there's more lurking on that edge.
[1] https://spiffe.io/ [2] https://auth0.com/blog/critical-vulnerabilities-in-json-web-...
Unsurprised it can cause security issues, especially in XML-DSig which is a nightmare to handle correctly.
Mapping between XML elements and data structures is inherently flawed ... See package json for a textual representation more suitable to data structures.
I'm amazed people can get it as right as they do half the time. I do think Go will get fixed eventually; it'd be too weird if they couldn't fix the core issue. But I've never used XML if I could help it, so I'm absolutely no expert on what would make something like this impossible to fix.