
pointlessone 3y ago

I guess it depends on how you define the XML baseline. You can have very simple XML with only bare tags, and it will work just fine. Arguably, it's even simpler than JSON that way. A basic parser for that is probably no more complex than a JSON parser.
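To make the "bare tags only" baseline concrete, here is a rough Python sketch of such a parser. It assumes a deliberately restricted subset (no attributes, namespaces, entities, comments, or self-closing tags), and `parse_bare` is a made-up name for illustration, not anything from a real library.

```python
import re

# One token is either a bare start/end tag or a run of character data.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")

def parse_bare(text):
    """Parse bare-tag XML into nested (name, children) tuples."""
    root = ("#root", [])
    stack = [root]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        closing, name, chars = m.groups()
        if chars is not None:
            # Character data belongs to the innermost open element.
            stack[-1][1].append(chars)
        elif closing:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"mismatched </{name}>")
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
        pos = m.end()
    if len(stack) != 1:
        raise ValueError("unclosed element")
    return root[1]

print(parse_bare("<a><b>5</b></a>"))  # [('a', [('b', ['5'])])]
```

That is roughly the size of a toy JSON parser, which is the point: the complexity only shows up once you go beyond this subset.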

All the optional complexity that can go on top, though, is probably better specified for XML. Transformation is well defined for XML (XSLT) but not at all for JSON (I guess you write your own code to manipulate native objects).

Schemas are basically a native feature for XML. Not so much for JSON.

All sorts of specialised vocabularies are defined for XML. A few are defined for JSON, too.


jsmith45 3y ago

For a lot of XML you need to be able to support XML namespacing, and doing that adds a lot of complexity over the original pure XML.

At first, XML namespacing sounds simple. Each tag and attribute has an optional URI attached to it. No big deal, right?

From reading through the specification, one could be forgiven for assuming that the prefixes are just arbitrary mappings that a processor can ignore or automatically remap to alternate prefixes.

For example, it is true that <abc:a xmlns:abc="https://example.com/xyz" xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a> (notice that both prefixes are bound to the same namespace URI) is equivalent to: <a xmlns="https://example.com/xyz"><b>5</b></a>.
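A quick way to check that equivalence is to compare the expanded names a namespace-aware parser reports. This is a small sketch using Python's standard `xml.etree.ElementTree`, which exposes names in `{uri}local` form; the helper name `expanded_names` is just for illustration.

```python
import xml.etree.ElementTree as ET

prefixed = ('<abc:a xmlns:abc="https://example.com/xyz" '
            'xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a>')
default = '<a xmlns="https://example.com/xyz"><b>5</b></a>'

def expanded_names(xml_text):
    """List (expanded tag name, text) for every element in document order."""
    return [(el.tag, el.text) for el in ET.fromstring(xml_text).iter()]

# Both documents yield the same expanded names; the prefixes are gone.
print(expanded_names(prefixed) == expanded_names(default))  # True
print(expanded_names(default)[1])  # ('{https://example.com/xyz}b', '5')
```

Note that the parsed tree no longer says which prefixes were used, which is exactly what becomes a problem when content refers to a prefix.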

Unfortunately, the data model also allows content to reference namespaces by prefix. Therefore every general XML processor that supports namespaces must keep around an application-accessible mapping from prefixes to namespaces, as the application may need that information to interpret attributes or content. The only exception would be if the general XML processor insisted on having schema information for every namespace it might come across. In that scenario it would be able to tell whether an attribute value of "abc:b" is really a string literal or a qualified name (the QName data type), where the namespace is whatever the current "abc" prefix is bound to and the local name is "b".

But obviously we don't want to add full schema support to a simple implementation, so we need to keep the mapping information around, just in case the application needs it. We also cannot easily offer nice features like rewriting a document to use preferred prefixes for certain namespaces, unless we also keep any prefixes that appear in values that could be interpreted as QNames. They might actually be QNames, and our processor cannot know, because it has omitted schema support for simplicity (or perhaps it includes schema support but does not have a schema available for some namespace).
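As a sketch of what "keeping the mapping around" can look like in practice: Python's `xml.etree.ElementTree` discards prefix bindings on the parsed elements, but `iterparse` reports them as `start-ns` events, so they can be recorded per element. The helper names `parse_with_prefixes` and `resolve_qname`, and the `type` attribute in the usage below, are made up for this example.

```python
import io
import xml.etree.ElementTree as ET

def parse_with_prefixes(xml_text):
    """Return (root, scopes), where scopes maps each element to the
    prefix -> URI bindings that are in scope for it."""
    scopes = {}
    stack = [{}]     # bindings inherited from enclosing elements
    pending = {}     # declarations seen just before the next start tag
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":
            scope = {**stack[-1], **pending}
            pending = {}
            stack.append(scope)
            scopes[payload] = scope
            if root is None:
                root = payload
        else:  # "end": leave this element's scope
            stack.pop()
    return root, scopes

def resolve_qname(value, scope):
    """Interpret 'prefix:local' as a QName using the given bindings."""
    prefix, _, local = value.partition(":")
    return f"{{{scope[prefix]}}}{local}"
```

With that in place, the application (which knows the attribute is meant as a QName) can resolve it against the bindings in scope at that element:

```python
doc = '<root xmlns:abc="https://example.com/xyz" type="abc:b"/>'
root, scopes = parse_with_prefixes(doc)
print(resolve_qname(root.get("type"), scopes[root]))  # {https://example.com/xyz}b
```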

And that is just the complexity that stems from one fairly small quirk in how XML works.

You also have no idea whether an element's content needs to preserve whitespace if you don't know the schema and there is no xml:space attribute present. Thus, if you want to safely re-indent arbitrary XML for readability, you could end up with something like this:

    <abc
        ><def
            >5</def
        ></abc
    >
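For what it's worth, that layout is easy to generate mechanically. The following is only a sketch on top of Python's `xml.etree.ElementTree`, assuming a document without namespaces, comments, or processing instructions; `reindent_inside_tags` is a hypothetical helper name. It places every line break inside a tag, where whitespace is insignificant, so no character data is ever touched.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def reindent_inside_tags(elem, level=0, step="    "):
    """Serialize elem with line breaks only inside tags, never in content."""
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    # Break the line before the '>' that closes the start tag.
    out = f"<{elem.tag}{attrs}\n{step * (level + 1)}>"
    out += escape(elem.text or "")
    for child in elem:
        out += reindent_inside_tags(child, level + 1, step)
        out += escape(child.tail or "")
    # Break the line before the '>' that closes the end tag.
    out += f"</{elem.tag}\n{step * level}>"
    return out

original = ET.fromstring("<abc><def>5</def></abc>")
pretty = reindent_inside_tags(original)
print(pretty)

# Round trip: the re-indented text parses back to identical content.
print(ET.tostring(ET.fromstring(pretty)) == ET.tostring(original))  # True
```

An ordinary pretty-printer would instead insert newlines and spaces as text between the tags, which is only safe if you know that whitespace there is insignificant.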

pointlessone (OP) 3y ago

I understand what you're getting at, but that is you choosing a higher complexity baseline. Yes, it's part of a standard, but you can choose not to support it. No one said you have to support all of the XML-verse in order to use it effectively in your particular application. The most common cases are usable without any of it. Look at most RSS/Atom feeds, XHTML, SVG. They can all get by with simple tags and attributes.

I'm just not buying the argument that XML's complexity is somehow remediated in JSON. JSON becomes as horrible as XML when you bring it up to feature parity. And that's when there's a way to match features at all. Whatever people say about XSLT, it is powerful, reasonably well defined, and generic over all documents (even though it is complex). There's nothing like it for JSON that I know of.
