developer2 8y ago

The point I was making is that nobody uses XHTML thanks to the browser vendors' refusal to accommodate it early on when the demand was rampant. By the time the comparison was html5 vs. XHTML2 instead of HTML 4 vs XHTML1, it was too late as we had been trained to ignore the XHTML variant due to the vendors' absolute refusal to even make XHTML1 work. If you know of a single major site (not somebody's little side project) that uses the XHTML mime type, please share so I can be amazed.

The fact that the html5 spec does not permit self-closing CDATA elements is precisely the kind of legacy trash we'll be dealing with for yet another 10-30 years. (I understand that html5 kept the HTML 4 parsing rules in order to stay backwards-compatible, but it's still infuriating.)

deathanatos 8y ago

> The point I was making is that nobody uses XHTML

I don't disagree here.

> The fact that the html5 spec does not permit self-closing CDATA elements

The HTML spec does permit self-closing <script>: in the XHTML syntax.

The HTML5 specification defines two "concrete syntaxes" for HTML: HTML, and XHTML. The latter supports self-closing <script> tags perfectly fine.
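As a rough illustration of the XML side of this, the sketch below feeds both spellings of an empty <script> to a generic XML parser (Python's `xml.etree.ElementTree`, standing in for a browser's XML parser, which is an assumption of this example). The file name `app.js` and the helper `first_script` are made up for the demonstration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def first_script(document):
    """Hypothetical helper: parse a document as XML and return its first <script>."""
    root = ET.fromstring(document)
    return root.find(f".//{{{XHTML_NS}}}script")

# Self-closing form, as allowed in the XHTML (XML) syntax.
self_closed = first_script(
    f'<html xmlns="{XHTML_NS}"><head><script src="app.js"/></head></html>'
)
# Explicit start and end tags.
explicit = first_script(
    f'<html xmlns="{XHTML_NS}"><head><script src="app.js"></script></head></html>'
)

# To an XML parser these are the same thing: an empty element with one attribute.
print(self_closed.attrib == explicit.attrib)    # same attributes
print(len(self_closed) == len(explicit) == 0)   # neither has children
```

This only shows the XML-level equivalence; it says nothing about how a document served as text/html would be parsed.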

The former (the HTML syntax) only allows self-closing tags in two contexts: void elements (which <script> is not) and foreign elements (i.e., SVG and MathML). Now, perhaps you can argue that they should just have allowed it on all elements, such as <script>. Frankly, I feel like the reason the standard permits it on void elements at all is just to handle the legions of webdevs out there who think they're writing XHTML but only ever use the syntax for <br/> and are incorrectly serving the resulting soup as text/html.

But if you're writing the HTML syntax, just write the HTML syntax. Some elements require the end tag, some don't. It is usually simple enough to tell by asking "could this element have content?" (If yes: end tag; if no: no end tag.) If you want more consistent parsing rules, that's what the XHTML syntax is for. (Though I agree, it doesn't seem to see much real-world use.)

(Frankly, I greatly prefer the gentle fallback of the HTML syntax to the hard error of the XHTML syntax, which is considerably user-unfriendly.)

bzbarsky 8y ago

> By the time the comparison was html5 vs. XHTML2 instead of HTML 4 vs XHTML1

The relevant comparison is html5 in its HTML serialization vs html5 in its XML serialization. The latter works in every single browser, and has since IE9 shipped in 2011. No one uses it.

> If you know of a single major site (not somebody's little side project) that uses the XHTML mime type

There aren't any, because I suspect people building such sites all discovered the same thing: ensuring well-formedness is _hard_ in practice, and if it's required for the page to be shown at all, then your page will fail to be shown every so often. And no one wants to deal with that.
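A small sketch of that all-or-nothing behavior, using Python's standard library as a stand-in for browser parsers (an assumption: `html.parser` is only a lenient tokenizer, not the HTML5 parsing algorithm, and `xml_accepts`, `TagLogger`, and `html_events` are names invented for this example). The same slightly broken markup is rejected outright by the XML parser, while the lenient one just keeps going.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BROKEN = "<div><p>Hello <b>world</p></div>"  # <b> is never closed

def xml_accepts(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

class TagLogger(HTMLParser):
    """Records start/end tags as the lenient parser sees them."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

def html_events(markup):
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events

print(xml_accepts(BROKEN))   # False: one missing end tag rejects the whole document
print(html_events(BROKEN))   # the lenient parser reports every tag it saw, no error
```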

Back when some people were in fact trying to use XHTML on the web, every so often you'd run into this on some site that sent XHTML based on "Accept" headers. You'd load the site in Mozilla (suite, then Firefox when it came into being) and get an XML parsing error.

There were two common sources of this problem. First, someone editing a template and forgetting to modify closing tags to match opening ones. This can be solved with server-side enforcement of template well-formedness, of course. But it means you can't have your start and end tags in different parts of the template or different templates, which people wanted to do.

Second, insertion of content you don't control, whether it's user-contributed, or coming from some other team (e.g. content-production team on a news site feeding their bits into the CMS templates), or coming via a content provider like the AP or whatnot. You can mitigate this by using a fully DOM-based workflow, serializing before you put on the wire, instead of pasting together strings. But now you have the problem of producing a DOM from whatever non-well-formed garbage you were handed. Yes, you can just reject non-well-formed input, but if you have no leverage over the producer of that input, that just means you can't do your job. OK, so maybe you have a more liberal parser on the input end and then ensure everything internally operates on trees, not text.
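The "liberal parser on the input end, trees internally, serialize on the way out" idea can be sketched roughly as follows. This is a toy under stated assumptions, not a production sanitizer: `LenientTreeBuilder`, `clean`, and the partial `VOID` set are invented for the example, the recovery rules are far simpler than what browsers do, and odd attribute names in the input are not validated.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "link", "meta"}  # partial list, for illustration

class LenientTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from tag soup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("div")   # wrapper element for the fragment
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_startendtag(self, tag, attrs):
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):                 # text after a child goes in that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

def clean(fragment):
    """Parse leniently, then serialize the tree; escaping happens in the serializer."""
    builder = LenientTreeBuilder()
    builder.feed(fragment)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")

# Unclosed <b>, bare <br>: the output is still well-formed XML.
output = clean("<p>fish &amp; chips <b>today</p><br>done")
reparsed = ET.fromstring(output)
```

Because every token passes through the tree before it reaches the wire, well-formedness and escaping are decided in one place (the serializer) rather than by whoever pasted the strings together.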

But the upshot is that you put in a lot more effort and the benefits are not entirely obvious (at least not to your management; there are certainly obvious anti-XSS benefits to having good control of what tokens end up in your output and where escaping happens, etc). So the path of least resistance is to just not go there in terms of the XHTML serialization of HTML.

> The fact that the html5 spec does not permit self-closing CDATA elements

I'm not sure why "CDATA element" is important here. You'd want self-closing <style> and <script> but not self-closing anything else? The idea doesn't even make sense for <style>, so presumably you just want self-closing <script>?