If this was the case, it would be near-impossible to write HTML by hand. And if you're writing HTML with a tool (React, HAML etc.), the tool could be doing HTML escaping correctly instead. This isn't an issue with HTML, it's an issue with human error.
All security issues are due to human error. Those are solved by building better tools.
> If this was the case, it would be near-impossible to write HTML by hand.
If, besides the text form, there were a well-defined length-prefixed binary representation, we could simply compile HTML to binary-HTML, which would immediately make the web not only safer, but also much more efficient (it's scary if you think about just how much parsing and reparsing goes on when displaying a web page).
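To make the length-prefix idea concrete, here's a minimal sketch. The node layout (1 type byte, 4-byte big-endian length, UTF-8 payload) and the names `TEXT_NODE`, `encodeText` and `decodeNodes` are invented for illustration; no such binary-HTML format exists as far as this thread is concerned. It assumes Node.js for `Buffer`.

```javascript
// Hypothetical layout: [1 byte type][4 bytes big-endian length][UTF-8 payload]
const TEXT_NODE = 1;

function encodeText(text) {
  const payload = Buffer.from(text, "utf8");
  const header = Buffer.alloc(5);
  header.writeUInt8(TEXT_NODE, 0);
  header.writeUInt32BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

function decodeNodes(buf) {
  const nodes = [];
  let offset = 0;
  while (offset < buf.length) {
    const type = buf.readUInt8(offset);
    const length = buf.readUInt32BE(offset + 1);
    const start = offset + 5;
    const end = start + length;
    if (end > buf.length) throw new RangeError("truncated node");
    // The payload is sliced by length, never scanned for delimiters,
    // so markup-looking bytes inside it cannot close or open anything.
    nodes.push({ type, text: buf.toString("utf8", start, end) });
    offset = end;
  }
  return nodes;
}

const wire = Buffer.concat([
  encodeText("Hello "),
  encodeText("</p><script>alert(1)</script>"),
]);
const decoded = decodeNodes(wire);
console.log(decoded);
```

The second node comes back as plain text with type `TEXT_NODE`, because the decoder only ever trusts the length field. That removes the need for escaping in this one spot; it says nothing about the other ways a page can go wrong.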
My point is that there's nothing wrong with HTML. HTML isn't a tool, it's a format for storing and transmitting hypertext. If you're using React or HAML or any of the other HTML-generating tools, you're effectively immune from XSS. I'm putting forth that developers aren't using effective tools (shame on every templating engine that doesn't escape by default), and that calling the web as a platform bad is a bit nonsensical. It's like saying "folks are writing asm by hand and their code has security issues, therefore x86_64 is insecure".
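For a sense of what "escape by default" means in practice, here's a minimal sketch of a template helper that escapes every interpolated value unless told otherwise. The names `escapeHtml` and `html` are made up for this example and are not the API of React, HAML or any other tool mentioned here.

```javascript
const ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace the five characters that matter in element content
// and in quoted attribute values.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => ESCAPES[ch]);
}

// Tagged template: literal chunks pass through, interpolated values are escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, chunk, i) =>
      out + chunk + (i < values.length ? escapeHtml(values[i]) : ""),
    ""
  );
}

const comment = '<img src=x onerror="alert(1)">';
const rendered = html`<p class="comment">${comment}</p>`;
console.log(rendered);
// <p class="comment">&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</p>
```

The point of the design is that the safe path is the one you get without thinking. Note that this only covers element content and quoted attributes; values dropped into a `<script>` block, an unquoted attribute or a URL need different handling.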
However, no such tool exists. I think there's a deeper issue here: the sheer number of ways you can generate XSS alone, even ignoring the other exploit types, is far beyond what any tool is capable of stopping. Look at one of the XSS holes found by Homakov that I linked to from my article:
http://sakurity.com/blog/2015/06/25/puzzle2.html
The XSS occurs on this line of JavaScript, not HTML:
$.get(location.pathname+'?something')
That's a simple line of jQuery that does an XMLHttpRequest to the same page that was loaded, with an additional parameter. By itself, it is not an XSS. But if the backend is/was running Ruby on Rails (presumably some old version by now) then it could turn into an XSS due to a combination of features that all look superficially harmless. Show me the tool that would have avoided that type of exploit, without already knowing about it and having some incredibly specific hardcoded static analysis rule.
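For illustration, here's one way a request built from `location.pathname` can stop being same-origin. This is a sketch under assumptions, not a confirmed account of the linked puzzle: it assumes the page was reachable at a path starting with two slashes, and the hosts `site.example` and `evil.example` are invented. It uses only the standard `URL` class.

```javascript
// Hypothetical: the victim page was loaded with a doubled leading slash.
const page = new URL("https://site.example//evil.example/payload");

// This mirrors the string the jQuery line would hand to $.get.
const requestTarget = page.pathname + "?something";

// A string starting with "//" is a scheme-relative URL, so resolving it
// against the page keeps the scheme but swaps the host.
const resolved = new URL(requestTarget, page.href);

// A defensive check an app could add before issuing the request.
function isSameOrigin(target, base) {
  return new URL(target, base).origin === new URL(base).origin;
}

console.log(page.pathname);    // //evil.example/payload
console.log(resolved.origin);  // https://evil.example
console.log(isSameOrigin(requestTarget, page.href)); // false
```

What happens to the response after that depends on the client library and server, which this sketch doesn't model. It only shows that "request my own path" and "request my own origin" are not the same thing.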
When I argue that the web is unsafe by design, it's because cases like that aren't rare, they're common. To paraphrase Veekun, scratch the surface of web security and you'll find yourself in a bottomless downward spiral, uncovering more and more horrifying trivia.
I think you're missing another two obvious explanations:
1. Lack of education when picking a tool (copy-pasting from bad SO answers is a frequent source of bad code).
2. Developers don't care. If it works, why bother wrapping your head the rest of the way around to understand why it works or whether it's secure?
> By itself, it is not an XSS. But if the backend is/was running Ruby on Rails (presumably some old version by now) then it could turn into an XSS due to a combination of features that all look superficially harmless.
Sure, ERB before RoR essentially had security turned off by default (as I noted). And this issue could happen with any other non-web system, turning into any other kind of vulnerability. This isn't a web problem, it's a system security problem. Bad inputs in a "native" app could lead to security issues in the output of apps on other devices. Badly implemented binary data decoders in a desktop application could do far worse than an XSS in the browser.
This problem is misattributed as a "web problem" because there are far more complete systems on the web than there are on nearly any other platform. It's like the tired argument that Mac is more secure than Windows, but Windows has historically had an overwhelmingly outsized market share, making OS X issues far less valuable to attackers.
> When I argue that the web is unsafe by design, it's because cases like that aren't rare, they're common.
I don't disagree that these issues are common, but I disagree that the web is unsafe by design. The web is a platform. If everyone wrote their Python APIs without a framework, I can guarantee they would be littered with security holes. If everyone wrote their own text renderer in C++, just displaying strings on the screen would be a dangerous task.
There are good tools that make it really hard to fuck up on the web. Seriously, try to accidentally have an XSS vulnerability in an isorendered React app with Apollo. The problem is folks who want to jQuery-jockey their way across the finish line and don't understand that they are making terrible mistakes.