> The 90s were slowly reinventing UNIX and stuff invented at Bell Labs.
Yes, this reminds me of: "Wasn't all this done years ago at Xerox PARC? (No one remembers what was really done at PARC, but everyone else will assume you remember something they don't.)" [1]
> "Buffers that don’t specify their length"
> Is this really a common problem in web apps? Most web apps are built in languages that don't have buffer overrun problems. There are many classes of security bug to be found in web apps, some unique to web apps...I just don't think this is one of them. This was a common problem in those C/C++ programs from the 90s the author is seemingly pretty fond of. Not so much web apps built in PHP/JavaScript/Python/Ruby/Perl/whatever.
Most injection attacks are due to this; if HTML used length-prefixed tags rather than open/close tags, most injection attacks would go away immediately.
That's not really the problem. The problem is that there is no distinction between data and control, so everything comes to you in one binary stream. If the control aspect were out-of-band, the problem really would go away.
Length prefixes will just turn into one more thing to overwrite or intercept and change. That's much harder to do when you can't get at the control channel but just at the data channel. Many old school protocols worked like this.
This is the important takeaway here. Changing the encoding simply swaps out one set of vulnerabilities and attacks for another. Separating control flow and data is the actual silver bullet for this category of attacks.
Unfortunately, there’s rarely ever a totally clear logical separation between the two. Anything you want to bucket into “control”, someone else is going to want the client to be able to manipulate as data.
Granted, if you made that control channel stateful, you'd make a lot of problems go away. But you could do that with a combined control/data stream too.
What am I missing? How would an out-of-band control channel make things easier?
That said, I think many issues with the web could be solved by implementing new protocols as opposed to shoehorning everything into HTTP just to avoid a firewall...
So <html>abc</html> would go as
<html><datum 1></html>, where 'datum 1' refers to the first datum in the data stream, being 'abc'. No matter what trickery you pulled to try to put another tag, executable bit, or other such nonsense in the datum, it would never be interpreted. This blocks any and all attacks based on tricking the server (or the browser that eventually receives the two streams) into doing something active with the datum; it can only be passive data by definition.
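The scheme above can be sketched in a few lines. This is a hypothetical illustration, not an existing protocol: the control stream carries only structure with `<datum N>` slots, and the separate data stream is inserted strictly as inert text.

```python
import html

def render(control: str, data: list[str]) -> str:
    """Fill the <datum N> slots of a control template from a separate
    data stream. Data is escaped unconditionally on insertion, so it
    can never be interpreted as markup."""
    out = control
    for i, value in enumerate(data, start=1):
        out = out.replace(f"<datum {i}>", html.escape(value))
    return out

# The data channel can carry anything, including markup-looking text:
page = render("<html><datum 1></html>", ["<script>alert(1)</script>"])
# The 'attack' arrives inert, as escaped text rather than a live tag.
```

The point is that nothing in the data stream can reach the parser as control, no matter what bytes it contains.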
For comparison, take DTMF, which is in-band signalling and so easily spoofed (with a 'bluebox', additional tones may be generated that unlock interesting capabilities in systems on the line), and compare it with GSM, which does all its signalling out-of-band and so is much harder to spoof.
The web is basically like DTMF: if you can enter data into a form and that data is spat back out in some web page to be rendered by the browser later on, you have a vector to inject something malicious, and it takes a very well-thought-out sanitization process to get rid of all the ways you might do that.
If the web were more like GSM, you could sit there and inject data into the data channel until the cows came home, but it would never lead to a security issue.
No amount of extra encoding and checks will ever close these holes completely as long as the data stays 'in band' with the control information.
Or, run your data through stored procedures instead. It took me a while to figure out why stored procedures were so much more secure than regular queries. I finally figured out it was because a stored procedure does exactly what the grandparent post says: It treats all inputs as data with no possibility to run as code.
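Parameterized queries get the same property the comment describes for stored procedures: inputs are bound as values, never spliced into the SQL text. A minimal sketch using Python's sqlite3 placeholder binding (sqlite3 has no stored procedures, so this stands in for the same mechanism):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Attacker-controlled input that would break a string-concatenated query:
evil = "alice' OR '1'='1"

# The ? placeholder binds the input as pure data; it can never become SQL.
rows = conn.execute("SELECT name FROM users WHERE name = ?", (evil,)).fetchall()
# The injection payload matches nothing, because it is compared as a
# literal string rather than parsed as a WHERE clause.
```

Had `evil` been concatenated into the query string instead, the `OR '1'='1'` would have matched every row.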
If this was the case, it would be near-impossible to write HTML by hand. And if you're writing HTML with a tool (React, HAML etc.), the tool could be doing HTML escaping correctly instead. This isn't an issue with HTML, it's an issue with human error.
All security issues are due to human error. Those are solved by building better tools.
> If this was the case, it would be near-impossible to write HTML by hand.
If, besides the text form, there were a well-defined length-prefixed binary representation, we could simply compile HTML to binary HTML, which would immediately make the web not only safer but also much more efficient (it's scary to think just how much parsing and reparsing goes on when displaying a web page).
My point is that there's nothing wrong with HTML. HTML isn't a tool, it's a format for storing and transmitting hypertext. If you're using React or HAML or any of the other HTML-generating tools, you're effectively immune from XSS. I'm putting forth that developers aren't using effective tools (shame on every templating engine that doesn't escape by default), and that calling the web as a platform bad is a bit nonsensical. It's like saying "folks are writing asm by hand and their code has security issues, therefore x86_64 is insecure".
How so? If you allow the user to send arbitrary data, and your handling of that data is where the problem lies, it isn't going to matter whether the client sends a length-prefixed piece of data. You still have to sanitize that data.
HTML, and whether it uses closing tags or not, is pretty much irrelevant to the way injection attacks work, as far as I can tell. Maybe I'm missing something...do you have an example or a reference to how this could solve injection attacks?
It would be interesting to see if this idea could work in practice.
I feel like this is conflating two different problems and potential solutions.
I'm not saying injection attacks aren't real. I'm saying that whether HTML uses closing tags or not is orthogonal to the solution. But, again, maybe I'm missing something obvious here. I just don't see how what you're suggesting can be done without types and I don't see how types require prefixing data size in order to work.
No it wouldn't. It wouldn't fix SQL injection, and it also wouldn't fix the path bug the OP linked.
The problem is not length; it is context-unaware strings. The problem is our obsession with the primitive types that pervade our codebases.
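The "context-unaware strings" point can be made concrete with a tiny wrapper type. This `Html` class is a hypothetical sketch, not a real library: raw `str` values are escaped the moment they cross the type boundary, so only content the type vouches for is emitted verbatim.

```python
import html

class Html:
    """A context-aware string: the type records that the content
    is already safe to emit as HTML."""
    def __init__(self, safe: str):
        self.safe = safe

    def __add__(self, other):
        # Plain str gets escaped at the type boundary;
        # Html passes through unchanged.
        if isinstance(other, Html):
            return Html(self.safe + other.safe)
        return Html(self.safe + html.escape(other))

page = Html("<p>") + "<script>alert(1)</script>" + Html("</p>")
# The untrusted middle segment is escaped; the trusted tags are not.
```

This is essentially what "escape by default" templating engines do: the default path treats input as data, and emitting raw markup requires an explicit, visible opt-in.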
Injection in general is simply a trust problem. If you can trust all inputs fully (hint: you can't, because nobody can), then you will never have an injection attack.
Obviously nobody is going to be typing length prefixes manually, so our tools are going to do it for us.
Now we're back where we started where you accidentally inline user content as HTML, except now HTML has the added cruft of someone's HN comment solution.
Or were senders always going to send true values for length and data?
Really, you can't trust any sender, so the data should be validated anyway.
There have been known attacks where a sender says "here's 400 bytes", the receiver stupidly trusts that length specifier, and the sender sends more (or fewer) crafted bytes, and BOOM!
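That failure mode, and the validation that prevents it, can be sketched in a few lines. The record format (4-byte big-endian length prefix) is made up for illustration:

```python
def read_record(buf: bytes) -> bytes:
    """Parse a 4-byte big-endian length prefix followed by a payload,
    refusing to trust the declared length beyond what was received."""
    if len(buf) < 4:
        raise ValueError("truncated header")
    declared = int.from_bytes(buf[:4], "big")
    payload = buf[4:]
    if declared != len(payload):
        # The classic bug (Heartbleed-style) is to read 'declared'
        # bytes from adjacent memory regardless of what arrived.
        raise ValueError(f"declared {declared} bytes, got {len(payload)}")
    return payload
```

A sender claiming 400 bytes while shipping 3 is rejected at the parser instead of being believed.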
Known-good data start and end specifiers, which HTML has, seem like a good answer when dealing with untrusted senders (read: everyone).