So <html>abc</html> would go as
<html><datum 1></html> where datum 1 refers to the first datum in the data stream, here 'abc'. No matter what trickery you pulled to put another tag, an executable bit or other such nonsense in the datum, it would never be interpreted. This blocks any attack that relies on tricking the server, or the browser that eventually receives the two streams, into doing something active with the datum: it can only be passive data by definition.
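To make that concrete, here is a rough Python sketch of the two-stream idea. The names (Tag, DatumRef, Text, combine) are made up for illustration and do not correspond to any existing protocol or library; the only point is that whatever comes out of the data stream is wrapped as text and is never parsed as markup.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Tag:
    """Control information; only ever comes from the control stream."""
    markup: str


@dataclass(frozen=True)
class DatumRef:
    """Placeholder in the control stream, like <datum 1> (1-based index)."""
    index: int


@dataclass(frozen=True)
class Text:
    """Passive data; a renderer would display it and never interpret it."""
    value: str


def combine(control: List[Union[Tag, DatumRef]],
            data: List[str]) -> List[Union[Tag, Text]]:
    """Merge the two streams without ever parsing the data as markup."""
    merged: List[Union[Tag, Text]] = []
    for item in control:
        if isinstance(item, DatumRef):
            # The datum is wrapped as Text, whatever characters it contains.
            merged.append(Text(data[item.index - 1]))
        else:
            merged.append(item)
    return merged


# Hypothetical streams: the control stream is fixed by the page author,
# the data stream carries whatever the user typed.
control_stream = [Tag("<html>"), DatumRef(1), Tag("</html>")]
hostile_data = ["<script>alert(1)</script>"]
merged_page = combine(control_stream, hostile_data)
```

Even with a hostile datum, merged_page still contains exactly the two tags from the control stream; the would-be script ends up as a Text node.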
For comparison take DTMF, which is in-band signalling and so easily spoofed (with a 'blue box' you can generate additional tones that unlock interesting capabilities in systems on the line), and compare it with GSM, which does all its signalling out-of-band and so is much harder to spoof.
The web is basically like DTMF: if you can enter data into a form and that data is later spat back out in some web page for the browser to render, you have a vector to inject something malicious, and it takes a very well thought out sanitization process to close off all the ways you might do that.
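A small sketch of the in-band problem, using Python's standard html.parser module. The helpers render_inband and tags_seen are hypothetical names I made up; they stand in for a server that pastes form input straight into a page and a browser that then parses it.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag the parser recognises in a page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def render_inband(user_input: str) -> str:
    # Data and control share one channel: plain string concatenation.
    return "<html>" + user_input + "</html>"


def tags_seen(page: str) -> list:
    """Return the start tags a parser finds in the finished page."""
    collector = TagCollector()
    collector.feed(page)
    collector.close()
    return collector.tags


harmless_tags = tags_seen(render_inband("abc"))
hostile_tags = tags_seen(render_inband("<script>alert(1)</script>"))
```

With harmless input the parser only sees the html tag. With the hostile input it also reports a script tag that the page author never wrote, because nothing in the single channel tells it where the data ends and the markup begins.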
If the web were more like GSM you could sit there and inject data into the data channel until the cows came home, but it would never ever lead to a security issue.
No amount of extra encoding and checks will ever close these holes completely as long as the data stays 'in band' with the control information.