Kaminsky described a very simple and nearly universal technique for dealing with escaping/injection issues: encode the embedded data as base64 and decode it on the client side. This projects arbitrary data into a fixed, known domain (generally `[a-zA-Z0-9+/]*`) which you can ensure is free from control characters. (You may need to use a particular variant to achieve this, e.g. for URLs the last two characters used are generally `-_`, because both `+` and `/` are significant in that context.)
After decoding, you can pass it to JSON.parse().
And yeah, use URL-safe base64 when you do use it: `-_` with no padding.
The advantage of the base64 technique is that it provides fewer degrees of freedom, and so is more robust to unforeseen vectors of attack. It's defensive programming. But it comes at a cost of memory/bandwidth.
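For what it's worth, here is a minimal sketch of that round trip. It assumes Node.js on the encoding side (for `Buffer`) and a runtime with `atob` and `TextDecoder` on the decoding side; the function names `encodeForEmbedding` and `decodeEmbedded` and the sample payload are made up for illustration.

```javascript
// Encoding side (Node.js assumed): JSON -> UTF-8 bytes -> URL-safe base64, no padding.
function encodeForEmbedding(value) {
  const b64 = Buffer.from(JSON.stringify(value), "utf8").toString("base64");
  // Swap the two URL-significant characters and drop the trailing "=" padding.
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Decoding side: undo the alphabet swap, restore padding, decode bytes, parse JSON.
function decodeEmbedded(encoded) {
  let b64 = encoded.replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "=";
  const binary = atob(b64); // one character per byte
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder("utf-8").decode(bytes));
}

// Hypothetical payload containing sequences that are awkward to embed in HTML.
const payload = { html: "</script><!--", note: "héllo" };
const encoded = encodeForEmbedding(payload);
const decoded = decodeEmbedded(encoded);
```

Whatever the payload contains, `encoded` only ever uses `[A-Za-z0-9_-]`, which is the whole point; the price is the roughly one-third size overhead mentioned above.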
(This is related to the 'prototype pollution' attack, although searching that phrase will mostly give you information about the more-dangerous variant where two objects are being merged together with some JS library. If __proto__ is just part of a literal, the behavior is not as dangerous, but still surprising.)
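A small sketch of the literal case, compared with `JSON.parse`. The `admin` key and the variable names are invented for the example.

```javascript
// In an object literal, a "__proto__" key sets the object's prototype
// instead of creating an ordinary property.
const fromLiteral = { "__proto__": { admin: true } };

// JSON.parse treats the same key as plain data: it becomes an own property.
const fromJson = JSON.parse('{"__proto__": {"admin": true}}');

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

console.log(fromLiteral.admin);                // true (inherited from the prototype)
console.log(hasOwn(fromLiteral, "__proto__")); // false
console.log(hasOwn(fromJson, "__proto__"));    // true
console.log(fromJson.admin);                   // undefined
```

Note that neither form touches `Object.prototype` itself, which is why this is less dangerous than the merge-based variant.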
type
This attribute indicates the type of script represented. The value of this attribute will be one of the following:
[...]
Any other value
The embedded content is treated as a data block, and won't be processed by the browser. Developers must use a valid MIME type that is not a JavaScript MIME type to denote data blocks. All of the other attributes will be ignored, including the src attribute.
That said, 'importmap' has specific functionality, as does 'speculationrules', although they operate similarly. My favorite is type="module", which competes with the higher-level attribute nomodule="true". Anyways, it looks like <script> has taken a lot of abuse over the years: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
- escape `<` as `\u003c`
<script id="my-json" type="application/json">{{ escaped_json }}</script>
JSON.parse(document.getElementById('my-json').textContent)
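The escaping step could look roughly like this, assuming the page is rendered by JavaScript on the server; `jsonForDataBlock` is a hypothetical helper name, not part of any library.

```javascript
// Serialize a value for use inside <script type="application/json">.
// JSON.stringify only emits "<" inside string contents (keys or values),
// so replacing it with the \u003c escape keeps the output valid JSON.
function jsonForDataBlock(value) {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

// Hypothetical hostile input: tries to close the script tag and open a comment.
const original = { comment: "</script><script>alert(1)</script>", marker: "<!--" };
const escapedJson = jsonForDataBlock(original);

// The parser on the client turns \u003c back into "<".
const roundTripped = JSON.parse(escapedJson);
```

With no `<` left in the output, the HTML parser never sees `</script`, `<script` or `<!--` inside the data block.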
No __proto__ issue, and no dynamic code at all, so you can use a strict CSP.
<script language="JavaScript"><!--
// script contents
-->
</script>
I guess people just generally don’t add those?
Still, to help me out, could someone clarify why this was down-voted? I don’t want to mess up again if I did, but I don't understand what that was.
Most people will opt for text to be optional with a link - unless they're showing their own product (Show HN). Because there is an expectation that you will attempt to read an article, before conversing about it.
I think most of the time people don't add a comment to submissions, but if they do it's more of the form: I found X interesting because of [insert non-obvious reason why X is interesting] or some additional non-obvious context needed.
In any case, I don't think there is any reason to worry too much. There was no ill intent and at the end of the day it's all just fake internet points.
CDATA: https://en.wikipedia.org/wiki/CDATA
<![CDATA[
]]>
This would work for XHTML but not HTML5 IIUC:
<script>
<![CDATA[
x = {"<!--":""};
]]>
<![CDATA[
{{json.dumps(["<!--"])}}
]]>
</script>
It’s helpful to recognize that the inner script tags are not actual script tags. Yes, once entering a script element, the browser switches parsers and wants to skip everything until a closing script tag appears. The STYLE element, TITLE, TEXTAREA, and a few others do this. Once they chop up the HTML like this they send the contents to the separate inner parser (in this case, the JS engine). SCRIPT is unique due to the legacy behavior^1.
HTML5 specifies these “inner” tags as transitions into escape modes. The entire goal is to allow JavaScript to contain the string “</script>” without it leaking to the outer parser. The early pattern of hiding inside an HTML comment is what determined the escaping mechanism rather than making some special syntax (which today does exist as noted in the post).
The opening script tag inside the comment is actually what triggers the escaping mode, and so it’s less an HTML tag and more some kind of pseudo JS syntax. The inner closing tag is therefore the escaped string value and simultaneously closes the escaped mode.
Consider the use of double quotes inside a string. We have to close the outer quote, but if the inner quote is escaped like `\"` then we don’t have to close it; it’s merely data and not syntax.
There is only one level of nesting, and eight opening tags would still be “closed” by the single closing tag.
^1: (edit) This is one reason HTML and XML (XHTML) are incompatible. The content of SCRIPT and STYLE elements is essentially just bytes. In XML it must be well-formed markup. XML parsers cannot parse HTML.
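To make the escape modes above concrete, here is a deliberately simplified model of how the parser decides where a script element ends. It is a sketch under assumptions, not the real tokenizer: it tracks only three states, treats any `\s` as whitespace, and `scriptEndIndex` is a made-up name.

```javascript
// Given the text that follows an opening <script> tag, return the index of the
// "</script" that ends the element, or -1 if the element is never closed.
function scriptEndIndex(text) {
  const lower = text.toLowerCase();
  // A tag name only counts if followed by whitespace, "/" or ">".
  const tagAt = (i, name) =>
    lower.startsWith(name, i) && /[\s\/>]/.test(lower[i + name.length] ?? "");

  let state = "data"; // "data" | "escaped" | "doubleEscaped"
  for (let i = 0; i < lower.length; i++) {
    if (state === "data") {
      if (lower.startsWith("<!--", i)) {
        state = "escaped";
        i += 1; // leave the two dashes so "<!-->" can close immediately
      } else if (tagAt(i, "</script")) {
        return i;
      }
    } else if (lower.startsWith("-->", i)) {
      state = "data"; // "-->" leaves either escaped mode
      i += 2;
    } else if (state === "escaped") {
      if (tagAt(i, "</script")) return i;
      if (tagAt(i, "<script")) state = "doubleEscaped";
    } else if (tagAt(i, "</script")) {
      state = "escaped"; // the inner close is data; it only ends the inner mode
    }
  }
  return -1;
}

// Several inner opening tags are undone by a single inner closing tag.
const nested = "<!-- <script><script><script> a </script> b </script> c";
const firstClose = nested.indexOf("</script>");
const secondClose = nested.indexOf("</script>", firstClose + 1);
console.log(scriptEndIndex(nested) === secondClose); // true

// An inner open and close with no further closing tag leaves the element open.
console.log(scriptEndIndex("<!-- <script> </script>")); // -1
```

The flat state variable is the "only one level of nesting" part: there is no counter, so the number of inner opening tags does not matter.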
Everything until the tag closer </script> is inside
the script element.
And:
In fact, script tags can contain any language (not
necessarily JavaScript) or even arbitrary data. In order to
support this behavior, script tags have special parsing
rules. For the most part, the browser accepts whatever is
inside the script tag until it finds the script close tag
</script>.
Note the sentence fragment "even arbitrary data." This explains the second part of your question as to why nested script tags without HTML comments do not require matching closing tags. Similar compatibility hacks exist for other closing tags (search for Chrome closing tags being optional for a fun ride down a rabbit hole).
As to:
why a script tag inside a comment inside a script tag needs
to be closed ...
Well, this again is due to maximizing backward compatibility in order to support broken browsers (thanks IE4, you bastard!). As the article states:
When JavaScript was first introduced, many browsers did not
support it. So they would render the content of the script
tag – the JavaScript code itself. The normal way to get
around that was to put the script into a comment ...
HTH
Or did they always have two levels of script tag escaping but that behavior only got preserved when inside an HTML comment?
No other JavaScript behavior is different inside an HTML comment, and I’m still missing the connection between the HTML comment and the embedded </script> not closing the tag besides that they were two things that older browsers might have done.
There are two situations in which it does.
① XML syntax, which is absolutely still a thing:
data:application/xhtml+xml,<html xmlns="http://www.w3.org/1999/xhtml"><script>console.log( 1 > 0 && 0 < 1 )</script></html>
② Inside an SVG <script> element in HTML syntax:
data:text/html,<svg><script>console.log( 1 > 0 && 0 < 1 )</script></svg>
That ship sailed several paragraphs ago, when <script> got special treatment by the HTML parser. Too bad we couldn't all agree to parse <![CDATA[...]]> consistently, or, you know, just &-escape the text like we do /everywhere else/ in HTML.
<script>console.log("<![CDATA[Hello, this string content in a CDATA section!]]>");</script>
Results in this being output to the console: <![CDATA[Hello, this string content in a CDATA section!]]>
Browsers don't do what you intend if you wrap the whole script in CDATA, either. They treat the "<![CDATA[" sequence as literally part of the script! Which of course throws a syntax error.
I tend to use them anyway, as sort of a HTML/XHTML polyglot thing, because deep in my heart I still think HTML should be valid XML:
<script>/* <![CDATA[ */
// my script here, and you *still* need to be careful not
// to include close-script or close-cdata sequences
/* ]]> */</script>
In summary, the 'special parsing rules for script tags' add a great amount of complexity not just to the parsing code, but for anybody who has to emit markup, especially if different parsers disagree on what kind of escaping rules are active within a given section. Yes, the HTML5 spec codified the neurotypical "I would rather make you guess what I mean than just use the proper words to say it clearly" behavior, so at least browsers agree on it, but it's a mess and a pain to deal with because now you have to remember 1000 exceptions to what would have been simple rules.