Did I Do That? HTML 5 + JS Escapers != <3

Posted by Jon

When writing the last blog post, I swear the HTML 5 tokenizer spec had a Steve Urkel moment with me. Well, not with me, but with a lot of existing JavaScript escapers out there. Why is that?

The HTML 5 tokenizer introduces a handful of new states that are only reachable from the Script data state (i.e. inside a <script> element). Some of these are really interesting.

  • Script data escaped dash dash state
  • Script data escaped state
  • Script data double escaped state

Here's a mocked-up <script> block with the states:

<script type="text/javascript">
   <!-- That second dash to the left is the Script data escaped dash dash state
       This is script data escaped state
       <script> This is script data double escaped state </script>
       This is script data escaped state

Here's ABNF formatting of the above in case state machines aren't your thing.

Boring, you say.

Not at all, I say. What's interesting about these states is that the current JavaScript parsing context is also in play. "What?" I hear you ask.

<script type="text/javascript">
var foo = "<!-- <script> guess what state I'm in!";
</script>
... rest of document

Normally, the JavaScript parser would be inside a double-quoted string while the tokenizer sits in the Script data state. (Check out the previous blog post for a rundown of ECMAScript and String literals.) Here, the JavaScript parser is still inside a double-quoted string, but the tokenizer is in the Script data double escaped state. And the wheels start turning...

Well, what does that mean? Given the above, the tokenizer transitions back to the Script data escaped state when the original closing script tag (i.e. </script>) is consumed. Let's say there isn't a --> in the remainder of the document. Then the tokenizer ought to reach the end of file (EOF), which is a parse error. The state ought to be switched to the data state, and then the document is rendered, albeit not much of it.

<script type="text/javascript">
var foo = "<!-- <script> guess what state I'm in!";
</script>
... some of the document
<!-- Remember to remove this code for the next version -->
... remainder of the document

Again, following the tokenizer spec, the tokenizer ought to be in the Script data escaped state when it encounters the less-than sign < of that HTML comment. It then transitions to the Script data escaped less-than sign state. But it doesn't care about the exclamation point ! here and switches back to Script data escaped. I'm hand-waving the next couple of state transitions until it gets to that -->. At that point, it'll go through the following:

  • Script data escaped dash state.
  • Script data escaped dash dash state.
  • And then Script data state when the > is consumed.
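For anyone who would rather run the transitions than read them, here is a rough sketch of the script data states as a small JavaScript function. It is a simplified model written for this post, not a real tokenizer: the function name scriptDataFinalState is made up, the end tag open and end tag name states are merged, attributes are ignored, and nothing is emitted. It only reports the state the tokenizer would be left in after reading the text that follows an opening <script> tag.

```javascript
// Simplified, hypothetical model of the HTML tokenizer's script data states.
// Tracks the current state only; emits no tokens and ignores attributes.
function scriptDataFinalState(text) {
  const isAlpha = (c) => /[A-Za-z]/.test(c);
  const endsName = (c) => /[\t\n\f \/>]/.test(c);
  let state = "script data"; // state right after the opening <script> tag
  let buf = "";              // stands in for the spec's temporary buffer
  let i = 0;
  while (i < text.length && state !== "data") {
    const c = text[i];
    let reconsume = false;   // true: process c again in the new state
    const back = (s) => { state = s; reconsume = true; };
    switch (state) {
      case "script data":
        if (c === "<") state = "script data less-than sign";
        break;
      case "script data less-than sign":
        if (c === "/") { buf = ""; state = "script data end tag name"; }
        else if (c === "!") state = "script data escape start";
        else back("script data");
        break;
      case "script data end tag name":
        if (isAlpha(c)) buf += c.toLowerCase();
        else if (endsName(c) && buf === "script") state = "data"; // script closed
        else back("script data");
        break;
      case "script data escape start":
        if (c === "-") state = "script data escape start dash";
        else back("script data");
        break;
      case "script data escape start dash":
        if (c === "-") state = "script data escaped dash dash";
        else back("script data");
        break;
      case "script data escaped":
        if (c === "-") state = "script data escaped dash";
        else if (c === "<") state = "script data escaped less-than sign";
        break;
      case "script data escaped dash":
        if (c === "-") state = "script data escaped dash dash";
        else if (c === "<") state = "script data escaped less-than sign";
        else state = "script data escaped";
        break;
      case "script data escaped dash dash":
        if (c === "<") state = "script data escaped less-than sign";
        else if (c === ">") state = "script data";
        else if (c !== "-") state = "script data escaped";
        break;
      case "script data escaped less-than sign":
        if (c === "/") { buf = ""; state = "script data escaped end tag name"; }
        else if (isAlpha(c)) { buf = ""; back("script data double escape start"); }
        else back("script data escaped");
        break;
      case "script data escaped end tag name":
        if (isAlpha(c)) buf += c.toLowerCase();
        else if (endsName(c) && buf === "script") state = "data"; // script closed
        else back("script data escaped");
        break;
      case "script data double escape start":
        if (isAlpha(c)) buf += c.toLowerCase();
        else if (endsName(c)) state = buf === "script" ? "script data double escaped" : "script data escaped";
        else back("script data escaped");
        break;
      case "script data double escaped":
        if (c === "-") state = "script data double escaped dash";
        else if (c === "<") state = "script data double escaped less-than sign";
        break;
      case "script data double escaped dash":
        if (c === "-") state = "script data double escaped dash dash";
        else if (c === "<") state = "script data double escaped less-than sign";
        else state = "script data double escaped";
        break;
      case "script data double escaped dash dash":
        if (c === "<") state = "script data double escaped less-than sign";
        else if (c === ">") state = "script data";
        else if (c !== "-") state = "script data double escaped";
        break;
      case "script data double escaped less-than sign":
        if (c === "/") { buf = ""; state = "script data double escape end"; }
        else back("script data double escaped");
        break;
      case "script data double escape end":
        if (isAlpha(c)) buf += c.toLowerCase();
        else if (endsName(c)) state = buf === "script" ? "script data escaped" : "script data double escaped";
        else back("script data double escaped");
        break;
    }
    if (!reconsume) i++;
  }
  return state;
}

const opened = 'var foo = "<!-- <script> guess what state I\'m in!";\n';
const closed = opened + "</script>\n... some of the document\n";
const commented = closed + "<!-- Remember to remove this code for the next version -->\n";

console.log(scriptDataFinalState(opened));    // script data double escaped
console.log(scriptDataFinalState(closed));    // script data escaped
console.log(scriptDataFinalState(commented)); // script data
```

The three calls follow the examples in this post: the string literal pushes the tokenizer into the double escaped state, the first </script> only drops it back to the escaped state, and the stray HTML comment later in the document is what finally returns it to the Script data state.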

OK... what does that mean?

Remember the previous JavaScript context the page was in? A double-quoted string literal. We're now back at that point in the script nesting. So our string is this document text, except that string literals cannot contain newline characters. A syntax error ought to be thrown by the browser's JavaScript parser.

Now what did HTML 5 do? It rendered a lot of Java-based JavaScript escapers ineffective. Many escapers don't care about "<", "!", "-" or ">". Coverity Security Library (CSL) does care about "<" and ">", so it's not affected.
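To make the idea concrete, here is a minimal sketch of a string escaper that does care about "<" and ">". It is written in JavaScript purely for illustration; the function name escapeForJsString is hypothetical, and this is not CSL's implementation or any vendor's. The assumption is that the value lands inside a quoted JavaScript string literal, so every risky character is replaced with a \uXXXX escape.

```javascript
// Hypothetical escaper for values placed inside a quoted JavaScript string.
// Not CSL's implementation; a sketch of the general approach only.
function escapeForJsString(value) {
  // Backslash, both quote characters, < and >, ASCII control characters,
  // and the U+2028/U+2029 line separators all become \uXXXX escapes.
  return String(value).replace(/[\\"'<>\u0000-\u001f\u2028\u2029]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0"));
}

const tainted = '<!-- <script> --> "quoted"';
const escaped = escapeForJsString(tainted);
console.log(escaped);
// \u003c!-- \u003cscript\u003e --\u003e \u0022quoted\u0022
```

With "<" and ">" gone from the output, the tokenizer never sees a <!--, a <script> or a --> coming from the tainted value, so "!" and "-" on their own have nothing left to trigger.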

What are the risks?

Well, unfortunately it depends.

If you're not using HTML 5 (i.e. <!DOCTYPE html> isn't at the top of your template), then you're OK. If you are using HTML 5 and your escaper doesn't escape those characters, I politely recommend giving CSL a spin. If you can't change escapers, then it depends upon where the injection occurs and here's where there be dragons.


<script type="text/javascript">
var foo = "${some:JsEscaper(param.foo)}";
<%-- no other injection points --%>
<%-- ... no more script blocks --%>

If the above is your code, then the worst that can occur is that the remainder of the document is not rendered by the browser. There's no direct risk of XSS. If that's acceptable, then huzzah. Although you still have a defect.


<script type="text/javascript">
// totally contrived
var isBarSafe = true;
</script>
<script type="text/javascript">
var foo = "${some:JsEscaper(param.foo)}";
var bar = "${some:JsEscaper(param.bar)}";
isBarSafe = testBar(bar);
</script>
<script type="text/javascript">
if (isBarSafe) {

Yes, the above is contrived. It's just illustrating where the wheels can come off the JavaScripting bus. An attacker could inject via the bar parameter. This ought to cause the aforementioned syntax error, nullifying everything in that second script block, including the assignment to isBarSafe. The third block then sees isBarSafe still set to true: a fail-open scenario that leaves the site at risk for XSS.
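The fail-open behaviour can be sketched outside a browser. The snippet below is an assumption-laden stand-in, not how a browser is implemented: runScriptBlocks is a made-up helper, a plain object named page plays the part of the page's global scope, and each string plays the part of one script block. The second block has a raw newline inside a string literal, which stands in for whatever the attacker's input did to it.

```javascript
// Hypothetical stand-in for a page with several script blocks.
// Each block is compiled and run on its own; a block that fails to
// compile is skipped entirely, and the remaining blocks still run.
function runScriptBlocks(sources) {
  const page = {};     // plays the part of the page's global scope
  const errors = [];
  for (const src of sources) {
    try {
      new Function("page", src)(page); // a SyntaxError is thrown at compile time
    } catch (e) {
      errors.push(e.name);
    }
  }
  return { page, errors };
}

const result = runScriptBlocks([
  // Block 1: the optimistic default.
  "page.isBarSafe = true;",
  // Block 2: the check that would have rejected bar, broken by a raw newline.
  'var bar = "first line\nsecond line"; page.isBarSafe = false;',
  // Block 3: trusts whatever value isBarSafe was left with.
  'page.outcome = page.isBarSafe ? "bar used unchecked" : "bar rejected";'
]);

console.log(result.errors);       // [ 'SyntaxError' ]
console.log(result.page.outcome); // bar used unchecked
```

Defaulting the flag to false and only setting it to true after the check passes would make the same failure land on the safe side.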


<script type="text/javascript">
<!-- <script> some example ${some:JsEscaper(param.example)} </script> -->

The above is a nasty location to inject tainted data. Injecting --> ends the Script data escaped state, which leaves us at... Script data state. We're now in global JavaScript land with full XSS potential.

Where else could this be an issue? Possibly in frame busting code that injects tainted data. Possibly in JSON directly inserted into a script context. ???

It's probably not worth the cycles to see if the defect is exploitable when there are reasonable remedies out there. If CSL isn't your cup of tea, bother your current vendor about patching their escaper. But if you haven't tried out CSL, I recommend it! And if you're using Maven, it's just one dependency away.


@0x6D6172696F (.mario) had a conversation on Twitter back in August with @mathias regarding HTML 5 comments in JS. The WHATWG JavaScript section on Comments goes into better detail on what should or shouldn't be honored. This makes more sense than the trial-and-error testing I did above. These comments ought to be treated as single-line comments. Therefore, breaking them up across lines changes the meaning in JS. Thanks, Mario!