The HTML5 tokenizer introduces a handful of new states that only exist within the Script data state (i.e. inside a <script> element). Some of these are really interesting.
- Script data escaped dash dash state
- Script data escaped state
- Script data double escaped state
Here's a mocked-up <script> block with the states:
Here's the same thing in ABNF, in case state machines aren't your thing.
Boring, you say?
Well, what does that mean? Given the above, when the original closing script tag (i.e. </script>) is consumed, the tokenizer transitions back to the Script data escaped state instead of leaving script data. Let's say there isn't a --> anywhere in the remainder of the document. Then the tokenizer reaches the end of file (EOF), which is a parse error. The tokenizer switches to the data state and the document is rendered, albeit not much of it.
Again following the tokenizer spec, the tokenizer should be in the Script data escaped state when it encounters the less-than sign <. It then transitions to the Script data escaped less-than sign state. But it doesn't care about the exclamation point ! that follows, so it switches back to Script data escaped. I'm hand-waving the next couple of state transitions until it reaches that -->. At that point, it goes through the following:
- Script data escaped dash state.
- Script data escaped dash dash state.
- And then Script data state when the > is consumed.
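To make these transitions concrete, here's a simplified Python sketch of the script data states. It's not a real parser: it only tracks state changes (no tokens are emitted), skips NULL and carriage-return handling, uses shortened state names, and `find_script_end` is a hypothetical helper that reports where the <script> element would actually close.

```python
# Simplified sketch of the HTML5 tokenizer's script data states (assumptions:
# only state changes are tracked, no tokens are emitted, NULL/CR handling is skipped).
WHITESPACE = "\t\n\f "


def is_alpha(c: str) -> bool:
    return c.isascii() and c.isalpha()


def find_script_end(text: str):
    """Given the raw text after a <script> start tag, return the index of the '<'
    of the end tag that closes the script, or None if EOF is reached first."""
    state, back, buf, tag_start, i = "data", "data", "", 0, 0
    while i < len(text):
        c = text[i]
        if state == "data":                               # Script data state
            if c == "<":
                state, tag_start = "lt", i
        elif state == "lt":                               # Script data less-than sign state
            if c == "/":
                state, back, buf = "end_tag_open", "data", ""
            elif c == "!":
                state = "escape_start"
            else:
                state = "data"
                continue                                  # reconsume in Script data
        elif state == "end_tag_open":                     # (escaped) end tag open state
            state = "end_tag_name" if is_alpha(c) else back
            continue                                      # reconsume either way
        elif state == "end_tag_name":                     # (escaped) end tag name state
            if is_alpha(c):
                buf += c.lower()
            elif c in WHITESPACE + "/>" and buf == "script":
                return tag_start                          # appropriate end tag: script closes
            else:
                state = back
                continue
        elif state in ("escape_start", "escape_start_dash"):  # saw "<!" or "<!-"
            if c == "-":
                state = "escape_start_dash" if state == "escape_start" else "escaped_dash_dash"
            else:
                state = "data"
                continue
        elif state in ("escaped", "escaped_dash", "escaped_dash_dash"):
            if c == "-":
                state = "escaped_dash" if state == "escaped" else "escaped_dash_dash"
            elif c == "<":
                state, tag_start = "escaped_lt", i
            elif c == ">" and state == "escaped_dash_dash":
                state = "data"                            # "-->" returns to Script data
            else:
                state = "escaped"
        elif state == "escaped_lt":                       # Script data escaped less-than sign state
            if c == "/":
                state, back, buf = "end_tag_open", "escaped", ""
            elif is_alpha(c):
                state, buf = "double_escape_start", ""
                continue
            else:
                state = "escaped"                         # e.g. the "!" of a later "<!--"
                continue
        elif state in ("double_escape_start", "double_escape_end"):
            start = state == "double_escape_start"        # both buffer a tag name
            if c in WHITESPACE + "/>":
                if buf == "script":
                    state = "double_escaped" if start else "escaped"
                else:
                    state = "escaped" if start else "double_escaped"
            elif is_alpha(c):
                buf += c.lower()
            else:
                state = "escaped" if start else "double_escaped"
                continue
        elif state in ("double_escaped", "double_escaped_dash", "double_escaped_dash_dash"):
            if c == "-":
                state = ("double_escaped_dash" if state == "double_escaped"
                         else "double_escaped_dash_dash")
            elif c == "<":
                state = "double_escaped_lt"
            elif c == ">" and state == "double_escaped_dash_dash":
                state = "data"
            else:
                state = "double_escaped"
        elif state == "double_escaped_lt":                # double escaped less-than sign state
            if c == "/":
                state, buf = "double_escape_end", ""
            else:
                state = "double_escaped"
                continue
        i += 1
    return None                                           # EOF: parse error, script never closed


print(find_script_end('alert(1);</script><p>ok</p>'))                          # 9
print(find_script_end('var s = "<!--<script>";</script><p>rest of page</p>'))  # None
```

With a benign script, the closing tag is found at index 9. With <!--<script> inside a string literal, the first </script> only drops the tokenizer from double escaped back to escaped, and since nothing after it closes the script, the tokenizer runs off the end of the input.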
OK... what does that mean?
What are the risks?
Well, unfortunately it depends.
If you're not using HTML5 (i.e. <!DOCTYPE html> isn't at the top of your template), then you may be OK, although browsers that implement the HTML5 parsing algorithm use these states regardless of doctype. If you are using HTML5 and your escaper doesn't escape those characters, I politely recommend giving CSL a spin. If you can't change escapers, then it depends on where the injection occurs, and here be dragons.
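For the case where an untrusted value ends up inside a quoted JavaScript string in a <script> block, here's a minimal sketch of a conservative escaper. It is not CSL's API; `escape_js_string` and its allow-list (ASCII letters, digits, space, comma, period, underscore) are assumptions. Everything else is written as a \xHH or \uHHHH escape, so <, ! and - can never spell <!-- or <script in the page source, while the JavaScript engine still decodes the original string.

```python
def escape_js_string(value: str) -> str:
    """Escape `value` for use inside a quoted JavaScript string literal that sits
    in a <script> block (hypothetical helper, conservative allow-list)."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch.isascii() and (ch.isalnum() or ch in " ,._"):
            out.append(ch)                      # allow-listed: safe as-is
        elif code < 0x100:
            out.append("\\x%02x" % code)        # e.g. '<' -> \x3c, '-' -> \x2d
        elif code < 0x10000:
            out.append("\\u%04x" % code)        # includes U+2028/U+2029 line terminators
        else:
            v = code - 0x10000                  # astral character: UTF-16 surrogate pair
            out.append("\\u%04x\\u%04x" % (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)))
    return "".join(out)


print(escape_js_string('<!--<script>'))  # \x3c\x21\x2d\x2d\x3cscript\x3e
```

Escaping by allow-list rather than by a list of known-bad characters means you don't have to predict every sequence the tokenizer treats specially.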
If the above is your code, then the worst that can happen is that the browser doesn't render the remainder of the document. There's no direct XSS risk. If that's acceptable, then huzzah, although you still have a defect.
Where else could this be an issue? Possibly in frame-busting code that injects tainted data. Possibly in JSON inserted directly into a script context. ???
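For the JSON case, Python's `json.dumps` leaves <, > and & unescaped, so a string value containing <!--<script> lands verbatim inside the <script> block. A minimal sketch, assuming the JSON is written straight into script data: those three characters can only occur inside JSON string literals, so they can be replaced with \u003c, \u003e and \u0026, which decode to the same values. `json_for_script` is a hypothetical name.

```python
import json


def json_for_script(value) -> str:
    """Serialize `value` as JSON that can be embedded directly in a <script> block."""
    # ensure_ascii=True (the default) already escapes non-ASCII, including U+2028/U+2029.
    text = json.dumps(value)
    # '<', '>' and '&' can only appear inside JSON strings, where \uXXXX escapes
    # decode to the same characters, so the parsed data is unchanged.
    return text.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")


print(json_for_script({"name": "<!--<script>"}))  # {"name": "\u003c!--\u003cscript\u003e"}
```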
It's probably not worth the cycles to determine whether the defect is exploitable when reasonable remedies exist. If CSL isn't your cup of tea, bother your current vendor about patching their escaper. But if you haven't tried CSL, I recommend it! And if you're using Maven, it's just one dependency away.