The Way You Do The Things You Do

Posted by Jon

While researching and developing the remediation advice for the cross-site scripting checker for Security Advisor, we noticed some idiosyncrasies with some popular escapers for HTML, XML, and JavaScript contexts. Romain and Andy have already covered the Coverity Security Library (CSL), which encapsulates some of this research.

JavaScript escapers, like Spring's JavaScriptUtils.javaScriptEscape(), escape the forward slash (/) character. Spring cites the Mozilla Core JavaScript guide as a reference, which has an interesting section on String literals. However, that section makes no mention of the forward slash. JavaScript is a dialect of ECMAScript, currently defined in ECMA-262. Looking at the current (as of this blog post) revision, 5.1, ECMAScript defines String Literals in section 7.8.4. The grammar summary in Annex A at the end of the document is also a good place to read up on the syntax rules. To summarize, a String Literal can contain anything between a pair of single-quote (') or double-quote (") characters except that quote character, a backslash (which starts an escape sequence), and Line Terminators. Drilling into the spec more, there's no obvious reason why the forward slash is escaped.
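As a rough illustration of that summary, here is a minimal Java sketch of which characters cannot appear unescaped inside a string literal, based on my reading of ECMAScript 5.1 section 7.8.4. The class and method names are hypothetical and are not taken from Spring, Apache Commons, or the CSL.

```java
// Hypothetical helper, not part of any library discussed in this post.
final class StringLiteralSketch {
    private StringLiteralSketch() {}

    /**
     * Returns true if c cannot appear raw inside a string literal delimited
     * by the given quote character: the quote itself, the backslash, and the
     * four line terminators (LF, CR, U+2028, U+2029).
     */
    static boolean mustEscape(char c, char quote) {
        return c == quote
            || c == '\\'
            || c == '\n'
            || c == '\r'
            || c == '\u2028'
            || c == '\u2029';
    }

    public static void main(String[] args) {
        System.out.println(mustEscape('"', '"'));  // the delimiting quote
        System.out.println(mustEscape('/', '"'));  // the forward slash
    }
}
```

Under this reading the forward slash is not one of the characters the grammar forces you to escape, which is what makes the escapers' behavior curious.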

Now you could be saying to yourself, it's to prevent someone from inserting a comment. But since the escaper escapes the quote characters, and a String Literal can contain a forward slash, there's no obvious way a forward slash would end the quoted context and start a new single-line comment context. So that's probably not it. And looking into the escapers, they're not trying to do anything with regular expressions, so that doesn't seem to be it either.

Spring isn't alone here. The Apache Commons StringEscapeUtils.escapeEcmaScript() escaper also escapes the forward slash. Huh. So something is going on here. Now I don't know exactly why they do the things they do, but I can speculate.

A reasonable explanation starts with assuming these escapers are used in JSPs or templates. (Spring's escaper is used via their JSP tag library, e.g. spring:message.) If the developer inserts attacker-controlled tainted data into a JavaScript string context within this template, one could assume it's probably occurring between <script> tags. While it could occur within an HTML attribute context, let's go with the <script> tags. An HTML 5 tokenizer ought not leave the "script data state" until a less-than sign (<) occurs. Then if the next character is a forward slash, the state ought to change to the "script data end tag open state". And here's the aha moment. Since these escapers backslash-escape the forward slash character, injected data containing </script> becomes <\/script>. The character after the less-than sign is now a backslash rather than a forward slash, so an HTML 5 tokenizer like the one in your browser ought to change back to the script data state. This prevents an attacker from injecting a </script> tag, closing the JavaScript script context, and starting whatever context the attacker chooses next.
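To make that state change concrete, here is a deliberately simplified Java sketch of how a tokenizer might look for the end of a script block. It is an assumption-laden approximation, not a browser's implementation: it ignores the "script data escaped" states that involve <!--, and the class and method names are hypothetical.

```java
// Simplified sketch of the "script data" scanning described above.
final class ScriptDataSketch {
    private ScriptDataSketch() {}

    /**
     * Returns the index of the '<' that begins an end tag closing the script
     * block, or -1 if the text never leaves the script data state.
     */
    static int findScriptEnd(String scriptData) {
        final String name = "script";
        for (int i = 0; i < scriptData.length(); i++) {
            if (scriptData.charAt(i) != '<') {
                continue; // still in the script data state
            }
            int j = i + 1;
            if (j >= scriptData.length() || scriptData.charAt(j) != '/') {
                continue; // no forward slash after '<': back to script data
            }
            j++; // roughly the "script data end tag open state"
            if (!scriptData.regionMatches(true, j, name, 0, name.length())) {
                continue; // not an end tag for script
            }
            j += name.length();
            if (j < scriptData.length()) {
                char c = scriptData.charAt(j);
                // The tag name must be followed by whitespace, '/', or '>'.
                if (c == '>' || c == '/' || c == ' ' || c == '\t' || c == '\n' || c == '\f') {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findScriptEnd("var foo = \"</script>\";"));
        System.out.println(findScriptEnd("var foo = \"<\\/script>\";"));
    }
}
```

In this sketch the unescaped text yields a match at the injected end tag, while the backslash-escaped text yields -1 because the character after the less-than sign is no longer a forward slash.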

Let's say the JSP wasn't using an escaper. Here's a snippet from somefile.jsp:

<script type="text/javascript">
  var foo = "${}";
</script>

Then if the following vector was passed to the application: ?foo=</script><script>var foo=1;//

The JSP ought to return the following HTML:

<script type="text/javascript">
  var foo = "</script><script>var foo=1;//";
</script>

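The HTML tokenizer runs before the JavaScript parser, so it ends the first script block at the injected </script> tag. The browser's JavaScript parser ought to register a syntax error within the first block (an unterminated string literal) and ignore it. The second block's foo ought to be respected, resulting in a value of 1. User-controlled data has changed the intent of the JavaScript, and this could result in an XSS attack.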

Now, let's say the JSP was using an escaper. Here's somefile.jsp updated:

<%@ taglib prefix="spring" uri=""%>
<script type="text/javascript">
  var foo = "<spring:message javaScriptEscape="true" text="${}" />";
</script>

Using the same vector, the JSP ought to return the following HTML:

<script type="text/javascript">
  var foo = "<\/script><script>var foo=1;\/\/";
</script>

This ought to be parsed as a normal double-quoted string by the browser, since \/ inside a JavaScript string literal is just a forward slash. No XSS here.
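For readers who want to experiment, here is a minimal Java sketch of an escaper in the same spirit. It is not Spring's, Apache Commons', or the CSL's implementation; the class name, method name, and the exact set of escaped characters are assumptions chosen for illustration.

```java
// Hypothetical escaper for illustration only.
final class SlashEscapingSketch {
    private SlashEscapingSketch() {}

    /**
     * Backslash-escapes characters that could end a JavaScript string
     * literal, plus the forward slash so that an injected end tag cannot
     * close the surrounding script block.
     */
    static String escapeForJsString(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '"':      out.append("\\\""); break;
                case '\'':     out.append("\\'"); break;
                case '\\':     out.append("\\\\"); break;
                case '/':      out.append("\\/"); break;
                case '\n':     out.append("\\n"); break;
                case '\r':     out.append("\\r"); break;
                case '\u2028': out.append("\\u2028"); break;
                case '\u2029': out.append("\\u2029"); break;
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String vector = "</script><script>var foo=1;//";
        // prints <\/script><script>var foo=1;\/\/
        System.out.println(escapeForJsString(vector));
    }
}
```

A production escaper would likely cover more characters than this sketch does, so treat it as a way to reproduce the example above rather than as remediation advice.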

In case you're wondering, the CSL conservatively escapes characters that could cause context transitions like these. If any are not obvious, please let us know on the GitHub page.