A Tale of Two Parsers

Posted by Alex

We have been doing research to ensure that our XSS checker can accurately find XSS defects and give remediation guidance for them. As part of this research we have been examining the parsers involved in rendering an HTML page in more detail, so that our analysis knows what is and isn't safe in a given context. This knowledge has also become the backbone of the Coverity Security Library, which we have open sourced.

Last month I gave a lightning talk at BlueHat v12 to shed some light on the quirks we've been finding. Much of the feedback said that most of this knowledge wasn't well known beforehand. We've therefore decided to publish it here so the community can be better informed about these unexpected results.

JavaScript String Escapers

Jon Passki has already blogged about one of these issues, and you should read that post if you haven't already. However, I think he left out one particularly interesting case for potential XSS defects. If you don't have time to read his post, try to imagine what would happen if a browser were to parse this HTML:

<script>  
var foo = "<!--<script>";  
</script>  
<a href="</script><script>alert('hax')</script>">link</a>  

If you save this as an HTML file and open it in a browser, you'll see an alert saying "hax" fire, as described in Jon's blog post. Now look at the DOM in your browser's developer tools (press F12 in Chrome or IE). Everything from the first opening script tag up to the first closing script tag inside the href attribute forms one single script block.

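This is due to the "script data double escaped state" that Jon mentions in his post. We force the browser into it by opening an HTML comment (<!--) followed by a new script tag (<script>) inside the script block. While the tokenizer is in that state, the next </script> only drops it back to the "script data escaped state" instead of ending the script. It is therefore the following </script>, here the one inside the href attribute, that actually closes the element.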
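To make these state transitions concrete, here is a heavily simplified Python model of how an HTML5 tokenizer decides where a script element's content ends. It tracks only the three states involved (script data, script data escaped, script data double escaped). The function names are our own. It ignores details such as CR normalisation and EOF handling, so treat it as an illustration rather than a spec-complete tokenizer.

```python
# Characters that end a tag name in this simplified model (tab, LF, FF,
# space, "/" and ">"); real browsers also normalise CR to LF beforehand.
TAG_NAME_END = "\t\n\f />"


def _tag_name_at(data: str, i: int, name: str) -> bool:
    """True if data[i:] starts with `name` (case-insensitive) that is
    immediately followed by a tag-name terminator."""
    end = i + len(name)
    return (
        data[i:end].lower() == name
        and end < len(data)
        and data[end] in TAG_NAME_END
    )


def script_end(data: str) -> int:
    """Return the offset of the "</script" that closes a script element whose
    content starts at data[0], or -1 if it is never closed (simplified)."""
    state = "data"  # one of "data", "escaped", "double_escaped"
    i = 0
    while i < len(data):
        if state in ("data", "escaped") and data.startswith("</", i) \
                and _tag_name_at(data, i + 2, "script"):
            return i  # an appropriate end tag: the script element ends here
        if state == "data" and data.startswith("<!--", i):
            state = "escaped"
            i += 2  # stay on "--" so that "<!-->" is seen as "-->" at once
            continue
        if state != "data" and data.startswith("-->", i):
            state = "data"  # "-->" leaves both escaped states
            i += 3
            continue
        if state == "escaped" and data.startswith("<", i) \
                and _tag_name_at(data, i + 1, "script"):
            state = "double_escaped"  # "<script" seen inside the comment
            i += len("<script")
            continue
        if state == "double_escaped" and data.startswith("</", i) \
                and _tag_name_at(data, i + 2, "script"):
            state = "escaped"  # this "</script" only undoes the double escape
            i += len("</script")
            continue
        i += 1
    return -1


# The script content from the example HTML above.
script_body = (
    '\nvar foo = "<!--<script>";\n</script>\n'
    '<a href="</script><script>alert(\'hax\')</script>">link</a>\n'
)
end = script_end(script_body)
print(end == script_body.index("</script><script>alert"))  # True
```

Without the <!--<script> prefix, the same function returns the position of the first </script>, which is the behaviour most people expect.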

So how is this relevant from a security perspective? Another of Jon's blog posts shows that Spring, Apache, and no doubt other JavaScript string escapers defend against the typical </script><script>alert(1)</script> payload by replacing / with \/. They do nothing to escape the characters < or >.
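Here is a minimal Python sketch of an escaper that follows this model. The function name and the exact replacement table are hypothetical and are not taken from Spring or Apache. The point is only that escaping / while leaving < and > alone neutralises the classic payload but lets the comment-based prefix through untouched.

```python
def naive_js_string_escape(value: str) -> str:
    """Hypothetical escaper modelled on the behaviour described above:
    backslashes, quotes, newlines and "/" are escaped; "<" and ">" are not."""
    replacements = {
        "\\": "\\\\",
        '"': '\\"',
        "'": "\\'",
        "/": "\\/",
        "\n": "\\n",
        "\r": "\\r",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


# The classic breakout payload is neutralised: "</" can no longer appear.
print(naive_js_string_escape("</script><script>alert(1)</script>"))
# <\/script><script>alert(1)<\/script>

# The comment-based prefix contains no "/", so it passes through unchanged.
print(naive_js_string_escape("<!--<script>"))
# <!--<script>
```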

Going back to the code example at the top, imagine that two values are attacker controlled: <!--<script> inside the JavaScript string, and </script><script>alert('hax')</script> inside the href attribute. These JavaScript string escapers let the first part of our attack, <!--<script>, through untouched, so all we need is a way to insert the second part.

In our example, injecting </script><script> into an attribute is actually "safe" by itself, since it does not break out of the href attribute. However, an injection point like this is unlikely to exist in practice, because the most common way to secure an injection into an attribute is to HTML-encode it. My pick for a realistically exploitable case is a textarea element that follows the script block. One common way to secure an injection into a textarea is to filter out the closing </textarea> tag, which leaves </script><script> intact.

We are currently working with Spring through SPR-9983 to address this issue. Because this is not an uncommon model for escaping JavaScript, you should check your own escapers for it too.
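Here is one way to check your own escaper, sketched in Python under our own assumptions. The escaper writes every character outside a small allow-list as \uXXXX, so < and > can never reach the HTML tokenizer. A few probe strings are then fed through an escaper to see whether any markup characters survive. The function names, allow-list and probe list are hypothetical choices for illustration. They should not be read as the fix adopted in SPR-9983 or the approach used by the Coverity Security Library.

```python
from typing import Callable, List


def js_string_escape(value: str) -> str:
    """Escape `value` for a quoted JavaScript string inside a <script> block.

    Only ASCII letters, digits, space, comma, period and underscore are
    emitted literally; every other character becomes a \\uXXXX escape, so
    "<", ">", "/", "!", "-" and both quote characters never appear as-is."""
    out = []
    for ch in value:
        if ch.isascii() and (ch.isalnum() or ch in " ,._"):
            out.append(ch)
            continue
        code = ord(ch)
        if code > 0xFFFF:
            # JavaScript strings are UTF-16, so emit a surrogate pair.
            code -= 0x10000
            out.append("\\u%04X\\u%04X" % (0xD800 + (code >> 10),
                                           0xDC00 + (code & 0x3FF)))
        else:
            out.append("\\u%04X" % code)
    return "".join(out)


# Hypothetical probe strings for auditing an escaper.
PROBES = ["<!--<script>", "</script><script>alert(1)</script>", "-->"]


def leaky_probes(escaper: Callable[[str], str]) -> List[str]:
    """Return the probes whose escaped form still contains "<" or ">"."""
    return [p for p in PROBES if "<" in escaper(p) or ">" in escaper(p)]


print(leaky_probes(js_string_escape))  # []
# An escaper that only rewrites "/" leaks all three probes.
print(leaky_probes(lambda s: s.replace("/", "\\/")))
```

An allow-list also covers any other punctuation-based parser quirks that have not been found yet, at the cost of longer output than a block-list.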

CSS String Escapers

In some cases developers want to let users control the contents of a CSS string, for example to supply a URL or part of a CSS selector. As an example of what this looks like, consider this style element:

<style>  
span[id="TAINTED_DATA_HERE"] {  
    background-color: #ff00ff;  
}  
</style>  

As part of our investigation into what a CSS string escaper needs to do, we noted the obvious requirements. The escaper must handle ' and " because they are string delimiters, and \ because it is the escape character. We also found a series of characters that need to be escaped because they would otherwise cause parse errors.

In particular, according to the CSS spec, any of the following characters appearing unescaped inside a string can cause a parse error:

  • Line Feed (U+000A)
  • Form Feed (U+000C)
  • Carriage Return (U+000D)

In practice, this turned out to be true for Chrome, but false for Firefox and IE.
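Here is a minimal Python sketch of a CSS string escaper that covers both the obvious characters and the line-break characters above. It takes a conservative allow-list approach: everything except ASCII letters and digits is hex-escaped. The function name is hypothetical, and this is not the Coverity Security Library's implementation.

```python
def css_string_escape(value: str) -> str:
    """Escape `value` for use inside a quoted CSS string (sketch).

    Every character other than an ASCII letter or digit is written as a
    CSS hex escape: a backslash, the code point in hex, then one space.
    The CSS tokenizer consumes that trailing space as part of the escape,
    so "\\a " decodes back to a single line feed. This covers both quote
    characters, the backslash, line feed, form feed and carriage return.
    (U+0000 cannot be represented in CSS at all; it decodes as U+FFFD.)"""
    return "".join(
        ch if ch.isascii() and ch.isalnum() else "\\%x " % ord(ch)
        for ch in value
    )


escaped = css_string_escape('x"\n{}')
print(repr(escaped))
print("\n" in escaped)  # False: the raw line feed never reaches the parser
```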

In any case, parse errors, particularly in CSS, are not obviously security issues by themselves. However, the CSS spec makes CSS somewhat unusual: parse errors are not fatal. A compliant CSS parser will attempt to recover, re-sync with the CSS stream and resume parsing. This allows us to escape the quoted string context, jump into the main CSS context and specify arbitrary CSS. The following Chrome-only example sets the background of the entire page to red:

<style>  
  span[id="  
{} body { background-color: red; }"] {  
  background-color: #ff00ff;  
  }  
</style>  
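This small Python sketch shows how an escaper that handles only the "obvious" characters produces the stylesheet above. The template mirrors the earlier style example. Both the {value} placeholder and the naive escaper are hypothetical. The leading newline in the payload passes straight through, and in Chrome's parser, as described above, that is what lets the rest of the payload land outside the string.

```python
# The style element from the earlier example, with a hypothetical
# placeholder where the user-controlled value is inserted.
TEMPLATE = ('<style>\nspan[id="{value}"] {{\n'
            '    background-color: #ff00ff;\n}}\n</style>')


def naive_css_string_escape(value: str) -> str:
    """Hypothetical escaper that only hex-escapes the backslash and the two
    quote characters; line feed, form feed and carriage return slip through."""
    return (value.replace("\\", "\\5c ")  # backslash first, to avoid double escaping
                 .replace('"', "\\22 ")
                 .replace("'", "\\27 "))


payload = "\n{} body { background-color: red; }"
print(TEMPLATE.format(value=naive_css_string_escape(payload)))
```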

Having shown that these characters let us escape the string context, the obvious question is: what can an attacker do with this in Google Chrome? We haven't delved too deeply into this topic at Coverity, and Chrome doesn't seem to offer an obvious way to execute JavaScript directly from CSS. However, other researchers have done a lot of work on JavaScript-less attacks that can still steal your data. This recent paper in particular has a good survey of what can be done with CSS.

This error recovery behaviour also has implications for CSS validators such as HTMLPurifier and AntiSamy. They need to be aware of it so that their parsing of CSS matches the browser's.
