RSA 2013 speaking session

Posted by Romain

Update March 1st: Since the PDF on the RSA website seems to be broken, I have attached a copy to this blog post.

I'll be at RSA this week. My session is Friday morning (10:20am, Room 132) and is called:

Why haven't we stamped out XSS and SQL yet?

RSA talk content

Since all the slides are apparently available for everyone on the RSA website, I can give some more insight about what I will be talking about. We ran an experiment at Coverity in which we analyzed many Java web applications and looked for where developers add dynamic data. The goal was to try to understand what contexts (both HTML contexts and SQL contexts) are frequently used.

The premise of the talk is fairly straightforward: security professionals have been giving advice to developers for a long time, yet we still see these issues on a regular basis, so we compare the common advice against what we see in the data.

What you can expect from this talk:

- Some information about observed HTML contexts: about 26 different context stacks, 45% of which had 2 elements in the stack (e.g., HTML attribute -> CSS code), while the longest ones had 3 elements.
- A list of SQL contexts and notes about what developers usually do.
- Advice for security professionals on how to communicate with developers (things that led to the creation of our Fixing XSS: A practical guide for developers document).

Anyhow, this blog post is not only to announce this talk, but also to give some insight on how we extracted the data from these applications.

Analysis technique

We created and modified different checkers from Coverity Security Advisor in order to extract all injection sites that involve dynamic data, regardless of its taintedness. For each injection site, we computed its context within the relevant sub-language (one of HTML, JavaScript, CSS, SQL, HQL, and JPQL). This represents our working dataset.

Here's an example of an injection site (using JSP and EL):

<script type="text/javascript">  
var content = '${dynamic_data}';  
// context ::= {HTML SCRIPT TAG -> JS STRING}  
</script>  

We tracked the construction of this snippet of the HTML page and recorded the injection site (here ${dynamic_data}) together with its associated context (JS STRING inside HTML SCRIPT TAG). Since we do not care about the taintedness of dynamic_data, we did not need to track all paths that could lead to a defect (XSS here), and this is where the experiment differs most from our XSS checker. Note that we still needed to track the parts of the HTML page being constructed in order to compute the context properly. This is, however, part of our context-aware global data flow analysis.
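To make the idea of a context stack more concrete, here is a deliberately naive sketch. It is not how the Coverity analysis works (that one follows data flow through the program that builds the page); it only scans a finished template string, and the class name, method name, and the labels other than "HTML SCRIPT TAG -> JS STRING" are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: a linear scan over one template string.
// All names here are hypothetical, not part of any Coverity API.
class ToyContextTracker {
    // Returns one context label per ${...} placeholder, in order of appearance.
    static List<String> contexts(String template) {
        List<String> result = new ArrayList<>();
        boolean inScript = false;
        char jsQuote = 0; // 0 means "not inside a JS string literal"
        int i = 0;
        while (i < template.length()) {
            if (template.startsWith("${", i)) {
                int end = template.indexOf('}', i);
                if (end < 0) break; // unterminated placeholder: stop scanning
                if (!inScript) result.add("HTML (outside script)");
                else if (jsQuote != 0) result.add("HTML SCRIPT TAG -> JS STRING");
                else result.add("HTML SCRIPT TAG -> JS CODE");
                i = end + 1;
            } else if (!inScript && template.startsWith("<script", i)) {
                int close = template.indexOf('>', i);
                if (close < 0) break; // malformed tag: stop scanning
                inScript = true;
                i = close + 1;
            } else if (inScript && template.startsWith("</script", i)) {
                // The HTML parser ends the script block here, whatever the JS state.
                inScript = false;
                jsQuote = 0;
                i += "</script".length();
            } else {
                char c = template.charAt(i);
                if (inScript) {
                    if (jsQuote == 0 && (c == '\'' || c == '"')) jsQuote = c;
                    else if (jsQuote != 0 && c == '\\') i++; // skip the escaped character
                    else if (c == jsQuote) jsQuote = 0;
                }
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String page = "<script type=\"text/javascript\">\n"
                + "var content = '${dynamic_data}';\n"
                + "</script>";
        System.out.println(contexts(page));
    }
}
```

This sketch ignores HTML attributes, JS comments, CSS, and case-insensitive tag names, and it knows nothing about where the string came from, which is precisely the part that requires a real data flow analysis.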

For SQL-related queries, we essentially needed to do the same thing, but we also needed to track the parameters inserted in a query using the parameterized notation: remember, we needed to find all dynamic data that could eventually go into a query. That's why the following code:

String sql = "select foo, bar from table where 1=1";  
if (cond1)  
  sql += " and user='" + user_name + "'"; // context ::= {SQL_STRING}  
if (cond2)  
  sql += " and password=?"; // context ::= {SQL_DATA_VALUE}  

has 2 injection sites of interest for the experiment, and we did not need to compute the full abstract string (the set of 4 strings this code can eventually produce).
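For readers wondering where the 4 comes from, the following sketch simply enumerates both conditions of the snippet above. The class and method names are hypothetical, and the loop over cond1/cond2 stands in for whatever the real program would decide at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: spells out the "abstract string" of the snippet above.
class QueryVariants {
    // Builds the query once per combination of (cond1, cond2).
    static List<String> enumerate(String userName) {
        List<String> variants = new ArrayList<>();
        for (boolean cond1 : new boolean[] {false, true}) {
            for (boolean cond2 : new boolean[] {false, true}) {
                String sql = "select foo, bar from table where 1=1";
                if (cond1)
                    sql += " and user='" + userName + "'"; // context ::= {SQL_STRING}
                if (cond2)
                    sql += " and password=?"; // context ::= {SQL_DATA_VALUE}
                variants.add(sql);
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        for (String variant : enumerate("<user_name>")) {
            System.out.println(variant);
        }
    }
}
```

Two independent conditions give 2 x 2 = 4 strings, but there are still only 2 injection sites, and that count is what the experiment cares about.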

Note that with this fairly common construct:

String sql1 = "select foo, bar from table where ";  
String and_beg = " and (";  
String and_end = " ) ";  
sql1 += and_beg + "user = '" + user_name + "'" + and_end;  
sql1 += sql2; // `sql2` is another part of the query coming  
              // from a different procedure or so  

we still properly track the contexts, even if all the parts (sql1, and_beg, etc.) are created inter-procedurally.
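The reason this works is that the context of an injection site depends only on the text that ends up in front of it, not on which variable or procedure supplied that text. Here is a small sketch of that idea, assuming the fragments have already been resolved into an ordered list (in the real analysis, resolving them is the hard, inter-procedural part). The Part class and the SQL_CODE label are invented for this example; SQL_STRING and SQL_DATA_VALUE are the labels used above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; names are hypothetical.
class ToySqlContexts {
    // One piece of a concatenated query: literal SQL text or a dynamic value.
    static final class Part {
        final String text;
        final boolean dynamic;
        private Part(String text, boolean dynamic) {
            this.text = text;
            this.dynamic = dynamic;
        }
        static Part lit(String sql) { return new Part(sql, false); }
        static Part dyn(String name) { return new Part(name, true); }
    }

    // Walks the parts in concatenation order and labels each injection site.
    static List<String> contexts(List<Part> parts) {
        List<String> sites = new ArrayList<>();
        boolean inString = false; // are we inside a '...' SQL string literal?
        for (Part part : parts) {
            if (part.dynamic) {
                sites.add(part.text + ": " + (inString ? "SQL_STRING" : "SQL_CODE"));
                continue;
            }
            for (char c : part.text.toCharArray()) {
                if (c == '\'') inString = !inString;
                else if (c == '?' && !inString) sites.add("?: SQL_DATA_VALUE");
            }
        }
        return sites;
    }

    public static void main(String[] args) {
        List<Part> query = new ArrayList<>();
        query.add(Part.lit("select foo, bar from table where "));
        query.add(Part.lit(" and ("));   // and_beg
        query.add(Part.lit("user = '"));
        query.add(Part.dyn("user_name"));
        query.add(Part.lit("'"));
        query.add(Part.lit(" ) "));      // and_end
        System.out.println(contexts(query));
    }
}
```

Whether " and (" is an inline literal or a variable defined in another procedure makes no difference to the result, as long as the analysis can recover the order of the pieces.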

Limitations

I will quickly explain this during the talk, but essentially, tracking HTML contexts in a global data flow analysis is not a trivial thing. Moreover, accounting for the impact of JavaScript code on the resulting web page (and therefore for how the HTML contexts could be transformed at runtime) is an even more complex problem. We did not analyze JavaScript.
