We've released a new version of our Coverity Security Library on Github and Maven Central, and I'd like to talk a bit about the addition I made of a new class called Filter. I would also like to note that you should not use v1.1 due to an issue described later. Some of the implementation decisions I made are arguable, so I would really like to hear the community's thoughts on these. So if you're at RSA, feel free to swing by Booth #1759 at RSA on Tuesday to grill me or the rest of SRL about it.
The implementation of this Filter class is located in coverity-escapers/src/main/java/com/coverity/security/Filter.java
The Filter class contains a few methods which are not technically escapers, but serve a very similar purpose, namely the main methods we have added are:
They have been added since these are the most common contexts we have seen that do not have a defined escaper. To give a clearer picture of what I mean, have a look at this code snippet which shows some potential usage (via Java EL):
These three contexts present different problems, but essentially, there is obviously no way to turn an arbitrary string into a number of valid CSS color, and URLs face their own problems we'll get into a little bit later.
asNumber / asCssColor
These two functions validate the string passed is a valid number or CSS color. If the string does not pass that validation, then instead of throwing an exception we return a default value.
For asCssColor, our choice for a default is slightly hacky, since we chose to return the string invalid. I specifically chose this string, since it is an invalid CSS color, but one that we know is safe, and seems somewhat descriptive in the HTML output. The reason for choosing an invalid CSS color as the default string is that the CSS parser will simply ignore this single directive, and continue parsing the remaining CSS as if it had never encountered the invalid directive.
The main contender for an alternative was inherit, which is the default value when specifying the text color in CSS. However the default value for background-color directive is transparent instead of inherit, so choosing one or the other would lead to slightly (more) broken pages.
Since these values are potentially contentious, we have also provided a version of these functions where you can specify your own default value (check out the Filter and FilterEL classes).
<script> alert(0777); </script>
This makes some sense for integer constants that a programmer has written, since they may want the octal notation. However since this is unexpected behavior for most users and not consistent across HTML/CSS, I decided to strip leading 0s from octal numbers. I think this is not only correct, but an improvement over just doing validation.
If you have examples of where these decisions would break applications or make these filters unusable, I would love to heard about it, so we can make improvements for a later release.
asURL / asFlexibleURL
<a href=" jaVa s cript: alert(1)">encoding</a>
<script> document.location = "jaVascr ipt\ \x3aalert(1)"; </script>
Note that in both those cases the whitespaces are tabs, not spaces, otherwise those examples will not work.
Anyway, the point is that trying to create a blacklist that will catch all these variants, across different contexts, without false positives, is really tricky.
We can say that any URLs starting with the following strings are safe:
- / (a path that is anchored at the domain root or is scheme-relative, i.e. //google.com/)
- \ (a path to a UNC share)
The problem with specifying such a restrictive whitelist is that you need to ask yourself what to do with any rejected URLs.
One approach that we have seen in other projects, that we have taken some inspiration from, is assuming that if it is not a string with a clearly defined and well known protocol, you can rewrite it with the current web directory as a base path, e.g. on this page java/test.html
During some brain storming about how to gain access to the current directory to do this rewriting, I realized that we do not actually need to know what the current URL is, since we can let the browser work it out by simply using
./ as our prefix instead of the real URL. The browser will interpret this as a URL relative to the current page directory, which is exactly what the browser would have been doing before.
So, that is the crux of our implementation of asURL, if we can determine that this is a safe URL that is not relative to the current directory, we let it go through as is, otherwise we prepend
./ effectively forcing it to be.
However, while we think this will be ok for almost all web applications, we feel like this might be an issue for mobile, or other application scenarios where custom URL protocols are relatively common. When you actually want users to provide links to them, you can use the horribly named asFlexibleURL.
asFlexibleURL utilises the same approach of prepending ./ to make URLs safe, but it uses a blacklist on the scheme names to decide whether it should do so. Now, as we showed above, a blacklist is a very risky thing to try and construct for every possible context, so we have taken a fairly careful approach:
First, we have some special case handling for URLs that start with / or \ since they do not have a scheme, and should not be forced to be directory-relative
Next, we find the first non-scheme character (i.e. not
In all other cases, we prepend the URL with ./
The assumption this relies on is that you cannot do any encoding or parsing tricks with only the scheme characters, so if they are immediately followed by a colon, then we are parsing the scheme the same way a browser would. So far this seems quite safe to us, but we would appreciate feedback.
Is this even the right approach?
I'm pretty confident in this approach for asURL, however there is definitely an argument to be made that number and CSS validation should be done in the general business logic validation. However, I believe in trying to fix security bugs as close the 'sink' as possible, where the sink for XSS is the document creation, since this makes being certain that you do not have any vulnerabilities much easier, and adding these functions in addition to business logic validation would work juts as well.
There is also the question of whether default values are the right thing, another option is to raise an exception, however this seems like an ugly solution, since small amounts of malformed input would simply break pages in the application. There is an argument to be made that these exceptions give the developer a way to notice when malformed input is being supplied and not validated, but this could just as easily be achieved by logging validation failures.
Upgrade to version 1.1.1 not 1.1
We pushed out a version of CSL on Friday night that contained a vulnerability in
asFlexibleURL; if an attacker were to specify a blacklisted URL that wasn't completely in lower case, the function would let it through, e.g.
I have changed the implementation so that the scheme validation function only ever sees lower case URLs and does not have to worry about case issues in the future.
I have also modified
asURL to allow mixed case URLs such as
hTtP://www.coverity.com/ so that users hopefully encounter less unexpected behavior.