A quick post on Apple Security 55471, aka goto fail

Posted by Jon, Comments

goto... fail?

If you haven't heard about the ironically named "goto fail" vulnerability, please read Adam Langley's well written article. A summary of the issue is as follows:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
...
    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
...
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

See the two goto fail; statements. The second ensures subsequent statements within the same block will not be executed, making that code unreachable. The problem, as Adam points out, is that the unreachable statement below actually performs a security-critical function. By not having it called, an invalid certificate can be generated with an incorrect signature when certain types of cipher suites are used. Again, read Adam's detailed blog post on the issue.

Would Static Analysis Find This Issue?

Yes.

A successful analysis via cov-analyze, part of Coverity Quality Advisor, on the reduced code shows the UNREACHABLE checker firing. We mocked certain portions of the code so others could quickly compile and analyze the code snippet. However, the SSLVerifySignedServerKeyExchange function is structurally equivalent to the original code.

Pic or it didn't happen:



Since the code snippet is open source, it qualifies for Coverity Scan. Click here to view the project. Or if you're an existing Coverity customer, skip to the bottom of the blog for information on how to reproduce the above.

What Is the Defect?

The UNREACHABLE checker reports many instances of code blocks that cannot be executed because there is no possible control flow to reach it. These defects often occur because of missing braces, which results in unreachable code after break, continue, goto or return statements. In the affected code, the second subsequent goto fail; statement causes the control flow to always bypass the line below it, jumping to the fail: block.

Looking at the Linux Kernel, an open source project in Scan, the UNREACHABLE checker fired 112 times since the Linux kernel has been in Scan. In 2013, 33 of those defects were fixed in the kernel. And since it's been in Scan, 74 of these defects have been resolved. This is about 9 million lines of code. The UNREACHABLE checker is a good example of a high-quality checker in that even on such a large project, it's firing about once ever 80,000 lines. Noisy? Not at all.

Even still, developers may have hundreds to thousands of defects to fix in a large code base, especially one that hasn't had a static analysis tool ran on it before. So which ones should be fixed first?

How Can Developers Prioritize Such Defects?

Any defects found via static analysis in security component code should be treated as a guilty until proven innocent. These defects should take priority in triaging and remedying. This begs the question how will developers know what's a security component? Here's a great spot where developers and security engineers can work together.

Security engineers and application architects should work together to identify security sensitive components. Here's a short list:

  • Authentication: passwords, password resets, login / logout, session management.
  • Authorization: function and data access controls, roles, entitlement.
  • User account management.
  • Cryptographic code, such as TLS. :)
  • Common security controls, such as escapers, validators, etc.

We've blogged about quality defects occurring in security components before. MySQL also had a defect found in an authentication component, which resulted in the ability to authenticate with an incorrect password (CVE-2012-2122).

Out of all the code in an application that should be "Coverity Clean", security components come to mind. What would an UNREACHABLE do to your code base?

Showing My Math

If you're a Coverity customer, you can capture the above command with cov-build via Coverity Quality Advisor 7.0.3 since it includes clang support. You can analyze the above on older versions of Coverity Quality Advisor, such as 6.6.x. Just point to your gcc instead of the clang binary. The rest of the command should be the same.

Grab the code:

git clone https://github.com/coverity-scan/AAPL-Security-55471 AAPL-Security-55471
cd AAPL-Security-55471

Build the code, using cov-build to capture the results.

rm *.o
/Applications/cov-analysis-macosx-7.0.3/bin/cov-build --dir ../55471-ir env LANG=en_US.US-ASCII /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -x c -arch x86_64 -std=gnu99 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -c sslKeyExchange.c -o sslKeyExchange.o
Coverity Build Capture version 7.0.3 on Darwin 11.4.2 x86_64
Internal version numbers: 95ee7d28a9 p-fresno-push-17316.506

1 C/C++ compilation units (100%) are ready for analysis
The cov-build utility completed successfully.

And then analyze the code via cov-analyze:

/Applications/cov-analysis-macosx-7.0.3/bin/cov-analyze --dir ../55471-ir
Coverity Static Analysis version 7.0.3 on Darwin 11.4.2 x86_64
Internal version numbers: 95ee7d28a9 p-fresno-push-17316.506

Looking for translation units
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Computing links for 1 translation unit
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Computing virtual overrides
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Computing callgraph
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Topologically sorting 1 function
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Computing node costs
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Starting analysis run
|0----------25-----------50----------75---------100|
****************************************************
Analysis summary report:
------------------------
Files analyzed                 : 1
Total LoC input to cov-analyze : 3842
Functions analyzed             : 1
Paths analyzed                 : 17
Time taken by analysis         : 00:00:01
Defect occurrences found       : 1 UNREACHABLE

Comments!