This is part one of a two-part series of posts by Bryan Sullivan and me. I will cover the static analysis tools we use at Microsoft (and make available publicly) for analyzing unmanaged (i.e., native) C and C++ code, and Bryan will cover managed code static analysis in a later post.
I’m a huge fan of static analysis tools; actually, I’m a fan of any tooling that beneficially automates any portion of the software development process. Software development is a complex business, and anything you can do to make the process more repeatable and predictable, and to reduce ‘friction’, is a big win in my book.
There are many benefits to using static analysis tools. The most important reasons include:
- Static analysis tools can scale: they can review a great deal of code very quickly; this is something humans cannot do very well.
- Static analysis tools don’t get tired. A static analysis tool that has been running for four straight hours at 2:00 in the morning is just as effective as one running during business hours. You can’t say the same thing about human reviewers!
- Static analysis tools help developers learn about security vulnerabilities. Over the years, I’ve met a small number of developers who had bugs flagged in their code by static analysis tools, and they never knew what the bugs were until the tool posted a sign saying, “Security bug, right here!”
Before I dive into static analysis tools in detail, it’s worthwhile explaining what ‘static analysis’ is. Static analysis is a method of analyzing program code without actually running the code. Generally, the tool will build an internal model of the code and analyze potential program flow through the code making assumptions about the data. For example, the following code may or may not be a real vulnerability:
foo[i] = 0;
because it depends on the value of ‘i’. Suppose ‘foo’ has four elements: if ‘i’ is in the range 0..3 and can only ever be in that range, then there is no security vulnerability, so the static analysis tool has to determine whether that condition always holds. Clearly, it’s simple to determine that the following code is safe, because the index is constrained right next to the code that writes to the array:
if (i>=0 && i<=3) foo[i] = 0;
But things get more complex if the index is validated in remote parts of the code. It is this level of analysis that determines if a tool is noisy: a tool that flags too many issues (false positives) because it missed a validity check will rapidly annoy a developer.
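To make the ‘remote validation’ case concrete, here is a minimal sketch in C. The names (foo, FOO_COUNT, clear_slot, handle_request) are hypothetical, and the array size of four is an assumption taken from the 0..3 range used above. The write itself is unchecked; the check lives in the caller, so a tool that looks at clear_slot in isolation cannot see it.

```c
#include <stddef.h>

#define FOO_COUNT 4            /* assumed size: valid indexes are 0..3 */

static int foo[FOO_COUNT] = { 1, 1, 1, 1 };

/* No local check: looking at this function alone, a tool cannot
   tell whether i is always in range. */
static void clear_slot(int i)
{
    foo[i] = 0;
}

/* The range check lives here, away from the write. Returns 1 if the
   slot was cleared, 0 if the index was rejected. */
int handle_request(int i)
{
    if (i < 0 || i >= FOO_COUNT)
        return 0;
    clear_slot(i);
    return 1;
}
```

A tool that follows the call from handle_request into clear_slot can conclude the write is safe; one that cannot follow it has to either stay silent or flag the write and risk a false positive.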
I want to point out that static analysis is not grep; static analysis tends to be more robust. That does not mean grep is not useful. For example, if you have a set of banned functionality, such as MD4 and MD5 (which the SDL bans, along with other crypto algorithms), then grep’ing for MD4 and MD5 is totally valid, is probably low noise, and requires next to zero engineering effort.
I also want to point out that the SDL focuses on using static analysis tools to find security vulnerabilities. Under the SDL umbrella, we would not require development teams to use static analysis tools that didn’t find security issues. A tool that does not find security bugs is not a useless tool; we just would not make it an SDL requirement.
It’s important to point out that static analysis tools work in tandem with human code reviewing experts. Tools tend to find a lot of bugs quickly, but expert code reviewers are better at finding a smaller number of hard-to-find security bugs. I wrote an article for IEEE Security & Privacy a few years back describing the methods I use to review code for security bugs.
Static analysis tools have been used for many years at Microsoft. We started in earnest with a tool named PREfix when we acquired Intrinsa. PREfix is aimed at finding general code quality bugs in C and C++ and has proven very effective over the years. The main downside to PREfix is that it is big, and it is generally run centrally rather than on each developer’s desktop. So PREfix begat PREfast, a smaller desktop version of PREfix. PREfast has the advantage of being relatively quick to run (it only doubles compile times!), but it suffers from being only intra-procedural: it analyzes one function at a time, so its view of your code is very small. PREfix is inter-procedural and can evaluate conditions in far-flung regions of your code. If you need to know why that’s important, refer to the example code above!
PREfix and PREfast both support the Standard Annotation Language (SAL) which I have addressed a couple of times in the past. SAL allows you to describe function contract semantics to help tools like PREfix and PREfast find more security bugs. SAL is used throughout Visual C++.
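As a rough illustration of what a SAL-annotated contract looks like, the sketch below annotates a hypothetical fill_buffer function. It uses the _Out_writes_ macro from sal.h as shipped with recent Visual C++ releases (older headers spelled the same idea __out_ecount). The fallback #define is my own addition so the code still compiles with non-Microsoft compilers, where the annotation is simply ignored.

```c
#include <stddef.h>

#ifdef _MSC_VER
#include <sal.h>               /* real SAL macros, checked by /analyze */
#else
#define _Out_writes_(count)    /* assumption: no-op fallback elsewhere */
#endif

/* Contract: the caller promises buf points to at least `count` writable
   elements. With the annotation, the analyzer can check both this
   function body and every call site against that promise. */
void fill_buffer(_Out_writes_(count) char *buf, size_t count, char value)
{
    size_t i;
    for (i = 0; i < count; i++)
        buf[i] = value;
}
```

Passing an 8-byte array together with a count of 16 is the kind of mismatch this annotation is designed to expose at the call site, without the tool having to look inside fill_buffer.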
PREfast is available in Visual C++ today as the /analyze option. It’s also freely available in the Windows Device Driver Kit (as prefast.exe) and the Software Development Kit (as /analyze).
What You Should Do
If you write native C or C++ code, you should:
- Compile at least once a day with /analyze
- Use SAL to annotate your function prototypes; this will help the static analysis functionality in the compiler find many more bugs.
The following warnings should be analyzed, as they are probably security issues:
Finally, for extra credit, look for the following warnings that are generated by the compiler and not by the static analysis tools:
- C4700: uninitialized local variable used
- C4701: potentially uninitialized local variable used
Both of these relate to uninitialized data. To enable these warnings, either compile with warning level 4 (/W4) or, if you’re not daring enough, use /W3 augmented with the following:
/W3 /WX /we4701 /we4700
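For readers who have not met these warnings, here is a hedged sketch of the pattern involved. The function parse_flag is hypothetical. The commented-out version leaves result unset on one path, which is the kind of code C4701 is meant to catch; the live version gives it a value up front.

```c
#include <string.h>

/* Problem pattern (shown as a comment so the block compiles cleanly):
 *
 *     int result;
 *     if (strcmp(text, "on") == 0)
 *         result = 1;
 *     return result;      // unset when text is not "on"
 */

/* Fixed version: result has a defined value on every path. */
int parse_flag(const char *text)
{
    int result = 0;
    if (strcmp(text, "on") == 0)
        result = 1;
    return result;
}
```

With /WX in effect, the problem pattern would stop the build rather than scroll past as one more warning, which is the point of promoting these two warnings to errors.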
The SDL Optimization Model and Static Analysis
If you’re following the SDL Optimization Model, use of static analysis tools is a requirement for the ‘Advanced’ maturity level.
In summary, the SDL mandates static analysis tools for C and C++ code. If you are currently not using static analysis tools in your development environment, you should. If you’ve never run static analysis tools then the chances are good you’ll find some ‘interesting’ bugs!