Oh No! Security Metrics!

A colleague sent me a link to a blog post from a couple of days ago: Pete Lindstrom of Burton Group published a post titled “Microsoft’s SDL has Saved the World!!”, raising concerns about Microsoft using vulnerability counts to measure the security improvement resulting from the SDL.

I’ve raised this topic before, in my blog post “The First Step on the Road to More Secure Software is Admitting You Have a Problem.” Here are two pertinent quotes from that February 21st post:

“Let’s face it, no-one can agree on any measurement of security without getting knotted up.”

“Measuring security is a real challenge, and while we may debate the merits of vulnerability counts, right now it’s the only concrete metric we have.”

These comments matter because there appears to be no more widely accepted security metric today, and while no perfect metric exists, it’s useful to have some objective data when discussing this complex subject. Our customers constantly ask us to reduce the number of patches they need to apply to their products once deployed; deploying security updates costs them time and money. The primary metric that matters to customers is the number of security updates they need to apply, and the only way to reduce that number is to systematically reduce the number and severity of vulnerabilities in the code in the first place. That’s the goal of the SDL.

In my mind, there are two kinds of vulnerability metrics, and I alluded to both in my prior blog post. The first compares Microsoft to Microsoft; in other words, we compare Windows XP to Windows Vista, SQL Server 2000 to SQL Server 2005, and so on. We use this metric while a product is being built: we track incoming security bugs against the prior version of the product to see how we’re faring with the current version in development. Fewer vulnerabilities in the product under development is a good sign that the product will fare better in the real world.
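To make the Microsoft-to-Microsoft comparison more concrete, here is a minimal sketch under one possible reading of that process: each security bug reported against the shipped version is triaged to see whether the same flaw also exists in the version under development, and the fraction that carries over is tracked. The `IncomingBug` type, the `carryover_rate` function, and the bug IDs are hypothetical, and the triage results are invented for illustration; they are not Microsoft data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomingBug:
    """A security bug reported against the shipped (prior) version (hypothetical type)."""
    bug_id: str
    affects_current: bool  # does the same flaw reproduce in the in-development version?


def carryover_rate(bugs: list[IncomingBug]) -> float:
    """Fraction of prior-version security bugs that also affect the current version.

    Lower is better: it suggests the new version avoided flaws the old one shipped with.
    """
    if not bugs:
        raise ValueError("no incoming bugs to compare")
    affected = sum(1 for b in bugs if b.affects_current)
    return affected / len(bugs)


# Hypothetical triage results, for illustration only (not real data).
incoming = [
    IncomingBug("BUG-1", affects_current=True),
    IncomingBug("BUG-2", affects_current=False),
    IncomingBug("BUG-3", affects_current=False),
    IncomingBug("BUG-4", affects_current=False),
]
print(f"carry-over rate: {carryover_rate(incoming):.0%}")  # carry-over rate: 25%
```

A falling carry-over rate across versions would be consistent with the "fewer vulnerabilities in the product under development" signal, though, like any vulnerability count, it says nothing about bugs nobody has found yet.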

The second vulnerability metric compares Microsoft with other vendors. This is an interesting metric, but our group is full of engineers, so we pay little attention to these figures because there is very little in them we can influence. The post Mr. Lindstrom refers to cited these vulnerability figures to point out that other development organizations need to admit they have a secure development problem. Looking back at the figures we cited, it’s pretty clear that the sheer volume of security vulnerabilities supports our assertion, regardless of the subtleties of security metrics. More about this below.

Mr. Lindstrom states:

“Microsoft has systematically hired and/or contracted with every one of their most vocal critics (and most seasoned bugfinders) to do the work behind the scenes and they don’t count those vulns!”  

But in making this assertion, he’s saying that the vulnerabilities we remove as part of the SDL process (or never add to the code in the first place) should be counted as though they were part of the product after we shipped it. We don’t count vulnerabilities that don’t affect customers, regardless of the vendor.

We hire some security researchers to be part of our teams executing the SDL because they’re among the best and brightest at performing component design reviews, code reviews, black box testing and other security procedures needed to make our products more secure. Everyone in the industry covets their expertise because it’s in short supply, so we’ve competed to bring in the most capable people as employees, contractors and advisors. These experts have helped Microsoft execute the SDL and eliminate vulnerabilities before our products ship, which naturally means lower vulnerability counts and improved security for our customers. In addition, bringing in researchers helps us better understand what the community is thinking about today so we can anticipate and head off the problems of tomorrow.

To put it bluntly, we hire security researchers to help protect customers. Period.

While we’ve made an effort to hire the right researchers to help us improve the security of our products, it’s far from the case that we’ve hired “every one of their [our] most vocal critics.” There are still plenty of security researchers who are looking at our products and reporting the vulnerabilities they find after the products ship.

We have found that the training and principles of the SDL have indeed significantly improved the products Microsoft engineers create. You improve security by expending effort on improving security. We see the evidence of this in the smaller number of security updates released for that code. When applied correctly, the SDL development principles prevent vulnerabilities from entering the final code in the first place. This last point is very, very important: you can’t count a bug that was never created, and the goal of the SDL is to not create the bugs in the first place.

Some of the many SDL principles that reduce or mitigate security bugs include:

  • Mandatory education (Net effect: fewer security bugs up front)
  • Design decisions based on threat models (Net effect: fewer security design bugs up front)
  • Cannot use known insecure APIs (Net effect: fewer security bugs up front)
  • Use of static analysis tools (Net effect: fewer security bugs enter the central source code repository)
  • Cannot use known weak crypto primitives and key lengths (Net effect: fewer security bugs up front)
  • Compiler and linker requirements (Net effect: extra defenses, in case you miss a bug)
  • Fuzz testing (Net effect: implementation bugs found before shipping)
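The "cannot use known insecure APIs" rule is the easiest of these principles to picture mechanically. The sketch below is a toy illustration, assuming a build step that flags C source lines calling functions on a deny-list; the list contents, the name `find_banned_calls`, and the regex approach are assumptions for illustration, not the SDL's actual tooling. A regex scan does not understand comments, string literals, or macros, so a real check would rely on compiler support or a proper static analysis tool.

```python
import re

# Hypothetical, deliberately tiny deny-list for illustration only; a real
# banned-API list would be far longer and maintained centrally.
BANNED_APIS = ("strcpy", "strcat", "sprintf", "gets")

# Match a banned name used as a call, e.g. "strcpy(" but not "my_strcpy(",
# "strcpy_s(" or "fgets(". Comments and string literals are NOT skipped.
_BANNED_CALL = re.compile(r"\b(" + "|".join(map(re.escape, BANNED_APIS)) + r")\s*\(")


def find_banned_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, api_name) for each banned call in C source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _BANNED_CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = """char buf[16];
strcpy(buf, input);
strcpy_s(buf, sizeof buf, input);
if (fgets(line, sizeof line, stdin)) { }
"""

for lineno, api in find_banned_calls(sample):
    print(f"line {lineno}: banned API {api}()")
```

Only line 2 is reported here: `strcpy_s` and `fgets` are not on the hypothetical list, and the word-boundary anchor keeps them from matching as substrings.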

So, to answer Mr. Lindstrom’s question:

“Could it really be that SDL has done nothing to help MS developers write better code?”

Without a doubt, the SDL has helped Microsoft developers write better and more secure code.

However, we are still faced with the question of whether vulnerability-based metrics are a valid way to measure the progress of the SDL. In my opinion, vulnerability counts are a useful but imperfect metric. We’d welcome Mr. Lindstrom, or anyone else in the security community, sharing with us the metrics they would use to measure security-related success and how to calculate them. While the notion of what constitutes a “real, objective metric” is often a matter of individual preference, I think both the SDL and the industry as a whole would benefit from this discussion.

Interestingly, Mr. Lindstrom has at times pointed to vulnerability counts as an interesting (but not perfect) metric.

One final comment: if the Microsoft product security vulnerability trend were heading in the other direction, up rather than down, would industry observers claim the SDL is failing? I think so.

The SDL works. It’s not perfect, and we’ve never said it is, but it’s making our customers happier because they have fewer security updates to apply. Not zero, but fewer. And we are always looking for ways to improve how we measure the progress of the SDL.

As we’ve been saying all along, industry dialogue is key – so let us know what you think.

About the Author
Michael Howard

Principal Security Program Manager

Michael Howard is a principal security program manager on the Trustworthy Computing (TwC) Security team at Microsoft, where he is responsible for managing secure design, programming, and testing techniques across the company. Michael is an architect of the Security Development Lifecycle (SDL).

Join the conversation

  1. asteingruebl


    To an outsider, however, it’s not clear where the defect reduction is coming from. What we’d love to have is some insight into whether you’re measuring defects at multiple stages of the development lifecycle and seeing reductions throughout.

    Part of this goes to the question of efficiency. You could, for example, simply test the hell out of things, hire better testers, etc. If you didn’t do developer education, didn’t do threat modeling, and still got a massive reduction in the end vulnerability count, you’d still have an effective process and it would show up in fewer patches.

    If you don’t have any metrics about the effectiveness of training, threat modeling, etc., and you can’t track where defects are being created and where they are discovered or prevented, then it’s hard to know which parts of the process are working and which aren’t.

    This isn’t to say that MS needs to share all of its internal metrics, tracking, etc. But it does suggest that vuln counts don’t show that the SDL as a whole is working, only perhaps that some part of it is.

    I think you’re positioned to pull together some very interesting metrics because of your diversity. Things like defect counts of a given type per programming language and/or dev environment. Details on the percentage of design vs. implementation defects, and when they are being discovered.

    Do you have anyone sitting around doing nothing who wants to just work on publishing metrics and such for the rest of us to consume? 🙂

  2. Patrick_Boyd

    I really wouldn’t worry about Mr. Lindstrom’s criticism very much.

    Are publicly disclosed vulnerability counts a perfect metric of security? Of course not.

    But is it the best metric that we currently have access to? I think so, and obviously Microsoft does too.

    Until Mr. Lindstrom or someone else can suggest a better metric, I would stay the course and keep doing your best to secure the OS and tools that most of us run.

  3. bryansowen

    Perhaps a review of the relative attack surface quotient would offer insight into the effectiveness of the early SDL stages.

    Specifically, add Windows Server 2008 and Server Core to the charts from the 2003 paper “Measuring Relative Attack Surfaces” by Howard, Pincus, and Wing.

  4. sdl


    We do measure attack surface; it’s a critical part of a product’s security and one facet of the SDL that has nothing to do with code security. In general, we’ve driven the attack surface down substantially from Windows 2000 and Windows XP. The good news is you can decide on your metrics and measure it for yourself 🙂
