Posted by John Steven on May 3, 2011
We’ve probably all experienced organizations that rely principally on a single assessment technique (whether it be static analysis or dynamic analysis, manual or tool-based). Unfortunately, this is all too common for security practices. When this topic came up recently with the question (paraphrased), “Are there numbers that demonstrate the value of a security program making use of static, dynamic, and manual assessment techniques?”, I thought some of our experience might apply.
Synopsys favors delivery of holistic assessments (those assessments that consider architecture/design as well as code, leveraging static & dynamic tools, and using both tool-assisted and manual techniques). During my tenure as Director of Security, we found the following distribution of findings:
Those numbers may surprise and demand some explanation. For instance, though Microsoft at one point reported that 50% of their findings were “flaws”, we classified 60% of our issues (on average, and as much as 70%) as ‘flaws’. If anything, our classification of ‘flaw’ was less inclusive, and when I looked into this, I concluded that our larger percentage could be due to technique, the methodology’s focus on architecture (LoE), or the customer’s level of familiarity/maturity with the tech-stack they developed on.
Likewise, our process started with a “whitebox” triage and then shifted to exploratory (aka dynamic) testing, rather than the other way around (which I find common elsewhere). That order of operations probably moved the balance from dynamic to static. In practice, we “found” vulnerabilities statically and then supported those findings with dynamic tests to validate their exploitability and impact (though perhaps dissatisfying, static gets the point in our data set here).
Our experience made it pretty evident to us: skip assessment techniques and expect to miss some critical vulnerabilities.
As I’ve mentioned in previous blog entries and mailing list posts, security managers need the capability to assess and report on their ability to find particular classes of security vulnerability as well. WASC’s Web Application Security Statistics provide a starting point for both classes of attacks you might expect to find and their probability. There’s no shortage of vulnerability taxonomy work out there, so we’ll leave this topic at this superficial level.
Whether you know what percentage of your reported findings come from each assessment technique, or you’re just starting to build your assessment practice, optimization (measurement, analysis, iterative improvement) should be a goal. To do this we combine our knowledge of the techniques we’re employing and the enumeration of vulnerabilities we want to find and report.
Confined to the context of static analysis, I described the following loop in a previous blog post to do just that:
The purpose of this trial experimentation was to provide some basic comparison between assessment techniques (though you could easily come up with a more thorough approach). Of course, you’d want to consider findings from the perspective of all techniques, not just static tools.
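To make "consider findings from the perspective of all techniques" concrete, here is a minimal Python sketch of a coverage check: given the vulnerability classes you want to find and the classes each technique has actually turned up, it lists what remains uncovered. The class names, the `FOUND_BY` mapping, and the helper names are all hypothetical placeholders for illustration, not data or tooling from our practice.

```python
# Hypothetical target list: the vulnerability classes a practice wants to report on.
TARGET_CLASSES = {
    "SQL injection",
    "XSS",
    "CSRF",
    "Broken access control",
    "Insecure design",
}

# Hypothetical results: classes each technique has produced findings for.
FOUND_BY = {
    "static": {"SQL injection", "XSS"},
    "dynamic": {"XSS", "CSRF"},
    "manual": {"XSS", "Broken access control"},
}


def coverage(found_by, targets, techniques):
    """Return the target classes found by at least one of the given techniques."""
    covered = set()
    for technique in techniques:
        covered |= found_by.get(technique, set())
    return covered & targets


def gaps(found_by, targets, techniques):
    """Return the target classes that none of the given techniques has found."""
    return targets - coverage(found_by, targets, techniques)


if __name__ == "__main__":
    for techniques in (["static"], ["static", "dynamic"], ["static", "dynamic", "manual"]):
        missing = sorted(gaps(FOUND_BY, TARGET_CLASSES, techniques))
        print(techniques, "-> still missing:", missing)
```

With real assessment data in place of the placeholders, a list of gaps that stays non-empty is the signal to revisit which techniques (or which rules and test cases within a technique) need attention in the next iteration.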
Internally, we had our assessment lab managers write up a qualitative comparison of assessment techniques we employ, comparing how well each is “suited” to finding vulnerabilities against common vulnerability taxonomies (*2) and came up with the following:
At this point, the reader may be asking why my experience differs from our lab managers’. Specifically, for instance, “why are dynamic techniques over-estimated here as compared to historical finding rates?” Two reasons (at least) come to mind:
Whether you’re considering a vendor’s assessment service, an assessment standard, or your own internal assessment practice, it’s key to be able to detect biases. When others supply you with assessment data, you’ll likely uncover biases (though they may not be apparent to those reporting initially). Be careful: for this reason, I doubt you’ll be able to compile, combine, or relate numbers you receive from varying sources.
If you’re compiling assessment data for the purpose of motivating practice improvement, you may want to either 1) extract universal similarities from the data you get and present those lessons learned to management, or 2) just list the others’ experience separately. This can give those to whom you’re presenting a compelling view of your perspective on differing assessment approaches.
In fact, many factors will muck up assessment comparison. For instance, when people report a percentage (like I did above), did they count finding instances, findings, vulnerability types, or something else (*3)? What were their methodology and level-of-effort, and how did they conduct the assessment? For instance, risk-based assessments produce a very different distribution (with wider variation) than time-boxed techniques, which more frequently drive to an explicit (or worse, implicit) checklist.
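The counting question is easy to underestimate, so here is a small Python sketch showing how the same set of findings yields different "percentage by technique" figures depending on the unit counted. The records below are invented purely for illustration (they are not our assessment data), and `share_by` is a hypothetical helper, not part of any tool.

```python
from collections import Counter

# Invented records: (technique, vulnerability type, number of instances in that finding).
FINDINGS = [
    ("static", "SQL injection", 12),
    ("static", "XSS", 30),
    ("static", "XSS", 5),
    ("dynamic", "XSS", 3),
    ("manual", "Broken access control", 1),
    ("manual", "Insecure design", 1),
]


def share_by(records, unit):
    """Percentage of the total attributed to each technique, for a given counting unit.

    unit = "instance": every instance counts once.
    unit = "finding":  every reported finding counts once, however many instances it has.
    unit = "type":     every distinct (technique, vulnerability type) pair counts once.
    """
    counts = Counter()
    seen_types = set()
    for technique, vuln_type, instances in records:
        if unit == "instance":
            counts[technique] += instances
        elif unit == "finding":
            counts[technique] += 1
        elif unit == "type":
            if (technique, vuln_type) not in seen_types:
                seen_types.add((technique, vuln_type))
                counts[technique] += 1
        else:
            raise ValueError(f"unknown counting unit: {unit}")
    total = sum(counts.values())
    return {technique: round(100 * n / total, 1) for technique, n in counts.items()}


if __name__ == "__main__":
    for unit in ("instance", "finding", "type"):
        print(unit, share_by(FINDINGS, unit))
```

In this made-up data set, static analysis accounts for roughly 90% of instances, 50% of findings, and 40% of types. None of those numbers is wrong, which is exactly why a percentage is hard to compare across sources unless the counting unit is stated.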
One thing seems universally true, though: the more assessment techniques a software security practice makes use of, the more types of vulnerabilities it can expect to report.
(*1) – Clarification: our use of dynamic tools is human-assisted (both in terms of crawl and in follow-up with manual effort or leveraging internally developed tools).
(*2) – WASC, OWASP Top 10, CWE Top 25
(*3) – My data consists of reported vulnerabilities. Answering questions about what risk-level I reported requires some abstraction, as clients differ in their requirements there. Likewise, behavior regarding instances vs. buckets of ‘related vulns’ requires additional explanation.