How does software security automation add value? Is your security tester simply a security checker handing over results from the software security robots?
Robots are the future, and they inspire both fear and awe in humans. This tension is as apparent in software as it is in any other field.
When I was at the EuroSTAR 2015 testing conference recently, the topic of test automation came up a lot. Testers are constantly under pressure to perform more “automated testing.” Go to any event in the testing community, and you will find talks on better automation, on automating in new domains, on figuring out what should be automated, and polemics against relying too heavily on automation. I want to talk about lessons the security community can learn from how the testing community has approached automation.
The security field has long had this same ‘send in the robots’ attitude, with numerous tools and services offering faster, more frequent, more comprehensive scanning for security issues. These tools differentiate themselves on a few factors: what they scan (e.g., code, binaries, networks, logs), how often they scan (continuously, with every build, nightly, on demand), and how much human interaction they require. Some are completely push-button; others are semi-automated, with humans involved at key points in the process.
Michael Bolton, James Bach, and other luminaries in software testing have long argued that there is a distinction to be made: “checking” versus “testing.” As we think about security, we need to remember this distinction. As they say in their definitive blog on the concept, “anyone who seeks excellence in his craft must struggle with the appropriate role of tools.”
If a robot can perform the experiment, evaluate the result, and conclude with any reasonable confidence, then it is not a “test.” It is a “check.” The robot is checking for the presence of something known and recognizable. Bach and Bolton give us this definition:
Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
The security community has a long history of algorithmic recognition of bad stuff. At the operating system and network levels, antivirus, antimalware, intrusion detection, and data loss prevention (DLP) all have algorithmic decision rules. Once we get to the application layer, we are looking for SQL injection in code, cross-site scripting in the app, client-side authentication, and insecure platform configurations. There is much to check for, and we are increasing the number, variety, and sophistication of the checkers that we use in security.
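To make “checking” concrete, here is a minimal sketch in Python of an algorithmic decision rule: a toy checker that flags string literals containing SQL verbs concatenated with other data. The pattern and the reporting are my own illustration, nowhere near what a commercial scanner does, but the shape is the same: a fixed rule applied to specific observations, with no judgment involved.

```python
import re
import sys

# Naive decision rule: a string literal containing a SQL verb that is then
# concatenated (+) or %-formatted with other data -- a classic injection smell.
SQLI_PATTERN = re.compile(
    r"""["'][^"']*\b(SELECT|INSERT|UPDATE|DELETE)\b[^"']*["']\s*[+%]""",
    re.IGNORECASE,
)

def check_file(path):
    """Apply the rule line by line; every match becomes a finding."""
    findings = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if SQLI_PATTERN.search(line):
                findings.append((lineno, line.strip()))
    return findings

if __name__ == "__main__":
    for path in sys.argv[1:]:
        for lineno, snippet in check_file(path):
            print(f"{path}:{lineno}: possible SQL injection: {snippet}")
```

Run it over a source tree and it will dutifully report every match. Deciding which matches actually matter is precisely where the checking ends.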
Security “testing” is much more than security “checking.” So what, then, is “testing”? Again, Bach and Bolton have a good definition:
Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.
For security, this same conclusion holds true. We assume that every security tester has all the robotic checkers: Qualys, Burp Suite, Checkmarx, sqlmap, or whatever tool du jour suits the job at hand. The tools themselves do not distinguish one security tester from another.
Interpretation separates poor security testers from excellent security testers.
Checking tools might turn in lots of valid results, but most of the time they do a rubbish job of describing practical impact. Tools are necessarily generic, and they can’t know the business context of the software they are checking. The same finding can be critical in one application and negligible in another, depending on that context. Humans interpret. They add this value to a tool check, and I don’t see that changing any time soon.
Just as with impact, tools are necessarily generic in their likelihood statements. How readily available are the tools for exploiting this finding? How likely is it that the exploit works? What level of access is needed to do the exploiting? When we write generic likelihood statements, they usually sound tone-deaf in the context of a specific application. This application has only six authorized users. That application is on a closed network. This server actually mounts the file system read-only, so data can’t be persisted that way. As with impact, humans must interpret the likelihood of risks in the context of the actual software at hand.
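To illustrate the interpretation step, here is a sketch of the adjustment a human tester makes that a generic tool cannot. All of the names, weightings, and numbers are hypothetical; the point is only that deployment context reshapes both components of a finding.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    tool_impact: float      # generic severity from the scanner, 0-10
    tool_likelihood: float  # generic exploitability from the scanner, 0-1

@dataclass
class Context:
    """Facts only a human who knows the deployment can supply."""
    internet_facing: bool
    authorized_users: int
    filesystem_read_only: bool

def contextual_risk(finding: Finding, ctx: Context) -> float:
    """Scale the tool's generic numbers by deployment context.

    The multipliers are illustrative, not a standard; they encode the
    judgment calls a scanner has no way to make.
    """
    likelihood = finding.tool_likelihood
    if not ctx.internet_facing:
        likelihood *= 0.2   # closed network: far fewer potential attackers
    if ctx.authorized_users <= 10:
        likelihood *= 0.5   # tiny, known user population

    impact = finding.tool_impact
    if ctx.filesystem_read_only:
        impact *= 0.5       # persisting data via the file system is off the table

    return impact * likelihood  # risk = impact x likelihood

# The same finding in two deployments the tool cannot tell apart:
sqli = Finding("SQL injection in /search", tool_impact=9.0, tool_likelihood=0.8)
public = Context(internet_facing=True, authorized_users=10_000, filesystem_read_only=False)
internal = Context(internet_facing=False, authorized_users=6, filesystem_read_only=True)

print(contextual_risk(sqli, public))    # 7.2  -- drop everything
print(contextual_risk(sqli, internal))  # 0.36 -- same tool output, far lower risk
```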
Note that impact and likelihood are the two components of risk. The quality of a risk assessment comes directly from the quality and relevance of its impact and likelihood estimates. Is your security tester adding this value, or are they just a security checker handing over results from the robots?