Fault Injection is a podcast from Synopsys that digs into software quality and security issues. This week, hosts Robert Vamosi, CISSP and Security Strategist at Synopsys, and Chris Clark, Principal Security Engineer at Synopsys, go into detail about the value of fuzz testing and the findings from a new report from Synopsys on the State of Fuzzing 2017.
You can always join the discussion by sending us an email at firstname.lastname@example.org.
Robert Vamosi: I’m Robert Vamosi, CISSP and Security Strategist here at Synopsys.
Chris Clark: I’m Chris Clark, Principal Security Engineer here at Synopsys.
Robert: Welcome to Fault Injection. We’re continuing our discussion about software supply chain, but we’re going to drill a little deeper into a type of testing that you might want to do with your software supply chain, and that is fuzz testing. What the heck does that mean? Fuzz?
Chris: [laughs] Fuzz, well, that’s a question we get quite often when we talk about fuzz testing. Really, fuzz testing is just a way of injecting malformed input or bad input into some type of software interface and seeing how it behaves.
Robert: You just said malformed. Fuzz testing can also be referred to as negative input testing or malformed input testing. There are a variety of terms, and perhaps you’ve heard them, but we’re going to standardize on the term ‘fuzz testing’ for the rest of the podcast.
Chris: Excellent. It’s the right term. [laughs]
Robert: You’ve had some experience with fuzz testing?
Chris: Many years.
Robert: Why don’t you explain why you would ever do something like that?
Chris: The premise of fuzz testing is, I want to make sure my application behaves the way it should. The way to validate that, no matter how many other forms of testing I’ve done, is to start throwing input at it.
We expect users to behave a certain way. We expect them to do certain things. At the end of the day, most users don’t do that. The same can be said about software. Software crashes all the time.
If I have a piece of software that has a specific function, and maybe it expects to get a zero, what happens if I send an A, or I send a three, or I send Unicode, or some other type of input other than what it’s expecting? How is it going to handle that? That’s really what we’re looking to achieve.
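As a rough sketch of that idea (the handler and inputs below are invented for illustration, not from any real product), a function that assumes it will always receive a zero can be probed with everything else:

```python
# Hypothetical handler: it assumes the input is the single byte "0".
# A fuzzer's job is to send everything else and watch what happens.
def handle(message: bytes) -> str:
    if message == b"0":
        return "ok"
    # A careless implementation might crash here instead of failing safely.
    raise ValueError(f"unexpected input: {message!r}")

# Probe with an A, a three, and some Unicode, as described above.
for probe in (b"0", b"A", b"3", "\u00e9".encode("utf-8")):
    try:
        print(probe, handle(probe))
    except ValueError as exc:
        print(probe, "fault:", exc)
```

In a real target, the interesting cases are the ones that crash or hang rather than raising a clean exception.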
Why do we want to do that? This is where it really gets interesting, because that’s what your attackers are going to do. If you talk to any security professional, one of the things that they’re going to do is fuzz testing, because they want to see how well you’ve written your code.
Robert: I know there’s a difference, but I’m going to bring it up because someone in the audience might ask the question. And that is, how is this different from an SQL injection?
Chris: [laughs] SQL is a very well-structured language. When I’m injecting data into SQL, I have to follow some very specific formats in order to get data into the system. In a way, I could manipulate fields within an SQL message and inject those.
If I’m trying to do a 0=0 injection, or some other form of injection, that could be considered a form of fuzz testing. But really, I’m manipulating the language to behave in a manner that I want it to, because there’s no filtering.
If I’m doing fuzz testing, though, I really don’t care. All the gloves come off. And it’s not tied to one specific protocol. When we talk about fuzz testing, there are many kinds of fuzz testing; we just happen to be talking about protocol fuzz testing right now.
If I was doing API testing, or if I was performing in-application fuzz testing, there are so many other structures and types of data that could be injected and manipulated that it’s a very different exercise than just performing a SQL injection.
Robert: Let’s step back a little bit. We’re talking about an application that is compiled and can be run, or can we do this earlier in the software development life cycle?
Chris: It depends on the type of fuzz testing we’re doing. Yes, there are types of fuzz testing, like American Fuzzy Lop (AFL), where I have my application that’s close to release. I want to test it from a fuzz testing perspective, so I’m going to recompile it with instrumentation that walks down each execution path of the application—that’s one form.
The other form is, I may have a network-based application that uses different protocols. Obviously, when we’re talking about most protocols, we’re going to talk about TCP/IP, or all the base protocols. But we could be talking about anything that’s IoT-specific, like MQTT. Or maybe even something industrial-control-specific, like Modbus, or some other type of industrial control protocol.
We’re going to test that closer to the release of that particular product because I need to have a functioning application. There’s a wide range of ways we can apply fuzz testing, depending on where we are in the software testing life cycle process.
Robert: You’ve suggested that there’s multiple ways that you can go about this. There are some generally accepted ways of doing fuzz testing, so what are some of the methods that are out there?
Chris: From a traditional fuzz testing perspective, there are some specific types. Most everybody is familiar with random fuzz testing. That’s where I have a protocol or some type of interface that’s accepting data.
I don’t really know the structure, but I’m going to start throwing data at it and see how it behaves. That’s really not the most effective approach; it does work, but it takes a long time. Since I don’t know what that protocol structure looks like, I have to default to that mode and just hope that I’m going to cause some kind of fault condition.
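A random fuzzer of this kind fits in a few lines. This is a toy sketch: the target parser and its fault condition are made up for illustration, not taken from any real interface.

```python
import random

def random_fuzz(target, iterations=1000, max_len=64, seed=1234):
    """Throw random byte strings at `target`, collecting inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, max_len)))
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, exc))
    return crashes

# Stand-in target: faults whenever the first byte has its high bit set.
def toy_parser(data: bytes):
    if data[0] & 0x80:
        raise RuntimeError("parser fault")

found = random_fuzz(toy_parser)
print(f"{len(found)} crashing inputs found")
```

Because the inputs are unstructured, most of them never get past the first parsing step of a real protocol, which is why this mode takes so long.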
Robert: …but you could miss something.
Chris: …but you can miss something, you can miss a lot of things, really. The next type of fuzz testing is what we would call template-based fuzz testing. That’s where I’ve captured a message, and this is very prevalent. I know the structure of the protocol that I’m communicating with, but I’ve captured some information.
Based on what I’ve captured, maybe a complete message, I can now see how that message is framed, and I can start playing it back with malformed input into the device to see how it reacts.
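A single mutation step of that template approach might look like this (the captured message and its framing are invented for the sketch):

```python
import random

# Hypothetical captured message: two header bytes, a length byte, a payload.
CAPTURED = b"\x01\x02\x05HELLO"

def mutate(template: bytes, rng: random.Random) -> bytes:
    """Return a variant of a captured message with one byte changed."""
    data = bytearray(template)
    data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)

rng = random.Random(7)
variants = [mutate(CAPTURED, rng) for _ in range(5)]
for v in variants:
    print(v.hex())
```

Every variant keeps the shape of the one captured message, which is exactly why this mode can miss whole classes of messages the protocol supports.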
Again, we could still miss some things because we’re only looking at one specific message. We don’t know what the variations are for this particular message, and we don’t know what the other types of messages are. That’s where the most powerful type of fuzz testing comes in, and that’s generational.
Generational, basically, has a state engine that understands everything about this particular protocol. It knows how to communicate, it knows how to build sessions and tear down sessions. If any type of key or session management is applied, it knows how to manage that completely through the process.
What does that really mean? What that means is, with random, I’m just throwing things and hoping. With template-based, I’m throwing things but following one very specific path, because this is the only message I have.
When I look at generational, I can look at everything and I can test each variation or each specific area of that particular protocol and be much more effective in my testing. I can reduce time, and ideally, find more vulnerabilities or more weaknesses in a shorter period.
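As a toy illustration of the generational idea (the type-length-value layout here is invented, not a real protocol model), the fuzzer generates messages that are well-formed except for the one field it is deliberately stressing:

```python
import struct

# Toy protocol model: one type byte, a 2-byte big-endian length, a payload.
# Boundary values for the length field, including under/overflow edges.
BOUNDARY_LENGTHS = [0, 1, 0x7FFF, 0x8000, 0xFFFF]

def generate_cases(msg_type: int, payload: bytes):
    """Yield messages whose declared length disagrees with the real payload."""
    for claimed in BOUNDARY_LENGTHS:
        yield struct.pack(">BH", msg_type, claimed) + payload

cases = list(generate_cases(0x01, b"ping"))
print(f"{len(cases)} generated test cases")
```

Because the rest of each message is valid, every case reaches the length-handling code instead of being rejected at the first framing check.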
Robert: You mentioned time a few times, and what’s important here is that you have it go through as many iterations as possible, but how many is enough?
Chris: That’s an age-old question. Fuzz testing has been around a while—everybody asks that. I think one thing we have to acknowledge when we talk about this right off the bat is, fuzz testing is an infinite space problem. If I have one input, I can change that input from now until the end of time and not repeat that input.
It’ll turn up a lot of issues over a lot of time, but how relevant is that? A lot of those cases are going to be close or very similar. Whereas, if I look at the protocol and I understand how it’s structured, I can test overflow and underflow boundaries, and I can test different types of input to see how it reacts.
I reduce that infinite space problem to a more manageable time frame. It can still take a long time, but we’re going to be much more effective with our time and our usage.
Robert: Earlier, I asked about why you would do it, and I want to return to that. Heartbleed is an example of a vulnerability that was found by Codenomicon, now Synopsys. There was an open source implementation out there, the heartbeat extension in the OpenSSL library, that was flawed. And it was out there for two years.
The old saying about open source, that many eyeballs looking at the code will find the bugs, did not find this one; it needed some sort of automation. We just happened to be testing OpenSSL, we noticed that it was crashing, and we went back and we looked at it. And that’s how we discovered the buffer over-read caused by this failure in implementation.
Chris: Minor point of clarification. We actually noticed we were getting data back, and we were getting data back in a volume that we specified, not what the protocol was supposed to specify. When we started seeing that, we realized, “Well, wait a minute. What is this data we’re getting back? What is this extra information?” That’s where we realized we had a data leakage issue.
Robert: By running these tests, we got back a volume of data, we were able to look at it, and we were seeing these anomalous results. And, by drilling down into those anomalous results, we noticed that it was echoing back what we had asked.
That became the basis for the vulnerability, which was fixed, and OpenSSL has addressed that.
Chris: It’s also important to point out, and you mentioned this earlier, that that code had been around a while. It had been looked at by other types of testing technologies that were out there, but because of how it was implemented, it just didn’t manifest itself. That’s what makes fuzz testing so powerful.
When we look at the test parameters for how a product should be tested or how software should be tested, those parameters are only as good as what we define. When we look at it from another aspect, a different type of testing, we can find where we have potential weaknesses or potential…maybe not weaknesses, but areas we didn’t cover as extensively as we should have.
Robert: It’s still a two-part process.
You have this automation, where it’s running these tests over and over. Then you start finding time to failure, start finding failures. But even then, you’ve got to go back and look at that manually and decide: is this a critical crash? Is there a reason why it fell apart at this point and crashed the program, or is it something that we can live with and move on?
There are some false positives in there, but pretty quickly, you find the real vulnerabilities.
Chris: This is some of the real meat of what we’re doing. You mentioned something—time to failure—and that is an important aspect to consider. When we look at a product that maybe has never been fuzz tested before, and we start to perform fuzz testing on it, that time to failure is how long it takes for us to find our first failure.
Robert: Time to first failure?
Chris: Right. Then, when we first start seeing that failure and we start looking at additional failures, what’s that density?
If we’re in a relatively short period of time and we see a high density of failures, we know that this code isn’t necessarily as well vetted as it should be, or the quality of the code isn’t really where it needs to be. That helps us address why we need to implement fuzz testing and why it can be so powerful.
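Both metrics are easy to record in a harness. This sketch is hypothetical; the stand-in target simply faults on empty input:

```python
import time

def measure_failures(target, inputs):
    """Run `target` over a stream of inputs, recording the time to first
    failure and the failure density (failures per input tried)."""
    start = time.monotonic()
    first_failure = None
    failures = 0
    count = 0
    for count, data in enumerate(inputs, start=1):
        try:
            target(data)
        except Exception:
            failures += 1
            if first_failure is None:
                first_failure = time.monotonic() - start
    return {
        "time_to_first_failure": first_failure,
        "failure_density": failures / count if count else 0.0,
    }

# Stand-in target: raises ZeroDivisionError on empty input.
stats = measure_failures(lambda d: 1 // len(d), [b"a", b"", b"bc", b"", b"d"])
print(stats)
```

A short time to first failure combined with a high density is the signature of code that hasn’t been well vetted, as described above.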
Robert: That is something that we at Synopsys have been looking at, and we are releasing our State of Fuzzing 2017 report. We were able to draw upon the benchmarking data that we have from our labs, using our tool, Defensics, and one of the things that we do call out is the time to first failure.
We were able to look at the individual protocols, and I don’t think it surprises you too much that some of the more mature protocols, such as GIF, an image format, are relatively…what would you say, robust?
Chris: Robust, yeah, very.
Robert: You have to fuzz it, on average, of seven hours before you start to see a first failure. There’s no low hanging fruit, in other words.
On the other hand, there are some newer protocols that are being tested. MQTT, which you called out, which is used in the Internet of Things, and for that you were talking about a matter of minutes before you start seeing the time to first failure.
Chris: Anytime we’re going to look at a burgeoning industry—in previous podcasts we talked about V2V, we talked about medical, we’ve talked about IoT. When we start developing these standards, especially when they’re not widely reviewed and not well vetted by many people, that time to first failure is typically going to be pretty short, just as you described, and that density of failures is going to be pretty high.
There are ways to address that, of course. We can classify these failures and look at specific areas that may require more review or more correction from a development perspective, but now you’re armed with the information. And this is the type of information that attackers are going to be using against you as a software developer. It’s very robust information, and it’s usable.
The thing that’s really important to consider about fuzz testing is, if I cause a crash, that’s a crash. That is something I have to take some level of action on. You can classify it however you want, but you have seen a very specific, real event.
Other kinds of conditions can also show up. I may see a resource utilization issue, I may see an out-of-state condition—any number of scenarios that fuzz testing can surface. But as you said, it takes a little bit of additional review to figure out how critical each particular item is. They’re real, and you have something tangible.
Robert: I’m going to use the word ‘mature.’ This seems like a pretty mature testing technique. Typically, who are we seeing adopt fuzz testing in their software development life cycle?
Chris: Most of the time, where we see fuzz testing adopted is in an organization that has started to develop a security policy and a robust software development life cycle. They’re starting to ask, “What are the tools we need to implement the security requirements that we’ve developed?”
Not necessarily your startups—some startups are much more progressive—but typically, this is going to be a little bit more mature organization that’s been around a little bit, and they’re looking at security and they’re looking at it from a meaningful standpoint.
A byproduct of that, and you asked this earlier, is: how much testing is enough? That is a very difficult question. When we talk about the application of a test target, whatever that target is, whether it’s software or an embedded device, you have to take some things into account. What are the quality levels, what are the safety implications, and what are the security requirements for this particular product?
Those are going to feed into how much testing should be done. One feature, at least in our tools, that’s beneficial is that we have a way to look at what the industry is doing for the type of protocol being used and how it’s implemented, and what tends to be the average amount of testing, so that helps in that respect.
One thing I’ll point out is, we have a document called the “Fuzz Testing Maturity Model,” or FTMM for short. What the FTMM really provides you with are levels of testing that you can follow on a regular basis and apply to the type of product, or the type of environment, that you’re going to be deploying your product in.
That gives you a starting point for figuring out how much fuzz testing is relevant and how much you’ve done before, so you can move on to that next step, or that next iteration.
Robert: We also have on our site over 250 protocol test suites, so you can see what we can test for and find the ones that are appropriate for your industry.
Chris: Very much so. I do want to point out that we’re talking about protocols here, but as I said earlier, there are many different types of fuzz testing. We highly recommend more than one tool in the tool chest. You should have multiple tools, testing at each stage, because each one of them has very specific characteristics that can find certain things.
If you’re implementing those throughout the life cycle and utilizing automation capabilities to really get the most out of each tool, it can be very effective in a relatively short period.
Robert: As I mentioned, we have a new report out on the state of fuzz testing in 2017. We also look at different verticals, and we show which protocols have emerged in 2016 over 2015, as being the more popular protocols that are being tested.
If you think that you’re covering your industry, you might want to check it out to see if there are any protocols that you’re currently missing that your colleagues in other companies are testing. It’s interesting to compare.
Chris: Yeah. It’s always surprising for us when we go and talk with a customer: we ask what protocols they use, we get a list of protocols, and we say, “OK, let’s validate that, let’s check that.” We’ll use something like Nmap or some other tool to look at the protocols that are being listed, and we always find there are some extras that aren’t included.
If that protocol is exposed in any way, shape, or form, whether that was intentional, whether it’s supposed to be disabled, or whether it’s supposed to be in a private network, it should be tested. These are the types of testing methods that help ensure you, as an organization, are as mature as you possibly can be in your security methodologies and testing.
Robert: Chris, another great episode. [laughter] Appreciate that. Once again, if you have any comments, please fill out the comment space or take advantage of our email address; we’d love to hear from you. We’d like this to be a dialogue, and as I’ve said over and over, security is subjective. I want to hear your point of view.
Chris: Definitely. Thank you for attending again. If you are using fuzzing already, we’d love to hear from you, what you’re using, how you’re applying that. As always, we look forward to seeing you at our next podcast.