Software Integrity Blog

 

Why dependencies matter for SAST

How do static analyzers manage code dependencies? There are many ways, but the best static analyzers take a hybrid approach to dependency analysis.

Introduction

SAST solutions are popular with both development and security teams. But they’re used in such different ways that historically it has been difficult to meet the needs of both constituent groups simultaneously. Fortunately, recent advances in dependency management make it possible to satisfy everybody.

Background

Security teams use SAST solutions to assess application risk rapidly. The primary focus of a security team is often “ease of use,” which usually translates into “time to results.” If the team must assess many applications, it may be infeasible to spend significant time on application-specific configuration. In an ideal world, one would just enter the URL of a revision control system, and the SAST solution would correctly group the various projects and provide analysis results with no human intervention.

On the other hand, development teams want to realize the efficiency gains of finding defects early in the software development life cycle (often before they even hit revision control). False negatives result in rework later in the SDLC, while false positives are a time-consuming nuisance. Therefore, it is almost always a worthwhile investment to perform some amount of application-specific configuration.

After the first-party source code, the most valuable artifact to a static analyzer is an accurate view of the application's dependencies. Normally, these are trivially available to development teams, as they are necessary to build and deploy the application. However, obtaining accurate dependencies has historically been too time-consuming for security teams.

In this article, we examine the role of dependencies in SAST accuracy, as well as various approaches to handling them. Then we describe a solution that meets the needs of both development and security.

The role of code dependencies

As an illustration of the role of dependencies, consider the following simple Struts 2 code.

<!DOCTYPE html>
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<%@ taglib prefix="s" uri="/struts-tags" %>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Hello World!</title>
  </head>
  <body>
    <h2><s:property value="messageStore.message" /></h2>
  </body>
</html>

This code is derived from the tutorial at https://struts.apache.org/getting-started/hello-world-using-struts2.html#step-3---create-the-view-helloworldjsp with one small change: in my ActionSupport class, I've copied user-controlled data from the request to the messageStore.

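public String execute() throws IOException {
    messageStore = new MyMessageStore();

    // Fetch the current servlet request from the Struts action context.
    HttpServletRequest request = (HttpServletRequest) ActionContext.getContext()
        .get(ServletActionContext.HTTP_REQUEST);
    // Copy user-controlled input into the message store.
    // (The setter name setMessage is assumed from the JSP's messageStore.message property.)
    messageStore.setMessage(request.getParameter("userControlled"));

    return SUCCESS;
}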

Is there a cross-site scripting (XSS) defect in the above code? It’s impossible to say without knowing what version of Struts 2 is being used. As of Struts 2.5.14, the above code is safe because the tags (by default) escape for HTML context. But if an older version of Struts 2 is in use, this code contains an XSS defect. Without that information, an analyzer can’t do any better than guessing. (In this case, assuming the code is safe is likely a smarter guess, but it’s still a guess.)
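To make the version question concrete, here is a minimal sketch of how an analyzer might model the tag once it knows (or doesn't know) the Struts 2 version. The class name StrutsTagModel, the Verdict enum, and both method names are hypothetical and exist only for illustration; the 2.5.14 threshold is simply the one stated above.

```java
import java.util.Optional;

/** Hypothetical sketch: classify the <s:property> sink from the resolved Struts 2 version. */
final class StrutsTagModel {
    enum Verdict { SAFE, XSS, UNKNOWN }

    // Threshold taken from the text above: major, minor, patch.
    private static final int[] ESCAPING_SINCE = {2, 5, 14};

    /**
     * Compares the first three numeric parts of a dotted version with the threshold.
     * Missing parts count as 0. Non-numeric parts (e.g. "2.5-BETA") are not handled
     * and would throw NumberFormatException.
     */
    static int compareVersions(String version, int[] threshold) {
        String[] parts = version.split("\\.");
        for (int i = 0; i < threshold.length; i++) {
            int part = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (part != threshold[i]) {
                return Integer.compare(part, threshold[i]);
            }
        }
        return 0;
    }

    /** Without a version the analyzer cannot decide, so it reports UNKNOWN instead of guessing. */
    static Verdict classify(Optional<String> strutsVersion) {
        if (!strutsVersion.isPresent()) {
            return Verdict.UNKNOWN;
        }
        return compareVersions(strutsVersion.get(), ESCAPING_SINCE) >= 0
                ? Verdict.SAFE
                : Verdict.XSS;
    }
}
```

With this sketch, classify(Optional.of("2.5.13")) yields XSS and classify(Optional.of("2.5.14")) yields SAFE, while an empty Optional yields UNKNOWN. That last case is exactly where an analyzer that ignores dependencies has to fall back on a guess.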

Dependency strategies

Now that we understand why code dependencies are important, it’s time to look at the various approaches that static analyzers can take.

Ignore dependencies

This is the easiest algorithm to implement. The analyzer ignores dependency information and simply uses guesswork. It’s terribly inaccurate, but it doesn’t require writing any code.

The analyzer vendor typically optimizes the guessing to either maximize true positives or minimize false positives, often with the goal of getting the maximum possible score on some well-known benchmark. Usually, this involves assuming all dependencies are at the latest version. In an enterprise deployment, this works worst when needed most, as the analyzer will miss large swaths of defects in the most poorly maintained and vulnerable applications.

Since finding dependencies can be technically challenging, a vendor may make the dubious claim that the analyzer implements this strategy for “ease of use.”

Use a built-in library of knowledge

This is the brute-force approach to handling code dependencies. The analyzer is simply pre-configured to “know” about certain things. Again, by way of illustration, what does the following method do?

org.apache.commons.io.IOUtils.closeQuietly(…)

A good guess (based on the name) is that it closes a resource. The developer of the static analysis solution could go out and download this class, confirm what it does, and then provide this information to the analyzer as built-in knowledge. This works but isn’t a scalable solution owing to the large (and growing) number of libraries out there.

If an analyzer were to employ only this strategy, it wouldn’t be very good. It would have a very poor defect detection rate and/or a uselessly high false-positive rate. That doesn’t mean, however, that this technique is without merit.

It’s an excellent mechanism for the most common and stable libraries, such as the standard Java runtime libraries. It has the advantage of being fast since the information is pre-calculated. And it also allows the analyzer to enforce an API contract even if the code is more relaxed. But it’s not a good general-purpose solution.
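As a rough illustration of built-in knowledge, the sketch below stores pre-calculated summaries in a lookup table keyed by fully qualified method name. The BuiltInKnowledge class, the Effect labels, and the choice of entries are assumptions made for this example; a real analyzer's summaries would be far richer than a single label.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a built-in knowledge base of method summaries. */
final class BuiltInKnowledge {
    // Illustrative effect labels; not taken from any real analyzer.
    enum Effect { CLOSES_RESOURCE, RETURNS_USER_CONTROLLED_DATA }

    private static final Map<String, Effect> SUMMARIES = new HashMap<>();
    static {
        // Pre-calculated once by the analyzer vendor, so lookups are fast.
        SUMMARIES.put("org.apache.commons.io.IOUtils.closeQuietly", Effect.CLOSES_RESOURCE);
        SUMMARIES.put("javax.servlet.ServletRequest.getParameter",
                Effect.RETURNS_USER_CONTROLLED_DATA);
    }

    /** Returns the stored summary, or empty when the library method is not known. */
    static Optional<Effect> lookup(String qualifiedMethod) {
        return Optional.ofNullable(SUMMARIES.get(qualifiedMethod));
    }
}
```

The empty result is the scalability problem in miniature: every method that is missing from the table is one the analyzer knows nothing about.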

Analyze dependencies

If a good static analyzer wants to truly understand a dependency, there's one surefire way to do so: the analyzer is good at analyzing code, so it can apply its own analysis techniques to the dependency. This doesn't work so well for languages like C/C++, where it's hard to extract useful information from the compiled binary (there are entire solutions whose whole purpose is to try this). But it works well for languages like Java, where the dependencies tend to be in easily analyzed JARs, or JavaScript, where they come as source code that is distinguishable from the first-party code only by its location (such as residing in the node_modules directory).

As analysis algorithms go, this is clearly the superior option. But it relies on locating the dependencies, which are not always available to security teams. Also, this technique is valuable only in proportion to the strength of the underlying analyzer.

Take a hybrid approach

The two active techniques described above are not mutually exclusive but rather can be applied together. In this scenario, the analyzer uses built-in knowledge when it’s available. Otherwise, it examines the actual dependencies to understand their behavior. It should be clear that this is an ideal solution but that it leads back to the original problem of obtaining the dependencies.
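The lookup order described above can be sketched as follows. Everything here is hypothetical: HybridResolver, Summary, and Source are illustrative names, and the two Function parameters stand in for the knowledge base and for whatever analysis the tool runs on a located dependency.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of the hybrid strategy: built-in knowledge first, then analysis. */
final class HybridResolver {
    enum Source { BUILT_IN, ANALYZED, GUESS }

    /** What the analyzer believes about a method, and where that belief came from. */
    static final class Summary {
        final String effect;
        final Source source;

        Summary(String effect, Source source) {
            this.effect = effect;
            this.source = source;
        }
    }

    private final Function<String, Optional<String>> knowledgeBase;
    private final Function<String, Optional<String>> dependencyAnalysis;

    HybridResolver(Function<String, Optional<String>> knowledgeBase,
                   Function<String, Optional<String>> dependencyAnalysis) {
        this.knowledgeBase = knowledgeBase;
        this.dependencyAnalysis = dependencyAnalysis;
    }

    Summary resolve(String qualifiedMethod) {
        // 1. Prefer the fast, pre-calculated summary when one exists.
        Optional<String> builtIn = knowledgeBase.apply(qualifiedMethod);
        if (builtIn.isPresent()) {
            return new Summary(builtIn.get(), Source.BUILT_IN);
        }
        // 2. Otherwise analyze the dependency itself, if it could be located.
        Optional<String> analyzed = dependencyAnalysis.apply(qualifiedMethod);
        if (analyzed.isPresent()) {
            return new Summary(analyzed.get(), Source.ANALYZED);
        }
        // 3. Nothing is known: the analyzer is back to guessing.
        return new Summary("unknown", Source.GUESS);
    }
}
```

The third branch is reached only when the dependency is both absent from the knowledge base and unavailable for analysis, which is why locating dependencies matters so much.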

Obtaining dependencies

Since code dependencies are a valuable artifact for analysis, it’s ideal to provide them. Generally, in a continuous integration situation, they are trivially available. But in other use cases, it may be challenging to obtain them. Sometimes, the only viable way to do so is to run a build of the software, which may be overwhelmingly complex to set up. Fortunately, some middle-ground solutions can provide much better results than just ignoring dependencies, without needing a great level of effort.

Build execution capture

Building a piece of software requires that the dependencies be available, and the build system is the final arbiter of the “correct” set of dependencies. Therefore, capturing those dependencies during the build process is the most accurate method. Build capture guarantees that all code dependencies are available and that any version conflicts are correctly resolved. The most well-known and reliable way to implement this method is to attach to the actual build process and determine which artifacts are linked/packaged into the final assemblies. Static analyzers that focus on C/C++ have been doing this reliably for decades. While this is clearly the most accurate algorithm, it’s also the most difficult to implement, as it involves integrating with a build system, which in some cases requires unreasonable levels of effort.

Metadata analysis

Metadata analysis is another well-defined mechanism and is in common use by software composition analysis (SCA) tools. Build scripts usually embed enough information to determine and obtain the needed set of dependencies. Actually running those scripts can be a daunting task: they may be written for a tool such as Ant or Maven, but executing them may require additional plugins or even ancillary tools. As a much more approachable alternative, the analyzer can examine the scripts' metadata to work out the dependencies without executing anything. Although this approach is not as reliable as build capture, it doesn't require any additional setup.
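As a minimal sketch of the idea, the code below reads declared dependencies straight out of a Maven pom.xml using only the JDK's XML parser, without running Maven. PomDependencyReader and its method names are hypothetical. The sketch deliberately ignores property placeholders, parent POMs, dependencyManagement scoping, and transitive dependencies, which is one reason this kind of approach is less reliable than build capture.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: list the dependencies declared in a pom.xml without building. */
final class PomDependencyReader {

    /** Returns "groupId:artifactId:version" for each <dependency>; "?" marks a missing field. */
    static List<String> readCoordinates(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the POM may come from an untrusted repository.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));

            List<String> coordinates = new ArrayList<>();
            NodeList deps = doc.getElementsByTagName("dependency");
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                coordinates.add(text(dep, "groupId") + ":" + text(dep, "artifactId")
                        + ":" + text(dep, "version"));
            }
            return coordinates;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }
    }

    // Takes the first matching descendant in document order.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "?" : nodes.item(0).getTextContent().trim();
    }
}
```

A dependency whose version comes back as "?" (for example, because it is inherited from a parent POM) is precisely the kind of case where metadata analysis needs extra resolution work or falls short of what the build would report.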

Failure

Occasionally, the effort needed to get proper dependency information is too great. In this case, an analyzer can fall back and operate like a lower-level analyzer that simply ignores dependencies. This will have all the drawbacks of ignoring dependencies but ensures that at least some results are generated. It isn’t ideal but might be the best solution in a difficult situation. Fortunately, this scenario isn’t common if the static analyzer vendor has put sufficient effort into obtaining code dependencies.
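Putting the three outcomes together, the fallback order might look like the sketch below: try build capture, then metadata analysis, and only then proceed without dependencies. DependencyLocator, Mode, and Result are hypothetical names, and the two Supplier parameters are placeholders for real capture and metadata implementations.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch of the fallback order for obtaining dependencies. */
final class DependencyLocator {
    enum Mode { BUILD_CAPTURE, METADATA, IGNORED }

    /** The dependencies that were found and the strategy that produced them. */
    static final class Result {
        final Mode mode;
        final List<String> dependencies;

        Result(Mode mode, List<String> dependencies) {
            this.mode = mode;
            this.dependencies = dependencies;
        }
    }

    static Result locate(Supplier<Optional<List<String>>> buildCapture,
                         Supplier<Optional<List<String>>> metadataAnalysis) {
        // Most accurate: whatever the real build actually resolved.
        Optional<List<String>> captured = buildCapture.get();
        if (captured.isPresent()) {
            return new Result(Mode.BUILD_CAPTURE, captured.get());
        }
        // Middle ground: dependencies declared in the build scripts.
        Optional<List<String>> declared = metadataAnalysis.get();
        if (declared.isPresent()) {
            return new Result(Mode.METADATA, declared.get());
        }
        // Last resort: analyze anyway, with all the drawbacks of ignoring dependencies.
        return new Result(Mode.IGNORED, Collections.<String>emptyList());
    }
}
```

Recording the mode alongside the dependency list lets the analyzer (or its user) see how much trust the results deserve.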

A solution

Given the importance of understanding dependencies, ignoring them is a poor choice. However, requiring a working build may create an insurmountable obstacle. By parsing project metadata and using a hybrid solution for understanding dependencies, an analyzer can generate the most accurate results available to it without putting additional burden on users. If the analyzer can't locate a dependency, the analysis relies only on the knowledge base. For development teams using SAST in a CI pipeline (where dependencies can be located easily), this approach yields the most accurate results. For security teams, this approach improves accuracy in many cases, with zero penalty in terms of ease of use. And allowing development and security to use the same SAST solution reduces organizational friction and improves efficiency.
