Black Duck and GenAI

Mar 04, 2024 / 6 min read

Table of Contents

AI-generated code: The fourth component of software
The mistaken presumption that AI produces clean code
Overview
The Role of AST in AI
A lesson from recent history
Rising to evolving challenges
Summary

AI-generated code: The fourth component of software

Generative AI (GenAI) is attracting enormous attention for its potential to change software development. While its full impact is not yet known, organizations are eagerly vetting the technology and separating the hype from the real, pragmatic benefits. In parallel, software security professionals are closely watching the practical impact of GenAI and how application security testing (AST) must adapt as adoption increases.

Until the advent of GenAI, software was composed of three types of components.

The code you wrote.
The code you bought.
The code you used from open source.

As organizations consider using GenAI coding assistants, the most prudent position is to view AI-generated code as a fourth type of component, with its own benefits and risks.

The mistaken presumption that AI produces clean code

GenAI coding tools rely on deep-learning large language models (LLMs) trained on massive amounts of code collected from websites, forums, repositories, and open source projects. Tools like ChatGPT and Copilot use these LLMs to translate natural-language prompts into code. As the hype around GenAI grew, so did the presumption that the code used to train these LLMs was free of licensing and vulnerability issues, and that the LLMs would therefore produce code free of bugs and flaws.

In fact, the opposite is true. Studies such as the “Open Source Security and Risk Analysis” (OSSRA) report show that codebases contain numerous vulnerabilities and licensing issues: the 2024 edition found vulnerabilities in 84% of scanned codebases and license conflicts in 53%.

If GenAI tools learn from existing codebases like those scanned for the OSSRA report, they are likely to reproduce the same problems in the code they generate. Furthermore, new technologies are quickly followed by people looking to exploit their weaknesses, and techniques for contaminating LLMs have already surfaced. Organizations should not presume that GenAI coding assistants will produce pristine, risk-free code; AI-generated code must be tested like any other code.

Overview

AI-generated code (and AI coding assistants) will revolutionize software development, becoming the fourth major component of software, alongside proprietary, third-party commercial, and open source components.

However, because the LLMs powering AI coding assistants are trained on publicly available software (including open source software), organizations can't assume that AI-generated code is flawless. It can inherit the security and quality issues present in the code it was trained on, and when it reproduces open source code, it can introduce license violations and IP risks.

As in the early days of open source, fear of these risks is slowing the adoption of AI-generated code and preventing organizations from realizing its full potential. Black Duck helps organizations realize the benefits of AI-generated code while managing the risks.

Black Duck® SCA snippet analysis identifies potential license compliance and IP risks.
Coverity static analysis helps teams find and fix security and quality defects in AI-generated code.
Other Black Duck application security testing solutions help teams ensure they are building and delivering secure software their customers and users can trust.

The Role of AST in AI

The fundamental truth is that all code has flaws and bugs, and using GenAI will not change that. GenAI and AST are not mutually exclusive; in fact, AST is a necessary enabler of AI adoption. The “essential three” testing methodologies (static analysis, dynamic analysis, and SCA) will remain critical for monitoring the security and quality of software, and organizations must combine them in a multifaceted testing approach to find and fix issues quickly and efficiently.

In a recent publication, “Predicts 2024: AI & Cybersecurity—Turning Disruption into an Opportunity,” Gartner predicts growing adoption of GenAI, but with several caveats. The report promptly debunks the hype that GenAI will eliminate the need for AST solutions, noting that “through 2025, generative AI will cause a spike of cybersecurity resources required to secure it, causing more than a 15% incremental spend on application and data security.”

Certainly, AST best practices and deployment methods will need to evolve. Organizations see GenAI as another way to increase development velocity. But to realize that benefit, they will need automated AST solutions that integrate into development workflows and scale with development efforts and the potentially higher volumes of code that GenAI produces.

A lesson from recent history

At Black Duck, we view GenAI as the next evolutionary step on the AST journey, and history shows that AST can enable organizations to gain the benefits promised by new technology. A parallel can be drawn to the early days of open source software (OSS), when organizations were reluctant to accept the perceived risks of broad open source usage. Fast-forward to today, and open source makes up 77% or more of most applications.

As OSS began to proliferate, organizations struggled to manage it, track dependencies, and identify potential vulnerabilities. Early enterprise adoption of OSS was hindered primarily by concerns about licensing and IP protection, with royalty obligations and other licensing issues creating risk. Black Duck was launched to address these concerns. It gave organizations a reliable way to track what OSS they were using, understand the license obligations, and avoid OSS with license terms they did not want.

As OSS usage spread and vulnerabilities were introduced via OSS components, the need to identify and track those vulnerabilities gained attention. In the early days, when a vulnerability was discovered in an open source component, organizations were not prepared to understand their exposure or determine which software needed to be remediated. Excel was the tool of choice for tracking OSS usage, and centralized knowledge bases of OSS vulnerabilities were nascent at best. This left organizations struggling to embrace the efficiencies of open source while managing the risk to their business.

Black Duck SCA offers rigorous scanning and a comprehensive KnowledgeBase™ of OSS license and vulnerability data, enabling organizations to identify OSS as well as potential vulnerabilities and problematic licenses. Black Duck also provides a single source of truth that helps organizations track OSS usage and delivers immediate information when new vulnerabilities are discovered. With Black Duck, organizations can address the inherent risks of OSS usage and confidently accelerate OSS adoption, gaining the associated benefits.

The code produced by GenAI coding assistants carries the same potential for licensing and vulnerability risks. Just as SCA solutions reduce the risk for organizations using OSS, SCA is a crucial component for scanning AI-generated code.

Rising to evolving challenges

The way GenAI learns to deliver code with the desired functionality requires AST techniques to evolve. A good example is snippets: portions of code extracted from OSS components. Snippets are already difficult to identify, and they can be readily absorbed into LLMs and replicated in GenAI-produced code. If an AI-generated snippet comes from an open source component with a restrictive license, the organization faces legal and compliance risk.

Unfortunately, most SCA tools rely on filesystem scanning techniques that lack the sophistication to detect snippets. Black Duck, by contrast, uses finer-grained scanning techniques to identify snippets, link them back to their source, and detect licensing issues. Advanced scanning techniques also identify nested dependencies, where OSS code calls other OSS code.

Black Duck’s analysis can match snippets as small as a handful of lines to the open source projects where they originated. As a result, Black Duck can provide customers with the license associated with that project and advise on associated risk and obligations. This is powered by a KnowledgeBase of more than 6 million open source projects and over 2,750 unique open source licenses. The ability to identify snippets provides a critical capability to organizations looking to use GenAI to develop code.
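This post does not describe how Black Duck's snippet matching works internally. As a rough illustration of the general idea only, one common approach to snippet detection is to fingerprint short runs of normalized source lines and look those fingerprints up in an index built from known open source code. The Python sketch below assumes a three-line window, whitespace-only normalization, and a single hypothetical project (`example-lib`) tagged with an SPDX license identifier; every name in it is illustrative, and production tools use far more robust normalization and matching.

```python
import hashlib

# Hypothetical window size: the number of consecutive normalized lines per fingerprint.
WINDOW = 3


def normalize(source: str) -> list[str]:
    """Collapse whitespace and drop blank lines so re-indenting code does not hide a match."""
    lines = (" ".join(line.split()) for line in source.splitlines())
    return [line for line in lines if line]


def fingerprints(source: str, window: int = WINDOW) -> set[str]:
    """Hash every run of `window` consecutive normalized lines."""
    lines = normalize(source)
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def build_index(projects: dict[str, tuple[str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Map each fingerprint to the (project, license) pairs whose code contains it."""
    index: dict[str, set[tuple[str, str]]] = {}
    for name, (code, license_id) in projects.items():
        for fp in fingerprints(code):
            index.setdefault(fp, set()).add((name, license_id))
    return index


def match_snippets(generated: str, index: dict[str, set[tuple[str, str]]]) -> set[tuple[str, str]]:
    """Return the (project, license) pairs whose fingerprints appear in the generated code."""
    hits: set[tuple[str, str]] = set()
    for fp in fingerprints(generated):
        hits |= index.get(fp, set())
    return hits


# Usage: a hypothetical "known" OSS function and an AI-generated copy with different indentation.
known = {
    "example-lib": (
        "def clamp(x, lo, hi):\n    if x < lo:\n        return lo\n"
        "    if x > hi:\n        return hi\n    return x\n",
        "GPL-3.0-only",
    ),
}
ai_output = (
    "def clamp(x, lo, hi):\n  if x < lo:\n    return lo\n"
    "  if x > hi:\n    return hi\n  return x\n"
)
print(match_snippets(ai_output, build_index(known)))  # {('example-lib', 'GPL-3.0-only')}
```

Because whitespace is collapsed before hashing, re-indenting a copied snippet does not hide the match. Renaming identifiers would, which is one reason real snippet scanners need more sophisticated normalization than this sketch.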

Snippets can also carry vulnerabilities from the original OSS component, and those vulnerabilities are much more difficult to trace through SCA workflows. This is another area where following AST best practices is critical: vulnerabilities in source code should be discoverable through static application security testing (SAST), and runtime vulnerabilities through dynamic application security testing (DAST). The “essential three” testing programs of SCA, SAST, and DAST remain indispensable to building trust in your software.
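To make the SAST point concrete, here is a hypothetical Python example (not taken from any real project or tool output) of a flaw that could travel inside a copied or generated snippet: a SQL query assembled from user input with string formatting. Static analysis tools commonly flag this injection pattern, and the usual fix is a parameterized query. The table, data, and function names are assumptions for illustration.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # Pattern SAST tools typically flag: user input is interpolated directly
    # into SQL, so a crafted `name` can change the query's logic.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized query: the driver binds `name` as data, never as SQL text.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


# Hypothetical in-memory table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # every row comes back
print(find_user_safe(conn, payload))    # no rows match
```

Both functions return correct results for ordinary input, which is why this kind of flaw can slip through functional testing and is better caught by static analysis.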

Summary

GenAI will undoubtedly bring change to software development as the drive to accelerate the creation of code continues. As with all “silver bullet” technologies, GenAI will have limitations and pitfalls that must be addressed to deliver the promised benefits. But promises of pristine, secure code that obviates the need for application security testing are at best premature and may prove to be ill-conceived.

Application security testing can provide a path that enables organizations to use this technology while ensuring that AI-generated code does not create real risks to the business. AST can be a catalyst for GenAI adoption, just as it was for OSS. Organizations must evolve their AST policies and processes to reap the benefits of GenAI.
