A team of Synopsys researchers recently demonstrated that GitHub’s generative AI development tool Copilot (created in partnership with OpenAI and described as a descendant of GPT-3) can generate code containing an open source licensing conflict without flagging it.
Ignoring licensing conflicts can be very costly. One of the most famous examples is Cisco, which failed to comply with the requirements of the GNU General Public License, under which the Linux software in its routers and other open source programs was distributed. After the Free Software Foundation brought a lawsuit, Cisco was forced to make that source code public. The amount of money it cost the company was never disclosed, but most experts say it was substantial.
This shouldn’t be a surprise. As every vendor of LLM-based AI tools has acknowledged, they are only as good as the dataset they have been trained on. And as has been shown with ChatGPT, they will declare falsehoods with the same confidence that they declare truths. In short, they need adult supervision, as any human developer would. “AI tools can assist developers when used in the correct context, such as writing a unit test, troubleshooting a given stack trace, or automating repetitive tasks,” said Jagat Parekh, group director of software engineering with the Synopsys Software Integrity Group and the leader of the researchers who tested Copilot. But he added that “generative AI tools are only as good as the underlying data they are trained on. It’s possible to produce biased results, to be in breach of license terms, or for a set of code recommended by the tools to have a security vulnerability.”
Parekh said another risk that isn’t yet getting much discussion is that an AI tool could recommend a code snippet implementing some common function, and that snippet could become widely used. If a vulnerability is later discovered in it, “now it is a systemic risk across many organizations.” So while vulnerabilities are found in just about every human-written codebase, with AI-generated code that is broadly used, “the scale of impact is much, much higher,” he said.

That means software written by chatbots needs the same level of testing scrutiny that human-written code does: a full suite of automated tools for static and dynamic analysis, software composition analysis to find open source vulnerabilities and licensing conflicts, and pen testing before production. “Attention to AppSec tools’ results would help enterprise organizations identify and mitigate compliance, security, and operational risks stemming from adoption of AI-assisted tools,” Parekh said.
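To make the software composition analysis point concrete, here is a minimal, purely illustrative sketch of one small piece of what such tools do: checking the declared licenses of installed Python packages against a copyleft denylist. It uses only the standard library’s `importlib.metadata`; the `DENYLIST` markers and the `flag_copyleft` helper are assumptions for this sketch, and real SCA products perform far deeper analysis (transitive dependencies, snippet matching, vulnerability databases).

```python
# Illustrative sketch only: flag installed packages whose declared
# license metadata contains a copyleft marker. Real software
# composition analysis tools go far beyond this simple check.
from importlib import metadata

# Hypothetical denylist of license markers an organization might screen for.
DENYLIST = ("GPL", "AGPL")

def flag_copyleft(packages):
    """Return {package_name: license_string} for packages whose
    metadata declares a license containing a denylisted marker."""
    flagged = {}
    for name in packages:
        try:
            # metadata() returns an email.message.Message-like mapping
            lic = metadata.metadata(name).get("License", "") or ""
        except metadata.PackageNotFoundError:
            continue  # not installed; a real tool would resolve it
        if any(marker in lic for marker in DENYLIST):
            flagged[name] = lic
    return flagged
```

A check like this would run in CI alongside static analysis, so that a copyleft dependency introduced by an AI-suggested snippet is caught before release rather than after.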
Of course, there is general agreement that AI LLMs are still at an embryonic stage. They will only become more capable, likely for both better and worse. Parekh said it’s too early to know the long-term impact of the technology. “Overall, new versions of ChatGPT will require less supervision over time,” he said. “However, the question of how that translates into trust remains open, and that’s why having the right AppSec tools with greater quality of results is more important than ever.”
Or, put another way, use chatbots only for what they’re good at. And then remember that you still need to supervise them.