A new study evaluates the ability of large language models (LLMs) like ChatGPT to detect and fix security vulnerabilities in code, comparing their performance to traditional static code analyzers.
The paper analyzed 129 code samples (files) in 8 programming languages, drawn from public repositories including those of NASA and the Department of Defense. The samples, which contained known vulnerabilities, were scanned both by LLMs and by standard static analyzers such as Snyk and Fortify.
The results show that the LLM identified roughly four times as many vulnerabilities as the static analyzers. Specifically, Snyk found 98 issues, while OpenAI’s GPT-4 detected 393. GPT-4 also proposed a concrete fix for every problem it identified, and the authors report a low rate of false positives.
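The "approximately four times" figure follows directly from the two counts reported for GPT-4 and Snyk; the Fortify count is not given here, so this ratio covers only the Snyk comparison:

```latex
\frac{393}{98} = 4 + \frac{1}{98} \approx 4.01
```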
Within individual vulnerability categories, such as path traversal and file inclusion, GPT-4 identified 3 to 4 times as many flaws. Applying GPT-4’s suggested fixes reduced the number of reported issues by 90%, at the cost of only an 11% increase in lines of code.
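To illustrate what a path traversal flaw and a typical fix look like, here is a minimal, generic Python sketch. It is not code from the study or a fix produced by GPT-4; `BASE_DIR`, `resolve_unsafe`, and `resolve_safe` are hypothetical names, and POSIX-style paths are assumed.

```python
import os

# Hypothetical directory that user-requested files are meant to come from.
BASE_DIR = "/srv/app/uploads"

def resolve_unsafe(filename: str) -> str:
    # Vulnerable: "../" segments or an absolute path in `filename`
    # can make the result point outside BASE_DIR.
    return os.path.join(BASE_DIR, filename)

def resolve_safe(filename: str) -> str:
    # Fixed: resolve "..", and any existing symlinks, into a canonical path,
    # then confirm the result still lies inside BASE_DIR before using it.
    base = os.path.realpath(BASE_DIR)
    candidate = os.path.realpath(os.path.join(base, filename))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes base directory: {filename!r}")
    return candidate

print(os.path.normpath(resolve_unsafe("../../../etc/passwd")))  # → /etc/passwd
print(resolve_safe("report.txt"))
try:
    resolve_safe("../../../etc/passwd")
except ValueError as err:
    print(err)
```

The check uses `os.path.commonpath` rather than a string `startswith` test, because a prefix test would wrongly accept a sibling directory such as `/srv/app/uploads_old`.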
The research highlights LLMs’ ability to generalize from broad training data, which may let them spot subtle or emerging vulnerability patterns that traditional rule-based systems miss. Because they work in natural language, they can also explain each finding in context, in human-readable terms.
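To make the limitation of fixed rules concrete, the toy scanner below applies a single hypothetical pattern rule and misses the same dangerous call once it is routed through an alias. This is a deliberately oversimplified assumption: real analyzers such as Snyk and Fortify use far richer techniques (for example, data-flow analysis) and would not be fooled this easily. The sketch only shows why purely syntactic rules can miss variants that a reader looking at meaning would likely still recognize.

```python
import re

# One hypothetical, deliberately naive rule: flag lines that call os.system(...).
RULE = re.compile(r"\bos\.system\(")

def scan(source: str) -> list[int]:
    """Return the 1-based line numbers where the rule matches."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if RULE.search(line)]

direct = 'os.system("ls " + user_input)'
# Same behavior, but the call goes through an alias, so the pattern never matches.
aliased = 'run = os.system\nrun("ls " + user_input)'

print(scan(direct))   # → [1]
print(scan(aliased))  # → []
```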
The authors suggest that combining LLMs with static analyzers could substantially strengthen software security and reduce exploits in the wild. As cyber threats continue to grow, AI-augmented tools may prove essential for building robust and reliable systems.
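One simple way to picture combining the two approaches is to merge their reports and review first the issues that both flag. The sketch below assumes a hypothetical `Finding` record keyed by file, line, and CWE identifier; it does not reflect the actual output format of Snyk, Fortify, or any LLM, and it is not the method used in the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """One reported issue. Field names are hypothetical, not a real tool's schema."""
    file: str
    line: int
    cwe: str     # weakness ID, e.g. "CWE-22" (path traversal)
    source: str  # which scanner reported it, e.g. "static" or "llm"

Key = tuple[str, int, str]

def merge_findings(*reports: list[Finding]) -> dict[Key, set[str]]:
    """Group findings by (file, line, CWE) and record which sources reported each."""
    merged: dict[Key, set[str]] = {}
    for report in reports:
        for f in report:
            merged.setdefault((f.file, f.line, f.cwe), set()).add(f.source)
    return merged

def triage_order(merged: dict[Key, set[str]]) -> list[Key]:
    """Order issues by how many sources agree (most first), then by key."""
    return sorted(merged, key=lambda k: (-len(merged[k]), k))

static_report = [Finding("upload.php", 12, "CWE-22", "static")]
llm_report = [
    Finding("upload.php", 12, "CWE-22", "llm"),
    Finding("page.php", 40, "CWE-98", "llm"),
]
merged = merge_findings(static_report, llm_report)
print(triage_order(merged))
# → [('upload.php', 12, 'CWE-22'), ('page.php', 40, 'CWE-98')]
```

Exact matching on line and CWE is a simplifying assumption; in practice, different tools often report the same flaw at slightly different locations or under related weakness IDs, so a real merge step would need fuzzier matching.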
The study suggests that code understanding improves substantially as models grow larger, a promising sign given how rapidly LLMs continue to advance. Future work should explore system-level vulnerabilities and integrate multiple static analyzers to fully assess LLMs’ potential.
With software permeating nearly every aspect of society, securing complex, evolving codebases is a pivotal challenge. This research indicates that AI could help automate the identification and patching of vulnerabilities, bringing us one step closer to safeguarding the digital infrastructure underlying our world.