Researchers at Meta have developed a new approach called AUTI (Automated Unit Test Improvement) that leverages large language models (LLMs) to automatically improve unit test suites, according to a new paper published on arXiv. The technique shows promising results in enhancing test coverage and finding bugs in real-world software projects.
The AUTI system analyzes existing unit tests and the code under test, then uses an LLM to generate additional test cases and assertions that expand coverage. In experiments on 20 open-source Java projects, AUTI increased branch coverage by an average of 10.2%. Notably, it also uncovered 19 previously unknown bugs, which the projects’ developers confirmed and fixed.
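To make that workflow concrete, here is a minimal Java sketch of how such a generate-and-filter loop could be structured: candidate tests proposed by an LLM are kept only if they build, pass, and add branch coverage, and survivors are queued for human review. The interface names (`LlmClient`, `BuildRunner`) and the acceptance criteria are illustrative assumptions, not details taken from the paper.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of an LLM-driven test-augmentation loop in the spirit of the
 * approach described above. All types and method names here are
 * hypothetical stand-ins, not the paper's actual implementation.
 */
public class TestImprover {

    /** Hypothetical wrapper around an LLM endpoint. */
    interface LlmClient {
        /** Returns candidate test methods for the given class and its existing tests. */
        List<String> proposeTests(String classUnderTest, String existingTestSuite);
    }

    /** Hypothetical wrapper around the project's build and coverage tooling. */
    interface BuildRunner {
        boolean compiles(String candidateTest);
        boolean passes(String candidateTest);
        double baselineBranchCoverage();
        double branchCoverageWith(String candidateTest);
    }

    /**
     * Keeps only candidates that compile, pass, and strictly improve branch
     * coverage; everything else is discarded before human review.
     */
    static List<String> improve(LlmClient llm, BuildRunner build,
                                String classUnderTest, String existingTests) {
        List<String> accepted = new ArrayList<>();
        double baseline = build.baselineBranchCoverage();
        for (String candidate : llm.proposeTests(classUnderTest, existingTests)) {
            if (!build.compiles(candidate)) continue;   // reject tests that do not build
            if (!build.passes(candidate)) continue;     // reject failing or flaky tests
            if (build.branchCoverageWith(candidate) <= baseline) continue; // must add coverage
            accepted.add(candidate);                    // queued for human review
        }
        return accepted;
    }
}
```

The filtering order matters in a pipeline like this: cheap checks (compilation) come before expensive ones (test execution and coverage measurement), so most low-quality candidates are rejected early.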
“Unit testing is a critical but time-consuming part of software development. Our work shows that large language models can be a powerful tool to automate and improve this process,” said lead author Mark Harman, a research scientist at Meta. “AUTI is able to understand the semantics of code and tests at a deep level to generate valuable new test cases.”
The researchers believe AUTI and similar LLM-based techniques could significantly boost developer productivity and improve software quality. “Every developer and software team could potentially benefit from an AI assistant that helps them write more comprehensive tests and catch bugs early,” said co-author Kai Tzu-iunn Ong.
However, the authors caution that more work is needed to further validate the approach on larger codebases and investigate potential corner cases. They also highlight the importance of a human-in-the-loop process to review and approve AI-generated tests.
The AUTI paper adds to a growing body of research on applying large language models and AI to software engineering tasks. Other recent work has explored using LLMs for bug fixing, test case generation, and code completion.
As AI coding assistants become more sophisticated and integrated into developer workflows, they could reshape how software is built and tested. But striking the right balance between AI automation and human oversight will be key to realizing their full potential.