Facebook researchers have achieved a major milestone in automated program repair with the deployment of SapFix, the first end-to-end automated repair system used in continuous integration on large-scale industrial software.
SapFix was deployed on six Android apps in the Facebook family – Facebook, Messenger, Instagram, FBLite, Workplace and Workchat. Together, these apps have tens of millions of lines of code and are used daily by hundreds of millions of users worldwide.
The system combines search-based software testing, fault localization, fix templates derived from common patterns in human-written fixes, and mutation-based repair. Its initial target is null pointer exceptions (NPEs), which account for over 50% of app crashes.
Sapienz, Facebook's search-based testing tool, generates test cases that detect crashes and regressions in code changes submitted for review. It uses multi-objective search to explore the space of possible test inputs.
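Multi-objective search compares candidates by Pareto dominance rather than by a single combined score. The sketch below is a simplified illustration, not Sapienz's implementation: it assumes the three objectives Sapienz is commonly described as optimizing (maximize code coverage, maximize crashes found, minimize test sequence length), and the `TestSuiteScore` class and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestSuiteScore:
    """Objective values for one candidate test suite (hypothetical fields)."""
    name: str
    coverage: float  # higher is better
    crashes: int     # higher is better
    length: int      # lower is better (shorter sequences are easier to debug)


def dominates(a: TestSuiteScore, b: TestSuiteScore) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = (a.coverage >= b.coverage and a.crashes >= b.crashes
                and a.length <= b.length)
    strictly_better = (a.coverage > b.coverage or a.crashes > b.crashes
                       or a.length < b.length)
    return no_worse and strictly_better


def pareto_front(candidates: list[TestSuiteScore]) -> list[TestSuiteScore]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]


suites = [
    TestSuiteScore("A", coverage=0.62, crashes=1, length=120),
    TestSuiteScore("B", coverage=0.55, crashes=2, length=90),
    TestSuiteScore("C", coverage=0.50, crashes=1, length=150),  # dominated by A and D
    TestSuiteScore("D", coverage=0.62, crashes=1, length=80),   # dominates A
]
print([s.name for s in pareto_front(suites)])  # ['B', 'D']
```

B and D both survive: D is better on coverage and length, B finds more crashes, so the search keeps both instead of forcing an arbitrary trade-off between objectives.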
Infer, Facebook's static analysis tool, provides additional bug detection and localization. It analyzes code changes using separation logic and bi-abduction.
Together, these two tools let SapFix identify crashes and locate likely faults. SapFix then generates candidate patches, either by applying fix templates based on common human-written fixes or by simple mutations such as adding a null check.
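A null-check mutation of this kind can be pictured as wrapping the dereferencing statement in a guard. The Python sketch below rewrites Java source text purely for illustration; the `add_null_guard` helper is hypothetical, and a production repair tool would transform the syntax tree and pick the guarded variable from fault localization rather than take it as an argument.

```python
import re


def add_null_guard(java_line: str, var: str) -> str:
    """Wrap a single Java statement in an `if (var != null)` guard.

    Hypothetical, text-level simplification: it preserves the line's
    indentation and assumes the line holds exactly one statement.
    """
    indent = re.match(r"\s*", java_line).group(0)  # leading whitespace (may be empty)
    statement = java_line.strip()
    return (f"{indent}if ({var} != null) {{\n"
            f"{indent}    {statement}\n"
            f"{indent}}}")


before = "    user.getProfile().render();"
print(add_null_guard(before, "user"))
```

Such a guard removes the crash but also silently skips the statement whenever the value is null, which is one reason a human reviewer still has to judge whether the patch is semantically acceptable.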
Each candidate patch is then tested for regressions against both the Sapienz-generated tests and the existing test suites. Patches that pass are suggested to developers for review.
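The regression check described above amounts to a filter: a patch survives only if every generated and every existing test passes on the patched build. In this minimal sketch, tests are modeled as hypothetical Python callables, and the patch IDs and failures are invented for illustration; it is not SapFix's actual interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Patch:
    patch_id: str
    strategy: str  # hypothetical labels, e.g. "template" or "mutation"


# A test takes a patch and returns True if the patched build passes it.
TestFn = Callable[[Patch], bool]


def validate(patches: list[Patch], generated_tests: list[TestFn],
             existing_tests: list[TestFn]) -> list[Patch]:
    """Keep only patches that pass every generated and every existing test."""
    all_tests = generated_tests + existing_tests
    return [p for p in patches if all(test(p) for test in all_tests)]


def crash_test(p: Patch) -> bool:
    # Hypothetical generated test: p2 still reproduces the original crash.
    return p.patch_id != "p2"


def regression_test(p: Patch) -> bool:
    # Hypothetical existing test: p3 breaks previously working behavior.
    return p.patch_id != "p3"


patches = [Patch("p1", "template"), Patch("p2", "mutation"), Patch("p3", "mutation")]
candidates = validate(patches, [crash_test], [regression_test])
print([p.patch_id for p in candidates])  # ['p1']
```

Only `p1` survives; `p2` and `p3` are rejected before any developer sees them.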
Combining Sapienz's automated test design with deployment on Facebook's continuous integration infrastructure lets SapFix cover the whole loop end to end: detecting crashes, then generating and testing fixes.
While simple, this integration of existing techniques into a deployable pipeline represents an important milestone in bringing automated repair into real-world software development.
In its first three months, SapFix attempted to fix 57 crashes. It generated 165 candidate patches, roughly half from fix templates and half from mutation-based repair. Of these 165 patches, 131 passed all tests and were considered fix candidates. From those 131 candidates, SapFix reported 55 fix suggestions to developers, covering 55 of the 57 crashes.
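The funnel in this paragraph can be summarized as a few ratios. The helper below only re-derives figures already stated; `repair_funnel` and its key names are illustrative.

```python
def repair_funnel(crashes: int, generated: int, passing: int, reported: int) -> dict[str, float]:
    """Ratios derived from the counts reported for SapFix's first three months."""
    return {
        "patches_per_crash": generated / crashes,    # candidate patches per targeted crash
        "test_pass_rate": passing / generated,       # share of patches passing all tests
        "reported_per_passing": reported / passing,  # suggestions per passing candidate
        "crash_coverage": reported / crashes,        # assumes one suggestion per covered crash
    }


for name, value in repair_funnel(crashes=57, generated=165, passing=131, reported=55).items():
    print(f"{name:>20}: {value:.3f}")
```

For the reported numbers this gives about 2.9 patches per crash, a 79.4% test pass rate, roughly 0.42 reported suggestions per passing candidate, and suggestions for 96.5% of the targeted crashes.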
This shows that automated program repair can be successfully deployed on real-world systems at scale. However, human oversight remains important, as developers acted as the final gatekeeper for landing fixes. Their feedback also provided insights into improving the system.
The median time from fault detection to fix suggestion was just 69 minutes, showing that the system can generate fixes rapidly once a fault is identified. The end-to-end automation, from test generation through to fix proposal, sets an important precedent.
While null pointer exceptions were the initial target, the researchers plan to extend the system to other bug types. Work also remains on the sociological factors of human-AI interaction in repair, on automatically explaining fixes, and on addressing root causes rather than symptoms.
Nevertheless, this deployment provides an existence proof that automated program repair can work on large real-world systems, and it paves the way for further advances in this impactful field.