Automated Program Repair Using LLMs
Publication Date
Spring 4-22-2026
College
College of Sciences & Mathematics
Department
Math and Computer Science, Department of
Faculty Mentor
Esteban Rodriguez
Presentation Type
Talk/Oral
Summary
Large Language Models (LLMs) have recently gained popularity as tools for automated program repair (APR), offering a promising approach to fixing software defects with minimal human intervention. However, while LLMs can generate patches for defects, their ability to reliably evaluate the correctness and quality of those patches remains an open challenge. This study investigates the effectiveness of LLMs in both generating and evaluating software patches. We replicate and extend the RepairAgent framework, using ChatGPT and other LLMs to produce patches for real-world bugs in various open-source systems. Previous work shows that LLMs are promising judges of source code and code summaries, but not yet reliable enough to replace humans in software engineering tasks. We therefore constructed an evaluation layer that judges the quality of patches produced by APR systems: from a dataset of generated patches, several LLMs (e.g., DeepSeek, GPT-4o mini) assess each patch, using ground-truth tests to determine its reliability.
Recommended Citation
Shahzad, Duaa and Sinde, Rawshan, "Automated Program Repair Using LLMs" (2026). SPARK Symposium Presentations. 929.
https://repository.belmont.edu/spark_presentations/929
