Automated Program Repair Using LLMs

Publication Date

Spring 4-22-2026

College

College of Sciences & Mathematics

Department

Math and Computer Science, Department of

Faculty Mentor

Esteban Rodriguez

Presentation Type

Talk/Oral

Summary

Large Language Models (LLMs) have recently gained popularity as tools for automated program repair (APR), offering a promising approach to fixing software defects with minimal human intervention. However, while LLMs can generate patches that fix defects, their ability to reliably evaluate the correctness and quality of those patches remains an open challenge. This study investigates the effectiveness of LLMs in both generating and evaluating software patches. We replicate and extend the RepairAgent framework, using ChatGPT and other LLMs to produce patches for real-world bugs in various open-source systems. Prior work shows that LLMs are promising judges of source code and code summaries, but not yet reliable enough to replace humans in software engineering tasks. We construct an evaluation layer that judges the quality of patches produced by APR systems, applying various LLMs (e.g., DeepSeek, GPT-4o mini) to a dataset of generated patches. The judging LLM uses ground-truth tests to determine the reliability of each patch.
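
As a rough illustration of what such an LLM-as-judge evaluation layer can look like, the minimal sketch below asks a model to accept or reject a candidate patch given the buggy code and its ground-truth test. It assumes the OpenAI Python client (openai>=1.0); the prompt wording, the helper name judge_patch, and the model choice are illustrative assumptions, not the study's actual implementation.

# Minimal illustrative sketch of an LLM-as-judge evaluation layer.
# Assumes the OpenAI Python client (openai>=1.0); prompt wording,
# helper names, and model choice are hypothetical, not the study's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are reviewing an automated program repair patch.
Buggy code:
{buggy}

Candidate patch:
{patch}

Ground-truth test the fix must satisfy:
{test}

Answer with exactly one word, CORRECT or INCORRECT, judging whether
the patch fixes the defect without breaking the tested behavior."""

def judge_patch(buggy: str, patch: str, test: str,
                model: str = "gpt-4o-mini") -> bool:
    """Ask an LLM judge whether a candidate patch is correct."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce variance in the judge's verdicts
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(buggy=buggy,
                                                  patch=patch,
                                                  test=test)}],
    )
    verdict = (response.choices[0].message.content or "").strip().upper()
    return verdict.startswith("CORRECT")

In practice, a judge like this would be run over every patch in the dataset and its verdicts compared against the ground-truth test outcomes to measure how reliably each LLM evaluates patch quality.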
