Chris Timperley
Advanced Techniques for Search-Based Program Repair
PhD thesis, University of York, 2017

Abstract

Debugging and repairing software defects costs the global economy hundreds of billions of dollars annually, and accounts for as much as 50% of programmers’ time. To tackle the burgeoning expense of repair, researchers have proposed the use of novel techniques to automatically localise and repair such defects. Collectively, these techniques are referred to as automated program repair.

Despite promising early results, recent studies have demonstrated that existing automated program repair techniques are considerably less effective than previously believed. Current approaches are limited either in terms of the number and kinds of bugs they can fix, the size of patches they can produce, or the programs to which they can be applied. To become economically viable, automated program repair needs to overcome all of these limitations.

Search-based repair is the only approach to program repair which may be applied to any bug or program, without assuming the existence of formal specifications. Despite its generality, current search-based techniques are restricted: they are either efficient or capable of fixing multiple-line bugs, but no existing technique is both. Furthermore, most techniques rely on the assumption that the material necessary to craft a repair already exists within the faulty program. By using existing code to craft repairs, the size of the search space is vastly reduced, compared to generating code from scratch. However, recent results, which show that almost all repairs generated by a number of search-based techniques can be explained as deletion, lead us to question whether this assumption is valid.

In this thesis, we identify the challenges facing search-based program repair, and demonstrate ways of tackling them. We explore whether and how knowledge of candidate patch evaluations can be used to locate the source of bugs. We use software repository mining techniques to discover the form of a better repair model capable of addressing a greater number of bugs. We conduct a theoretical and empirical analysis of existing search algorithms for repair, before demonstrating a more effective alternative, inspired by greedy algorithms. To ensure reproducibility, we propose and use a methodology for conducting high-quality automated program repair research. Finally, we assess our progress towards solving the challenges of search-based program repair, and reflect on the future of the field.
