Conference paper

Heterogeneous Prompting and Execution Feedback for SWE Issue Test Generation and Selection

Abstract

A software engineering issue (SWE issue) is easier to resolve when accompanied by a reproduction test. Unfortunately, most issues do not come with functioning reproduction tests, so this paper explores how to generate them automatically. The main difficulty is that the code to be tested is either missing or wrong, as evidenced by the existence of the issue in the first place. This has held back test generation in this setting: without correct code to execute, it is difficult to leverage execution feedback to generate good tests. This paper introduces novel ideas for leveraging execution feedback despite this obstacle, implemented in a new reproduction test generator called e-Otter++. Experiments show that e-Otter++ represents a leap forward in the state of the art for this problem, generating tests with an average fail-to-pass rate of 63% on the TDD-Bench Verified benchmark.
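To make the reported metric concrete, the following minimal sketch shows how a fail-to-pass rate could be computed. It assumes the usual reading of the term, which the abstract does not spell out: a generated test counts as fail-to-pass when it fails on the original code and passes once the issue has been resolved. The names RunOutcome, is_fail_to_pass, and fail_to_pass_rate are hypothetical and are not taken from e-Otter++ or TDD-Bench Verified.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RunOutcome:
    """Hypothetical record of one generated test run on two code versions."""
    passes_before_fix: bool  # result on the original code, where the issue exists
    passes_after_fix: bool   # result on the code after the issue is resolved


def is_fail_to_pass(outcome: RunOutcome) -> bool:
    # Assumed definition: the test fails before the fix and passes after it.
    return (not outcome.passes_before_fix) and outcome.passes_after_fix


def fail_to_pass_rate(outcomes: Iterable[RunOutcome]) -> float:
    # Fraction of generated tests that are fail-to-pass; 0.0 for an empty input.
    results = list(outcomes)
    if not results:
        return 0.0
    return sum(is_fail_to_pass(o) for o in results) / len(results)
```

Under this reading, a test that passes on both versions does not reproduce the issue, and a test that fails on both versions does not validate the fix, so neither contributes to the rate.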