RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Gehring, Jonas; Zheng, Kunhao; Copet, Jade; Mella, Vegard; Cohen, Taco; Synnaeve, Gabriel

arXiv:2410.02089 (cs)

[Submitted on 2 Oct 2024]

Abstract: Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum. Crucially, such LLMs need to ground their generations in any feedback obtained to reliably achieve desired outcomes. We propose an end-to-end reinforcement learning method for teaching models to leverage execution feedback in the realm of code synthesis, where state-of-the-art LLMs struggle to improve code iteratively compared to independent sampling. We benchmark on competitive programming tasks, where we achieve new state-of-the-art results with both small (8B parameters) and large (70B parameters) models while reducing the number of samples required by an order of magnitude. Our analysis of inference-time behavior demonstrates that our method produces LLMs that effectively leverage automatic feedback over multiple steps.
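The multi-step setting the abstract describes (generate code, execute it, feed the result back, try again) can be illustrated with a minimal sketch. This is not the authors' implementation and it covers only the inference-time feedback loop, not the reinforcement learning training. All names here (`run_tests`, `refine`, `toy_policy`, the `solve` entry point, the `Feedback:` message format, and the three-turn limit) are assumptions made for illustration, and `toy_policy` is a hard-coded stand-in for an LLM.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a test is an (input, expected_output) pair, and a policy
# maps the conversation so far to the source code of a candidate solution.
Test = Tuple[int, int]
Policy = Callable[[List[str]], str]


def run_tests(source: str, tests: List[Test]) -> List[str]:
    """Execute `source`, call its `solve` function on each test, return failure messages."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # NOTE: unsandboxed; acceptable only for trusted toy code
    except Exception as exc:
        return [f"failed to load: {exc!r}"]
    solve = namespace.get("solve")
    if not callable(solve):
        return ["no callable named solve was defined"]
    failures = []
    for arg, expected in tests:
        try:
            got = solve(arg)
        except Exception as exc:
            failures.append(f"solve({arg}) raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"solve({arg}) returned {got}, expected {expected}")
    return failures


def refine(policy: Policy, task: str, tests: List[Test],
           max_turns: int = 3) -> Tuple[str, bool, int]:
    """Query the policy repeatedly, appending execution feedback after each failed attempt."""
    dialog = [task]
    source = ""
    for turn in range(1, max_turns + 1):
        source = policy(dialog)
        failures = run_tests(source, tests)
        if not failures:
            return source, True, turn
        # Ground the next attempt in what actually happened at execution time.
        dialog.append(source)
        dialog.append("Feedback:\n" + "\n".join(failures))
    return source, False, max_turns


def toy_policy(dialog: List[str]) -> str:
    """Stand-in for an LLM: wrong on the first turn, corrected once feedback is present."""
    if any(message.startswith("Feedback:") for message in dialog):
        return "def solve(x):\n    return x * x\n"
    return "def solve(x):\n    return x + x\n"


final_source, solved, turns_used = refine(
    toy_policy, "Return the square of x.", [(2, 4), (3, 9)]
)
```

In this toy run the first attempt fails one test, the failure message is appended to the dialog, and the second attempt passes. A real setup would replace `toy_policy` with model sampling and run candidate code in a sandbox rather than through `exec`.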

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.02089 [cs.CL]
	(or arXiv:2410.02089v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.02089

Submission history

From: Jonas Gehring
[v1] Wed, 2 Oct 2024 23:25:17 UTC (141 KB)
