Less is More: Summary of Long Instructions is Better for Program Synthesis

Kuznia, Kirby; Mishra, Swaroop; Parmar, Mihir; Baral, Chitta

arXiv:2203.08597 (cs)

[Submitted on 16 Mar 2022 (v1), last revised 22 Oct 2022 (this version, v2)]

Abstract: Despite the success of large pre-trained language models (LMs) such as Codex, they show below-par performance on larger, more complicated programming-related questions. We show that LMs benefit from summarized versions of complicated questions. Our findings show that superfluous information often present in problem descriptions, such as human characters, background stories, and names (which are included to help humans understand a task), does not help models understand the task. To this end, we create a meta-dataset from the frequently used APPS dataset and the newly created CodeContests dataset for the program synthesis task. Our meta-dataset consists of human and synthesized summaries of long and complicated programming questions. Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% on the APPS dataset and 11.88% on the CodeContests dataset on average in terms of strict accuracy. Our analysis shows that summaries significantly improve performance for introductory (9.86%) and interview (11.48%) programming questions. However, they yield only a small improvement (~2%) for competitive programming questions, implying scope for future research in this direction.
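The results above are reported in terms of strict accuracy. As a minimal sketch, assuming the usual APPS-style definition (a problem counts as solved only if the generated program passes every one of its test cases), the metric could be computed as follows. The function name `strict_accuracy` and the sample outcomes are hypothetical illustrations, not code or data from the paper.

```python
from typing import Sequence


def strict_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Return the percentage of problems whose program passes all test cases.

    `results` holds one sequence of pass/fail flags per problem.
    Assumption: a problem with no test cases is counted as unsolved.
    """
    if not results:
        raise ValueError("need at least one problem")
    solved = sum(1 for per_test in results if per_test and all(per_test))
    return 100.0 * solved / len(results)


# Hypothetical per-test outcomes for four problems (not data from the paper).
example = [
    [True, True, True],   # solved: every test passes
    [True, False, True],  # not solved: one failing test is enough
    [True],               # solved
    [False, False],       # not solved
]
print(strict_accuracy(example))  # 50.0
```

Under this definition partial credit is not given, so a single failing test case makes the whole problem count as unsolved.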

Comments:	EMNLP 2022
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2203.08597 [cs.CL]
	(or arXiv:2203.08597v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2203.08597

Submission history

From: Swaroop Mishra
[v1] Wed, 16 Mar 2022 13:04:12 UTC (1,531 KB)
[v2] Sat, 22 Oct 2022 07:51:53 UTC (11,697 KB)
