Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Bhupatiraju, Surya; Agrawal, Kumar Krishna; Singh, Rishabh

arXiv:1807.00403 (cs)

[Submitted on 1 Jul 2018 (v1), last revised 3 Jul 2018 (this version, v2)]

Abstract: Deep reinforcement learning has led to several recent breakthroughs, though the learned policies are often based on black-box neural networks. This makes the policies difficult to interpret and makes it hard to impose desired specification constraints during learning. We present an iterative framework, MORL, for improving the learned policies using program synthesis. Concretely, we propose to use synthesis techniques to obtain a symbolic representation of the learned policy, which can then be debugged manually or automatically using program repair. After the repair step, we use behavior cloning to obtain the policy corresponding to the repaired program, which is then further improved using gradient descent. This process continues until the learned policy satisfies the desired constraints. We instantiate MORL for the simple CartPole problem and show that the programmatic representation allows for high-level modifications that in turn lead to improved learning of the policies.
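
The loop described in the abstract (synthesize, repair, clone, improve, repeat until the constraint holds) can be sketched roughly as follows. This is only an illustration under strong assumptions, not the authors' implementation: the policy is a linear threshold rule rather than a neural network, "synthesis" is reduced to rounding the weights into a readable rule, "repair" enforces one hypothetical constraint (a positive coefficient on the pole angle, assumed to sit at index 2 of the CartPole state), behavior cloning uses perceptron updates, and the gradient-based improvement step is a caller-supplied hook that defaults to a no-op. All function names are made up for this sketch.

```python
def act(weights, state):
    """Linear threshold policy: 1 (push right) if w.s > 0, else 0 (push left)."""
    return 1 if sum(w * x for w, x in zip(weights, state)) > 0 else 0


def synthesize(weights, digits=1):
    """Stand-in for program synthesis: round weights into a readable linear rule."""
    return [round(w, digits) for w in weights]


def satisfies(program, angle_idx=2):
    """Hypothetical constraint: the pole-angle coefficient must be positive."""
    return program[angle_idx] > 0


def repair(program, angle_idx=2):
    """Stand-in for program repair: force the constrained coefficient positive."""
    fixed = list(program)
    if fixed[angle_idx] <= 0:
        # Flip the sign; fall back to 1.0 when the coefficient is exactly zero.
        fixed[angle_idx] = abs(fixed[angle_idx]) or 1.0
    return fixed


def behavior_clone(program, states, epochs=50, lr=1.0):
    """Fit fresh policy weights that imitate the program (perceptron updates)."""
    weights = [0.0] * len(program)
    for _ in range(epochs):
        for s in states:
            err = act(program, s) - act(weights, s)  # -1, 0, or +1
            weights = [w + lr * err * x for w, x in zip(weights, s)]
    return weights


def morl_loop(weights, states, improve=lambda w: w, max_iters=10):
    """Iterate synthesize -> repair -> clone -> improve until the constraint holds.

    `improve` is a placeholder for the gradient-descent step (assumption:
    any function mapping weights to weights); it defaults to the identity.
    """
    for _ in range(max_iters):
        program = synthesize(weights)
        if satisfies(program):
            return weights, program
        program = repair(program)
        weights = behavior_clone(program, states)
        weights = improve(weights)
    # The constraint may still be violated if max_iters was exhausted.
    return weights, synthesize(weights)


if __name__ == "__main__":
    # Hand-picked states: (cart position, cart velocity, pole angle, angular velocity).
    demo_states = [
        [0.0, 0.0, 0.5, 0.0],
        [0.0, 0.0, -0.5, 0.0],
        [0.1, 0.0, 0.3, 0.1],
        [-0.1, 0.0, -0.3, -0.1],
    ]
    # Start from a policy that pushes away from the falling pole.
    final_weights, final_program = morl_loop([0.0, 0.0, -1.0, 0.0], demo_states)
    print(final_program, satisfies(final_program))
```

Because the program stays a short list of coefficients, the repair step is a one-line, human-readable change, which is the kind of high-level modification the abstract refers to. In a realistic setting the cloned policy is not guaranteed to satisfy the constraint after further training, which is why the loop re-checks it on every iteration.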

Comments:	Updated publication details, format. Accepted at NAMPI workshop, ICML '18
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1807.00403 [cs.LG]
	(or arXiv:1807.00403v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1807.00403

Submission history

From: Kumar Krishna Agrawal
[v1] Sun, 1 Jul 2018 21:52:07 UTC (424 KB)
[v2] Tue, 3 Jul 2018 22:08:06 UTC (424 KB)
