DOI: 10.5555/3635637.3662910

Potential-Based Reward Shaping for Intrinsic Motivation

Published: 06 May 2024

Abstract

Recently, there has been a proliferation of intrinsic motivation (IM) reward-shaping methods for learning in complex, sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves and therefore depend on a wider set of variables than the traditional reward functions for which PBRS was developed. We present an extension to PBRS that we prove preserves the set of optimal policies under a more general class of functions than has previously been proven. We also present Potential-Based Intrinsic Motivation (PBIM), a method for converting IM rewards into a potential-based form that is usable without altering the set of optimal policies. Testing in the MiniGrid DoorKey and Cliff Walking environments, we demonstrate that PBIM successfully prevents the agent from converging to a suboptimal policy and can speed up training.
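For context (this illustration is ours, not the paper's): in classical PBRS, due to Ng, Harada, and Russell (1999), the shaping term added to the environment reward is F(s, s') = gamma * Phi(s') - Phi(s) for a state-dependent potential Phi, and the telescoping of this term along any trajectory is what guarantees the set of optimal policies is preserved. The Python sketch below shows only that classical scheme; the toy potential and the names phi and shaped_reward are illustrative assumptions, and the paper's PBIM conversion for trainable intrinsic-reward functions goes beyond what is sketched here.

    # Minimal sketch of classical potential-based reward shaping (PBRS),
    # following Ng et al. (1999): F(s, s') = gamma * Phi(s') - Phi(s).
    # The potential below is a hand-picked toy, not the paper's PBIM method.

    GAMMA = 0.99

    def phi(state):
        """Toy state potential: negative Manhattan distance to a goal cell.

        In PBIM, the analogous quantity would be derived from the intrinsic
        reward stream rather than hand-specified like this.
        """
        goal = (7, 7)
        return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

    def shaped_reward(extrinsic_reward, state, next_state):
        """Return r + F(s, s'), where F = gamma * phi(s') - phi(s).

        Because F telescopes along any trajectory, adding it leaves the
        set of optimal policies unchanged (Ng et al., 1999).
        """
        return extrinsic_reward + GAMMA * phi(next_state) - phi(state)

    # Example: a sparse-reward transition where shaping supplies a gradient.
    r = shaped_reward(0.0, state=(2, 3), next_state=(3, 3))

The limitation that PBIM targets is visible in this sketch: the guarantee requires phi to be a fixed function of state alone, whereas IM rewards such as count-based bonuses or curiosity losses depend on histories and learned parameters, which is exactly the more general setting the paper's extension addresses.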


Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024
2898 pages
ISBN: 9798400704864

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Publication History

Published: 06 May 2024

Author Tags

  1. game-playing agents
  2. intrinsic motivation
  3. potential-based reward shaping
  4. reinforcement learning
  5. reward shaping

Qualifiers

  • Research-article

Conference

AAMAS '24

Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions (23%)
