DOI: 10.5555/3635637.3662914

Analysing the Sample Complexity of Opponent Shaping

Published: 06 May 2024

Abstract

Learning in general-sum games often yields collectively sub-optimal results. Addressing this, opponent shaping (OS) methods actively guide the learning processes of other agents, empirically leading to improved individual and group performance. Early OS methods use higher-order derivatives to shape the learning of co-players, which makes them unsuitable for shaping multiple learning steps. Follow-up work, Model-Free Opponent Shaping (M-FOS), addresses the shortcomings of earlier OS methods by reframing the OS problem as a meta-game. In contrast to early OS methods, however, there is little theoretical understanding of the M-FOS framework. Providing theoretical guarantees for M-FOS is hard because (A) there is little literature on theoretical sample-complexity bounds for meta-reinforcement learning, and (B) M-FOS operates in continuous state and action spaces, which makes theoretical analysis challenging. In this work, we present R-FOS, a tabular version of M-FOS that is more amenable to theoretical analysis. R-FOS discretises the continuous meta-game MDP into a tabular MDP. Within this discretised MDP, we adapt the R-max algorithm, most prominently used to derive PAC bounds for MDPs, as the meta-learner in the R-FOS algorithm. We derive a sample-complexity bound that is exponential in the cardinality of the inner state and action spaces and in the number of agents. Our bound guarantees that, with high probability, the policy learned by an R-FOS agent is close to the optimal policy, up to a constant factor. Finally, we investigate how R-FOS's sample complexity scales with the size of the state-action space. Our theoretical scaling results are supported empirically in the Matching Pennies environment.
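To make the discretisation idea concrete, the sketch below shows one way an R-max-style meta-learner can operate over a discretised meta-game: the co-player's continuous policy parameters (the meta-state) are snapped onto an ε-grid, and a tabular R-max learner plans optimistically over the resulting finite meta-MDP. This is a minimal illustrative sketch in Python, not the authors' R-FOS implementation; the grid resolution eps, the finite meta-action set, and the known_threshold parameter are assumptions made for clarity.

import numpy as np

def discretise(theta, eps=0.1):
    # Snap continuous policy parameters in [0, 1]^d onto the nearest eps-grid point,
    # turning the continuous meta-state into a hashable tabular state (assumption).
    return tuple(np.round(np.asarray(theta, dtype=float) / eps).astype(int))

class RMaxMetaLearner:
    # Tabular R-max (Brafman & Tennenholtz) used as the meta-learner over the
    # discretised meta-MDP; a sketch, not the exact R-FOS procedure.
    def __init__(self, actions, r_max, known_threshold=10, gamma=0.95):
        self.actions = list(actions)   # finite set of meta-actions (assumed given)
        self.r_max = r_max             # optimistic reward for unknown state-action pairs
        self.m = known_threshold       # visits before a pair counts as "known"
        self.gamma = gamma
        self.counts = {}               # (s, a) -> visit count
        self.reward_sum = {}           # (s, a) -> accumulated meta-reward
        self.next_counts = {}          # (s, a) -> {s': count}
        self.q = {}                    # (s, a) -> optimistic Q-value

    def q_value(self, s, a):
        # Unknown pairs keep the optimistic value R_max / (1 - gamma).
        return self.q.get((s, a), self.r_max / (1.0 - self.gamma))

    def act(self, s):
        # Act greedily with respect to the optimistic Q-values.
        return max(self.actions, key=lambda a: self.q_value(s, a))

    def observe(self, s, a, r, s_next, sweeps=50):
        # Record the transition, then re-solve the optimistic empirical model by
        # value iteration restricted to "known" state-action pairs.
        key = (s, a)
        self.counts[key] = self.counts.get(key, 0) + 1
        self.reward_sum[key] = self.reward_sum.get(key, 0.0) + r
        self.next_counts.setdefault(key, {})
        self.next_counts[key][s_next] = self.next_counts[key].get(s_next, 0) + 1
        known = [k for k, c in self.counts.items() if c >= self.m]
        for _ in range(sweeps):
            for (s_k, a_k) in known:
                n = self.counts[(s_k, a_k)]
                r_hat = self.reward_sum[(s_k, a_k)] / n
                bootstrap = sum(
                    (c / n) * max(self.q_value(s2, a2) for a2 in self.actions)
                    for s2, c in self.next_counts[(s_k, a_k)].items()
                )
                self.q[(s_k, a_k)] = r_hat + self.gamma * bootstrap

In a Matching Pennies meta-game, for example, the meta-state could be the discretised co-player policy (a single probability of playing heads) and each meta-action could select the shaping agent's inner policy for the next episode; these environment details are illustrative assumptions rather than the paper's exact setup.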

Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024
2898 pages
ISBN: 9798400704864

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Publication History

Published: 06 May 2024

Author Tags

  1. meta reinforcement learning
  2. multi-agent
  3. opponent shaping
  4. reinforcement learning
  5. sample complexity

Qualifiers

  • Research-article

Funding Sources

  • Cohere
  • D.H. Chen Foundation
  • Armasuisse

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

