DOI: 10.5555/3635637.3663209 · AAMAS Conference Proceedings · Extended Abstract

Unifying Regret and State-Action Space Coverage for Effective Unsupervised Environment Design

Published: 06 May 2024

Abstract

Unsupervised Environment Design (UED) employs interactive training between a teacher agent and a student agent to produce generally capable student agents. Existing UED methods rely primarily on regret to progressively increase curriculum complexity for the student, but often overlook environment novelty, a critical element for enhancing an agent's exploration and generalization capabilities. The effects of environment novelty in UED remain largely uninvestigated. This paper addresses that gap by introducing the GMM-based Evaluation of Novelty In Environments (GENIE) framework, which quantifies environment novelty within the UED paradigm using Gaussian Mixture Models. To assess GENIE's effectiveness in quantifying novelty and driving exploration, we integrate it with ACCEL, the state-of-the-art UED algorithm. Empirical results demonstrate the superior zero-shot performance of this extended approach over existing UED algorithms, including its predecessor. By providing a means to quantify environment novelty, GENIE lays the groundwork for future UED algorithms that unify novelty-driven exploration and regret-driven exploitation in curriculum generation.
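The abstract does not specify how GENIE's GMM-based novelty score is computed. As a rough illustration of the general idea only, one might fit a Gaussian mixture to scalar features of previously seen environments and score a candidate by its negative log-likelihood under that model, so that unlikely environments count as novel. Everything below (the 1-D feature, `fit_gmm`, `novelty`) is a hypothetical sketch, not the paper's method:

```python
import math
import random

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm(data, k=2, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture (weights, means, variances) with plain EM."""
    rng = random.Random(seed)
    mus = rng.sample(data, k)          # initialize means at random data points
    vars_ = [1.0] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(ws, mus, vars_)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, variances from responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vars_[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,
            )
            ws[j] = nj / len(data)
    return ws, mus, vars_

def novelty(x, gmm):
    """Negative log-likelihood under the mixture: higher means more novel."""
    ws, mus, vars_ = gmm
    like = sum(w * gauss_pdf(x, m, v) for w, m, v in zip(ws, mus, vars_))
    return -math.log(like + 1e-300)

# Toy 'seen environments', each summarized by one scalar feature,
# clustered around 0 and around 5.
r = random.Random(1)
seen = [r.gauss(0, 0.5) for _ in range(100)] + [r.gauss(5, 0.5) for _ in range(100)]
gmm = fit_gmm(seen, k=2)

# A candidate far from all seen environments scores as more novel
# than one sitting inside a seen cluster.
assert novelty(10.0, gmm) > novelty(0.0, gmm)
```

In a real UED loop the scalar feature would be replaced by a learned environment embedding, and the novelty score could be combined with regret when prioritizing which environments to replay or mutate.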


    Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems. May 2024, 2898 pages. ISBN 9798400704864.

Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC.

    Author Tags

    1. gaussian mixture model
    2. novelty quantification
    3. unsupervised environment design

Conference

AAMAS '24
Overall Acceptance Rate: 1,155 of 5,036 submissions (23%)
