DOI: 10.5555/3635637.3663209 · AAMAS Conference Proceedings · Extended Abstract

Unifying Regret and State-Action Space Coverage for Effective Unsupervised Environment Design

Published: 06 May 2024

Abstract

Unsupervised Environment Design (UED) employs interactive training between a teacher agent and a student agent to produce generally capable student agents. Existing UED methods rely primarily on regret to progressively increase curriculum complexity for the student, but often overlook environment novelty, a critical element for enhancing an agent's exploration and generalization capabilities. The effects of environment novelty in UED remain largely uninvestigated. This paper addresses that gap by introducing the GMM-based Evaluation of Novelty In Environments (GENIE) framework, which quantifies environment novelty within the UED paradigm using Gaussian Mixture Models. To assess GENIE's effectiveness in quantifying novelty and driving exploration, we integrate it with ACCEL, the state-of-the-art UED algorithm. Empirical results demonstrate the superior zero-shot performance of this extended approach over existing UED algorithms, including its predecessor. By providing a means to quantify environment novelty, GENIE lays the groundwork for future UED algorithms that unify novelty-driven exploration and regret-driven exploitation in curriculum generation.
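The abstract does not specify how GENIE's GMM-based novelty score is computed. As a rough illustration of the general idea only, one might fit a Gaussian mixture to scalar features of previously seen environments and score a candidate by its negative log-likelihood under that model, so that unlikely environments count as novel. Everything below (the 1-D feature, `fit_gmm`, `novelty`) is a hypothetical sketch, not the paper's method:

```python
import math
import random

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm(data, k=2, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture (weights, means, variances) with plain EM."""
    rng = random.Random(seed)
    mus = rng.sample(data, k)          # initialize means at random data points
    vars_ = [1.0] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(ws, mus, vars_)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, variances from responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vars_[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,
            )
            ws[j] = nj / len(data)
    return ws, mus, vars_

def novelty(x, gmm):
    """Negative log-likelihood under the mixture: higher means more novel."""
    ws, mus, vars_ = gmm
    like = sum(w * gauss_pdf(x, m, v) for w, m, v in zip(ws, mus, vars_))
    return -math.log(like + 1e-300)

# Toy 'seen environments', each summarized by one scalar feature,
# clustered around 0 and around 5.
r = random.Random(1)
seen = [r.gauss(0, 0.5) for _ in range(100)] + [r.gauss(5, 0.5) for _ in range(100)]
gmm = fit_gmm(seen, k=2)

# A candidate far from all seen environments scores as more novel
# than one sitting inside a seen cluster.
assert novelty(10.0, gmm) > novelty(0.0, gmm)
```

In a real UED loop the scalar feature would be replaced by a learned environment embedding, and the novelty score could be combined with regret when prioritizing which environments to replay or mutate.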


    Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems. May 2024, 2898 pages. ISBN 9798400704864.

Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC.

    Author Tags

    1. gaussian mixture model
    2. novelty quantification
    3. unsupervised environment design

Conference

AAMAS '24
Overall Acceptance Rate: 1,155 of 5,036 submissions (23%)
