DOI: 10.1145/3321707.3321831

Scenario co-evolution for reinforcement learning on a grid world smart factory domain

Published: 13 July 2019

Abstract

Adversarial learning has been established as a successful paradigm in reinforcement learning. We propose a hybrid adversarial learner in which a reinforcement learning agent tries to solve a problem while an evolutionary algorithm searches for problem instances that are hard to solve given the agent's current expertise, causing the agent to co-evolve with a set of test instances, or scenarios. We apply this setup, called scenario co-evolution, to a simulated smart factory problem that combines task scheduling with navigation of a grid world. We show that the agent trained this way outperforms one trained with conventional reinforcement learning. We also show that the scenarios evolved in this manner provide useful test cases for evaluating any agent, however trained.
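The loop described in the abstract can be illustrated with a minimal sketch: a tabular Q-learning agent learns to reach a goal cell on a small grid, while an evolutionary algorithm maintains a population of scenarios (start/goal pairs) whose fitness is how badly the agent currently performs on them. All names, the scenario encoding, and the hyperparameters below are illustrative assumptions, not the paper's actual implementation.

```python
import random

SIZE = 5  # side length of the toy grid world (assumption)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(pos, action):
    """Move within grid bounds (movement is clipped at the edges)."""
    x = min(max(pos[0] + action[0], 0), SIZE - 1)
    y = min(max(pos[1] + action[1], 0), SIZE - 1)
    return (x, y)

def run_episode(q, scenario, epsilon=0.1, alpha=0.5, gamma=0.9, max_steps=50):
    """One epsilon-greedy Q-learning episode; returns total reward."""
    start, goal = scenario
    pos, total = start, 0.0
    for _ in range(max_steps):
        state = (pos, goal)
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        nxt = step(pos, ACTIONS[a])
        reward = 1.0 if nxt == goal else -0.01
        best_next = max(q.get(((nxt, goal), i), 0.0) for i in range(len(ACTIONS)))
        # Standard Q-learning update on a dict-backed tabular Q-function.
        q[(state, a)] = q.get((state, a), 0.0) + alpha * (
            reward + gamma * best_next - q.get((state, a), 0.0))
        total += reward
        pos = nxt
        if pos == goal:
            break
    return total

def random_scenario():
    cells = [(x, y) for x in range(SIZE) for y in range(SIZE)]
    start, goal = random.sample(cells, 2)
    return (start, goal)

def mutate(scenario):
    """Mutate one endpoint of a scenario to a neighbouring cell."""
    start, goal = scenario
    if random.random() < 0.5:
        start = step(start, random.choice(ACTIONS))
    else:
        goal = step(goal, random.choice(ACTIONS))
    return (start, goal) if start != goal else random_scenario()

def co_evolve(generations=20, pop_size=8, seed=0):
    """Co-evolve agent (Q-table) and scenario population."""
    random.seed(seed)
    q = {}
    population = [random_scenario() for _ in range(pop_size)]
    for _ in range(generations):
        # Adversarial fitness: scenarios the agent solves worst score highest.
        scored = [(-run_episode(q, s), s) for s in population]
        scored.sort(key=lambda t: t[0], reverse=True)
        hard = [s for _, s in scored[: pop_size // 2]]
        population = hard + [mutate(random.choice(hard))
                             for _ in range(pop_size - len(hard))]
    return q, population
```

The surviving hard scenarios double as a regression test suite: any agent, however trained, can be evaluated by running `run_episode` (without exploration) over the evolved population.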


Cited By

  • (2024) Confidence-Based Curriculum Learning for Multi-Agent Path Finding. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 1558-1566. DOI: 10.5555/3635637.3663016. Online publication date: 6 May 2024.
  • (2021) Co-Evolution of Predator-Prey Ecosystems by Reinforcement Learning Agents. Entropy 23, 4 (461). DOI: 10.3390/e23040461. Online publication date: 13 April 2021.
  • (2020) Learning and Testing Resilience in Cooperative Multi-Agent Systems. Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, 1055-1063. DOI: 10.5555/3398761.3398884. Online publication date: 5 May 2020.
  • (2020) The scenario coevolution paradigm: adaptive quality assurance for adaptive systems. International Journal on Software Tools for Technology Transfer. DOI: 10.1007/s10009-020-00560-5. Online publication date: 6 March 2020.

Published In

GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
July 2019
1545 pages
ISBN:9781450361118
DOI:10.1145/3321707

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adversarial learning
  2. automatic test generation
  3. coevolution
  4. evolutionary algorithms
  5. reinforcement learning

Qualifiers

  • Research-article

Conference

GECCO '19: Genetic and Evolutionary Computation Conference
July 13-17, 2019
Prague, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
