DOI: 10.1145/3321707.3321831

Scenario co-evolution for reinforcement learning on a grid world smart factory domain

Published: 13 July 2019

Abstract

Adversarial learning has been established as a successful paradigm in reinforcement learning. We propose a hybrid adversarial learner in which a reinforcement learning agent tries to solve a problem while an evolutionary algorithm searches for problem instances that are hard to solve given the agent's current expertise, causing the agent to co-evolve with a set of test instances, or scenarios. We apply this setup, called scenario co-evolution, to a simulated smart factory problem that combines task scheduling with navigation of a grid world. We show that the agent trained this way outperforms one trained with conventional reinforcement learning. We also show that the scenarios evolved in this manner provide useful test cases for evaluating any agent, however trained.
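The loop described in the abstract can be illustrated with a minimal sketch: a tabular Q-learning agent learns to reach a goal cell on a small grid, while an evolutionary algorithm maintains a population of scenarios (start/goal pairs) whose fitness is how badly the agent currently performs on them. All names, the scenario encoding, and the hyperparameters below are illustrative assumptions, not the paper's actual implementation.

```python
import random

SIZE = 5  # side length of the toy grid world (assumption)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(pos, action):
    """Move within grid bounds (movement is clipped at the edges)."""
    x = min(max(pos[0] + action[0], 0), SIZE - 1)
    y = min(max(pos[1] + action[1], 0), SIZE - 1)
    return (x, y)

def run_episode(q, scenario, epsilon=0.1, alpha=0.5, gamma=0.9, max_steps=50):
    """One epsilon-greedy Q-learning episode; returns total reward."""
    start, goal = scenario
    pos, total = start, 0.0
    for _ in range(max_steps):
        state = (pos, goal)
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        nxt = step(pos, ACTIONS[a])
        reward = 1.0 if nxt == goal else -0.01
        best_next = max(q.get(((nxt, goal), i), 0.0) for i in range(len(ACTIONS)))
        # Standard Q-learning update on a dict-backed tabular Q-function.
        q[(state, a)] = q.get((state, a), 0.0) + alpha * (
            reward + gamma * best_next - q.get((state, a), 0.0))
        total += reward
        pos = nxt
        if pos == goal:
            break
    return total

def random_scenario():
    cells = [(x, y) for x in range(SIZE) for y in range(SIZE)]
    start, goal = random.sample(cells, 2)
    return (start, goal)

def mutate(scenario):
    """Mutate one endpoint of a scenario to a neighbouring cell."""
    start, goal = scenario
    if random.random() < 0.5:
        start = step(start, random.choice(ACTIONS))
    else:
        goal = step(goal, random.choice(ACTIONS))
    return (start, goal) if start != goal else random_scenario()

def co_evolve(generations=20, pop_size=8, seed=0):
    """Co-evolve agent (Q-table) and scenario population."""
    random.seed(seed)
    q = {}
    population = [random_scenario() for _ in range(pop_size)]
    for _ in range(generations):
        # Adversarial fitness: scenarios the agent solves worst score highest.
        scored = [(-run_episode(q, s), s) for s in population]
        scored.sort(key=lambda t: t[0], reverse=True)
        hard = [s for _, s in scored[: pop_size // 2]]
        population = hard + [mutate(random.choice(hard))
                             for _ in range(pop_size - len(hard))]
    return q, population
```

The surviving hard scenarios double as a regression test suite: any agent, however trained, can be evaluated by running `run_episode` (without exploration) over the evolved population.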


Cited By

  • (2024) Confidence-Based Curriculum Learning for Multi-Agent Path Finding. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 1558-1566. DOI: 10.5555/3635637.3663016. Online publication date: 6 May 2024.
  • (2021) Co-Evolution of Predator-Prey Ecosystems by Reinforcement Learning Agents. Entropy 23, 4 (461). DOI: 10.3390/e23040461. Online publication date: 13 April 2021.
  • (2020) Learning and Testing Resilience in Cooperative Multi-Agent Systems. Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, 1055-1063. DOI: 10.5555/3398761.3398884. Online publication date: 5 May 2020.
  • (2020) The scenario coevolution paradigm: adaptive quality assurance for adaptive systems. International Journal on Software Tools for Technology Transfer. DOI: 10.1007/s10009-020-00560-5. Online publication date: 6 March 2020.

Published In

GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
July 2019
1545 pages
ISBN:9781450361118
DOI:10.1145/3321707

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adversarial learning
  2. automatic test generation
  3. coevolution
  4. evolutionary algorithms
  5. reinforcement learning

Qualifiers

  • Research-article

Conference

GECCO '19: Genetic and Evolutionary Computation Conference
July 13-17, 2019
Prague, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
