
Transfer Reinforcement Learning for Autonomous Driving: From WiseMove to WiseSim

Published: 18 July 2021

Abstract

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle.
We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, when even naively modelled in WiseMove, yield an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy does, having learned that its measurement is unreliable.
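To make the two ingredients of this approach concrete, the sketch below shows one plausible way to model perception errors naively in a simple training environment and to domain-randomize them per episode, using a Gym-style observation wrapper. This is a minimal hypothetical illustration, not the WiseMove implementation: the observation layout (a flat vector with a velocity reading at VEL_IDX), the Gaussian-plus-dropout noise model, and all parameter ranges are assumptions.

```python
import numpy as np
import gym

class RandomizedPerceptionWrapper(gym.ObservationWrapper):
    """Hypothetical sketch: naive perception-error modelling plus
    per-episode domain randomization. The observation layout and
    parameter ranges below are assumed, not taken from WiseMove."""

    VEL_IDX = 2  # assumed index of a velocity reading in the observation

    def __init__(self, env, noise_range=(0.1, 1.0), dropout_range=(0.0, 0.1)):
        super().__init__(env)
        self.noise_range = noise_range
        self.dropout_range = dropout_range
        self._resample()

    def _resample(self):
        # Domain randomization: draw a fresh perception-error model so
        # the policy cannot overfit to any single noise level.
        self.noise_std = np.random.uniform(*self.noise_range)
        self.dropout_prob = np.random.uniform(*self.dropout_range)

    def reset(self, **kwargs):
        self._resample()  # new error model at every episode
        return super().reset(**kwargs)

    def observation(self, obs):
        obs = np.array(obs, dtype=np.float32, copy=True)
        # Naive perception error: additive Gaussian noise on the velocity.
        obs[self.VEL_IDX] += np.random.normal(0.0, self.noise_std)
        # Occasional dropout models a lost measurement entirely.
        if np.random.rand() < self.dropout_prob:
            obs[self.VEL_IDX] = 0.0
        return obs
```

Because the error model is resampled at every episode, a policy trained in such an environment cannot rely on any one velocity error distribution, which is consistent with the observation above that the learned policy depends less on the velocity measurement than the rule-based policy does.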






Published In

ACM Transactions on Modeling and Computer Simulation, Volume 31, Issue 3
Special Issue on QEST 2019
July 2021, 149 pages
ISSN: 1049-3301
EISSN: 1558-1195
DOI: 10.1145/3476822
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2021
Accepted: 01 February 2021
Revised: 01 October 2020
Received: 01 April 2020
Published in TOMACS Volume 31, Issue 3


Author Tags

  1. Transfer reinforcement learning
  2. autonomous driving
  3. deep reinforcement learning
  4. policy distillation

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant
  • Japan Science and Technology Agency (JST), Exploratory Research for Advanced Technology (ERATO)
  • Natural Sciences and Engineering Research Council of Canada (NSERC), Collaborative Research and Training Experience program (CREATE)


Cited By

  • (2024) A novel neural network architecture and cross-model transfer learning for multi-task autonomous driving. Data Technologies and Applications. DOI: 10.1108/DTA-08-2022-0307. Online publication date: 12-Apr-2024.
  • (2024) A transfer-based reinforcement learning collaborative energy management strategy for extended-range electric buses with cabin temperature comfort consideration. Energy 290, 130097. DOI: 10.1016/j.energy.2023.130097. Online publication date: Mar-2024.
  • (2024) A data-knowledge joint-driven reinforcement learning algorithm based on guided policy and state-prediction for satellite continuous-thrust tracking. Advances in Space Research 74, 8, 4089-4108. DOI: 10.1016/j.asr.2024.06.070. Online publication date: Oct-2024.
  • (2023) LK-TDDQN: A Lane Keeping Transfer Double Deep Q Network Framework for Autonomous Vehicles. In GLOBECOM 2023 - 2023 IEEE Global Communications Conference, 3518-3523. DOI: 10.1109/GLOBECOM54140.2023.10437047. Online publication date: 4-Dec-2023.
