
Transfer Reinforcement Learning for Autonomous Driving: From WiseMove to WiseSim

Published: 18 July 2021

Abstract

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle.
We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, when even naively modelled in WiseMove, yield an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy does, having learned that its measurement is unreliable.
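To make the two ingredients of this approach concrete, the sketch below shows one plausible way to model perception errors naively in a simple training environment and to domain-randomize them per episode, using a Gym-style observation wrapper. This is a minimal hypothetical illustration, not the WiseMove implementation: the observation layout (a flat vector with a velocity reading at VEL_IDX), the Gaussian-plus-dropout noise model, and all parameter ranges are assumptions.

```python
import numpy as np
import gym

class RandomizedPerceptionWrapper(gym.ObservationWrapper):
    """Hypothetical sketch: naive perception-error modelling plus
    per-episode domain randomization. The observation layout and
    parameter ranges below are assumed, not taken from WiseMove."""

    VEL_IDX = 2  # assumed index of a velocity reading in the observation

    def __init__(self, env, noise_range=(0.1, 1.0), dropout_range=(0.0, 0.1)):
        super().__init__(env)
        self.noise_range = noise_range
        self.dropout_range = dropout_range
        self._resample()

    def _resample(self):
        # Domain randomization: draw a fresh perception-error model so
        # the policy cannot overfit to any single noise level.
        self.noise_std = np.random.uniform(*self.noise_range)
        self.dropout_prob = np.random.uniform(*self.dropout_range)

    def reset(self, **kwargs):
        self._resample()  # new error model at every episode
        return super().reset(**kwargs)

    def observation(self, obs):
        obs = np.array(obs, dtype=np.float32, copy=True)
        # Naive perception error: additive Gaussian noise on the velocity.
        obs[self.VEL_IDX] += np.random.normal(0.0, self.noise_std)
        # Occasional dropout models a lost measurement entirely.
        if np.random.rand() < self.dropout_prob:
            obs[self.VEL_IDX] = 0.0
        return obs
```

Because the error model is resampled at every episode, a policy trained in such an environment cannot rely on any one velocity error distribution, which is consistent with the observation above that the learned policy depends less on the velocity measurement than the rule-based policy does.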






Published In

ACM Transactions on Modeling and Computer Simulation, Volume 31, Issue 3
Special Issue on QEST 2019
July 2021, 149 pages
ISSN: 1049-3301
EISSN: 1558-1195
DOI: 10.1145/3476822
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2021
Accepted: 01 February 2021
Revised: 01 October 2020
Received: 01 April 2020
Published in TOMACS Volume 31, Issue 3


Author Tags

  1. Transfer reinforcement learning
  2. autonomous driving
  3. deep reinforcement learning
  4. policy distillation

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant
  • Japan Science and Technology Agency (JST), Exploratory Research for Advanced Technology (ERATO)
  • Natural Sciences and Engineering Research Council of Canada (NSERC), Collaborative Research and Training Experience program (CREATE)


Cited By

  • (2024) A novel neural network architecture and cross-model transfer learning for multi-task autonomous driving. Data Technologies and Applications. DOI: 10.1108/DTA-08-2022-0307. Online publication date: 12-Apr-2024.
  • (2024) A transfer-based reinforcement learning collaborative energy management strategy for extended-range electric buses with cabin temperature comfort consideration. Energy 290, 130097. DOI: 10.1016/j.energy.2023.130097. Online publication date: Mar-2024.
  • (2024) A data-knowledge joint-driven reinforcement learning algorithm based on guided policy and state-prediction for satellite continuous-thrust tracking. Advances in Space Research 74, 8, 4089-4108. DOI: 10.1016/j.asr.2024.06.070. Online publication date: Oct-2024.
  • (2023) LK-TDDQN: A Lane Keeping Transfer Double Deep Q Network Framework for Autonomous Vehicles. In GLOBECOM 2023 - 2023 IEEE Global Communications Conference, 3518-3523. DOI: 10.1109/GLOBECOM54140.2023.10437047. Online publication date: 4-Dec-2023.
