A Deep Reinforcement Learning Strategy for UAV Autonomous Landing on a Moving Platform

Authors:

Alejandro Rodriguez-Ramos,

Carlos Sampedro,

Paloma De La Puente,

Pascual Campoy

Journal of Intelligent and Robotic Systems, Volume 93, Issue 1-2

Pages 351 - 366

https://doi.org/10.1007/s10846-018-0891-8

Published: 01 February 2019

Abstract

The use of multi-rotor UAVs in industrial and civil applications has been extensively encouraged by rapid innovation in all the technologies involved. In particular, deep learning techniques for motion control have recently taken a major qualitative step, since the successful application of Deep Q-Learning to the continuous action domain in Atari-like games. Based on these ideas, the Deep Deterministic Policy Gradients (DDPG) algorithm was able to provide outstanding results with continuous state and action domains, which are a requirement in most robotics-related tasks. In this context, the research community still lacks an integration of realistic simulation systems with the reinforcement learning paradigm, one that would enable the application of deep reinforcement learning algorithms to the robotics field. In this paper, a versatile Gazebo-based reinforcement learning framework has been designed and validated with a continuous UAV landing task. The UAV landing maneuver on a moving platform has been solved by means of the novel DDPG algorithm, which has been integrated into our reinforcement learning framework. Several experiments have been performed under a wide variety of conditions in both simulated and real flights, demonstrating the generality of the approach. As an indirect result, a powerful workflow for robotics has been validated, in which robots can learn in simulation and perform properly in real operating environments. To the best of the authors' knowledge, this is the first work that addresses the continuous UAV landing maneuver on a moving platform by means of a state-of-the-art deep reinforcement learning algorithm, trained in simulation and tested in real flights.
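The abstract names DDPG as the learning algorithm without detailing it. As a rough illustration only, the sketch below shows two ingredients that DDPG implementations commonly use: a soft (Polyak) update of the target-network weights and Ornstein-Uhlenbeck exploration noise added to a continuous action. This is not the authors' implementation; the function and class names, the hyperparameter defaults (`tau`, `theta`, `sigma`), and the action bounds are all assumptions chosen for illustration.

```python
import numpy as np


def soft_update(target_params, online_params, tau=0.001):
    """Blend online weights into target weights: (1 - tau) * target + tau * online.

    Both arguments are lists of NumPy arrays with matching shapes.
    tau is an illustrative default, not a value taken from the paper.
    """
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


class OUNoise:
    """Ornstein-Uhlenbeck process producing temporally correlated noise."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        # All defaults here are assumptions for the sketch.
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.state = self.mu.copy()

    def sample(self):
        # Mean-reverting drift plus a Gaussian diffusion term.
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape)
        self.state = self.state + drift + diffusion
        return self.state


def explore(action, noise, low=-1.0, high=1.0):
    """Add exploration noise to a deterministic action and clip to the action bounds."""
    return np.clip(action + noise.sample(), low, high)


if __name__ == "__main__":
    noise = OUNoise(size=2)
    # A hypothetical 2-D velocity command produced by an actor network.
    print(explore(np.array([0.3, -0.1]), noise))
```

In a full agent, `soft_update` would be applied to the actor and critic target networks after each gradient step, and `explore` would only be used during training, with the noise switched off at test time.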

Published In

Journal of Intelligent and Robotic Systems Volume 93, Issue 1-2

February 2019

401 pages

ISSN:0921-0296

Copyright © 2019 Springer Nature B.V.

Publisher

Kluwer Academic Publishers

United States
