Perception and autonomy for UAVs in tight indoor spaces are highly challenging, especially under two conditions:
- No prior maps.
- No GPS coordinates.
The goal is to autonomously navigate the AirSim environment below, whose walls have circular holes cut into them. The agent should pass through as many holes as possible by predicting continuous actions (v) from input camera observations (I), without colliding with any wall.
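The interaction loop implied by this goal can be sketched as follows. This is a minimal illustration, not the project's code: `clip_action`, `run_episode`, `make_toy_env`, and the `V_MAX` speed limit are hypothetical names and values, and the toy environment is a stand-in for the simulator, not AirSim. It only shows the shape of the problem: an image goes in, a bounded continuous velocity comes out, the episode ends on collision, and the score is the number of holes traversed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]               # stand-in for a camera frame I
Velocity = Tuple[float, float, float]   # continuous action v = (vx, vy, vz)
StepResult = Tuple[Image, bool, bool]   # (next image, passed_hole, collided)

V_MAX = 1.0  # assumed per-axis speed limit; not taken from the project


def clip_action(v: Sequence[float], v_max: float = V_MAX) -> Velocity:
    """Clamp each velocity component to [-v_max, v_max]."""
    vx, vy, vz = (max(-v_max, min(v_max, float(c))) for c in v)
    return (vx, vy, vz)


def run_episode(
    reset: Callable[[], Image],
    step: Callable[[Velocity], StepResult],
    policy: Callable[[Image], Sequence[float]],
    max_steps: int = 100,
) -> int:
    """Roll out one episode and return the number of holes traversed."""
    image = reset()
    holes = 0
    for _ in range(max_steps):
        image, passed_hole, collided = step(clip_action(policy(image)))
        if collided:
            break  # a collision ends the episode
        holes += int(passed_hole)
    return holes


def make_toy_env(hole_radius: float = 0.5):
    """Hypothetical stand-in for the simulator (not AirSim).

    Walls sit at x = 1, 2, 3, ... and each has a hole centred on y = 0.
    """
    pos = {"x": 0.0, "y": 0.0}
    blank: Image = [[0.0]]  # the toy environment returns a dummy frame

    def reset() -> Image:
        pos["x"], pos["y"] = 0.0, 0.0
        return blank

    def step(v: Velocity) -> StepResult:
        walls_before = math.floor(pos["x"])
        pos["x"] += v[0]
        pos["y"] += v[1]
        crossed = math.floor(pos["x"]) > walls_before
        collided = crossed and abs(pos["y"]) > hole_radius
        return blank, crossed and not collided, collided

    return reset, step
```

In this toy setting, a policy that always flies straight ahead clears one wall per step, while a policy that drifts sideways eventually misses a hole and the episode stops there. A trained PPO or SAC policy would take the place of the `policy` callable.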
$ conda activate rl0
To start training the policy (PPO or SAC):
$ python
or
$ python
To monitor training logs with TensorBoard, run the command below, replacing the log directory with your own:
$ tensorboard --logdir tb_logs/ppo_run_1733388980.4439118_1
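Run directory names like the one above appear to follow an `<algo>_run_<unix timestamp>_<index>` pattern. Assuming that pattern holds (it is inferred from the single example here, and the `sac_run_` prefix is a guess), a small helper can pick the most recent run to pass to `--logdir`. The function names `parse_run_name`, `latest_run`, and `latest_run_dir` are hypothetical and not part of this repository.

```python
import re
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Assumed naming scheme, inferred from "ppo_run_1733388980.4439118_1".
RUN_NAME = re.compile(
    r"^(?P<algo>[A-Za-z0-9]+)_run_(?P<stamp>\d+(?:\.\d+)?)_(?P<index>\d+)$"
)


def parse_run_name(name: str) -> Optional[Tuple[str, float, int]]:
    """Split a run directory name into (algo, timestamp, index), or None."""
    match = RUN_NAME.match(name)
    if match is None:
        return None
    return match["algo"], float(match["stamp"]), int(match["index"])


def latest_run(names: Iterable[str], algo: Optional[str] = None) -> Optional[str]:
    """Return the name with the newest timestamp, optionally for one algorithm."""
    best_key, best_name = None, None
    for name in names:
        parsed = parse_run_name(name)
        if parsed is None or (algo is not None and parsed[0] != algo):
            continue  # skip names that do not match the assumed pattern
        key = (parsed[1], parsed[2])
        if best_key is None or key > best_key:
            best_key, best_name = key, name
    return best_name


def latest_run_dir(log_root: str = "tb_logs", algo: Optional[str] = None) -> Optional[Path]:
    """Look inside log_root and return the path of the newest run, if any."""
    root = Path(log_root)
    if not root.is_dir():
        return None
    name = latest_run((p.name for p in root.iterdir() if p.is_dir()), algo)
    return root / name if name else None
```

For example, `latest_run_dir("tb_logs", algo="ppo")` would return the newest PPO run directory under `tb_logs/`, or `None` if the folder is missing or holds no matching runs.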
$ python
"Reaching the limit in autonomous racing: Optimal control versus reinforcement learning." Song, Yunlong, Angel Romero, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Science Robotics, 2023.
"Proximal policy optimization algorithms." Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. arXiv preprint arXiv:1707.06347, 2017.
"Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor." Haarnoja, Tuomas, Aurick Zhou, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning, pp. 1861-1870, PMLR, 2018.
"Deep-Reinforcement-Learning-Based Autonomous UAV Navigation With Sparse Rewards." C. Wang, J. Wang, J. Wang, and X. Zhang. IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6180-6190, July 2020.
"Autonomous UAV navigation via deep reinforcement learning using PPO." Kabas, Bilal. 30th Signal Processing and Communications Applications Conference (SIU), IEEE, 2022.
"Autonomous UAV navigation using reinforcement learning." Pham, Huy X., et al. arXiv preprint arXiv:1801.05086, 2018.