Keras Implementation of DDPG (Deep Deterministic Policy Gradient) with a PER (Prioritized Experience Replay) option on the OpenAI Gym framework
Status: IMPLEMENTING
Extended Work: gym-td3-keras (TD3)
From the DDPG paper [1]:
"We used Adam (Kingma & Ba, 2014) for learning the neural network parameters with a learning rate of 10−4 and 10−3 for the actor and critic respectively. For Q we included L2 weight decay of 10−2 and used a discount factor of γ = 0.99. For the soft target updates we used τ = 0.001. The neural networks used the rectified non-linearity (Glorot et al., 2011) for all hidden layers. The final output layer of the actor was a tanh layer, to bound the actions. The low-dimensional networks had 2 hidden layers with 400 and 300 units respectively (≈ 130,000 parameters). Actions were not included until the 2nd hidden layer of Q."
- optimizer: Adam
- learning rate: 1e-4 (actor), 1e-3 (critic)
- L2 weight decay: 1e-2 (critic Q-network regularization)
- discount factor (γ): 0.99
- tau: 0.001 (soft target update rate)
- activation: ReLU (hidden layers), tanh (actor output layer)
- layers: 2 hidden layers with 400 and 300 units
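To make these settings concrete, here is a minimal Keras sketch of actor and critic networks built with the hyperparameters above. It is an illustration, not the repository's exact code; `state_dim`, `action_dim`, and `action_bound` are assumed placeholder arguments.

```python
# Minimal sketch (assumed names, not this repo's exact code) of DDPG
# actor/critic networks using the hyperparameters listed above.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_actor(state_dim, action_dim, action_bound):
    # 2 hidden layers (400, 300) with ReLU; tanh output scaled to the action range
    inputs = layers.Input(shape=(state_dim,))
    x = layers.Dense(400, activation="relu")(inputs)
    x = layers.Dense(300, activation="relu")(x)
    out = layers.Dense(action_dim, activation="tanh")(x)
    out = layers.Lambda(lambda a: a * action_bound)(out)
    return tf.keras.Model(inputs, out)

def build_critic(state_dim, action_dim):
    # Actions enter at the 2nd hidden layer, as in the paper;
    # L2 weight decay of 1e-2 regularizes the Q-network.
    state_in = layers.Input(shape=(state_dim,))
    action_in = layers.Input(shape=(action_dim,))
    x = layers.Dense(400, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-2))(state_in)
    x = layers.Concatenate()([x, action_in])
    x = layers.Dense(300, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-2))(x)
    q = layers.Dense(1, kernel_regularizer=regularizers.l2(1e-2))(x)
    return tf.keras.Model([state_in, action_in], q)

# Optimizers with the learning rates listed above
actor_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
critic_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
```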
- Make an independent environment using virtualenv
# install virtualenv module
sudo apt-get install python3-pip
sudo pip3 install virtualenv
# create a virtual environment named venv
virtualenv venv
# activate the environment
source venv/bin/activate
To exit the environment, run deactivate
- Install the requirements
pip install -r requirements.txt
- Run the training script
# training
python train.py
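For orientation, the sketch below shows the core update a DDPG training loop performs each step with the hyperparameters above: a one-step TD target for the critic (γ = 0.99), a deterministic policy-gradient update for the actor, and soft target updates (τ = 0.001). All names (`actor`, `critic`, `target_actor`, `ddpg_update`, and the uniform replay batch) are illustrative assumptions, not the actual API of train.py or its PER path.

```python
# Illustrative DDPG update step (assumed names, not train.py's exact API).
import tensorflow as tf

GAMMA, TAU = 0.99, 0.001

def soft_update(target_model, source_model, tau=TAU):
    # target <- tau * source + (1 - tau) * target
    for t, s in zip(target_model.variables, source_model.variables):
        t.assign(tau * s + (1.0 - tau) * t)

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_optimizer, critic_optimizer):
    states, actions, rewards, next_states, dones = batch

    # Critic: minimize TD error against the one-step bootstrapped target
    with tf.GradientTape() as tape:
        target_q = target_critic([next_states, target_actor(next_states)])
        y = rewards + GAMMA * (1.0 - dones) * target_q
        critic_loss = tf.reduce_mean(tf.square(y - critic([states, actions])))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_optimizer.apply_gradients(zip(grads, critic.trainable_variables))

    # Actor: ascend the critic's estimate of Q(s, pi(s))
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([states, actor(states)]))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_optimizer.apply_gradients(zip(grads, actor.trainable_variables))

    # Soft target updates with tau = 0.001
    soft_update(target_actor, actor)
    soft_update(target_critic, critic)
```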
[1] Continuous control with deep reinforcement learning
@misc{lillicrap2015continuous,
title={Continuous control with deep reinforcement learning},
author={Timothy P. Lillicrap and Jonathan J. Hunt and Alexander Pritzel and Nicolas Heess and Tom Erez and Yuval Tassa and David Silver and Daan Wierstra},
year={2015},
eprint={1509.02971},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
[3] anita-hu/TF2-RL
[4] marload/DeepRL-TensorFlow2
[5] openai/baselines