Off-Policy Deep Reinforcement Learning without Exploration

Code for Batch-Constrained deep Q-Learning (BCQ). If you use our code, please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI gym. Networks are trained using PyTorch 1.4 and Python 3.6.

Overview

Batch-Constrained deep Q-learning (BCQ) is the first batch deep reinforcement learning algorithm: it aims to learn offline from a fixed dataset, without further interaction with the environment.
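
To make the idea of learning from a fixed batch concrete, here is a minimal tabular sketch. It is not the code in this repository (BCQ.py uses neural networks for continuous control); it only illustrates the batch-constrained idea of taking the max in the Q-learning target over actions that actually appear in the batch. The toy transitions, the function names, the hyperparameters, and the choice to fall back to the reward alone when the next state has no recorded actions are all assumptions made for illustration.

```python
from collections import defaultdict


def batch_constrained_q_learning(batch, gamma=0.99, alpha=0.5, sweeps=200):
    """Tabular Q-learning over a fixed batch of (s, a, r, s_next, done) tuples.

    No environment is queried: every update reuses a stored transition.
    """
    # Record which actions the batch contains for each state.
    seen_actions = defaultdict(set)
    for s, a, _, _, _ in batch:
        seen_actions[s].add(a)

    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in batch:
            if done or not seen_actions[s_next]:
                # Assumption: no bootstrapping when s_next has no batch actions.
                target = r
            else:
                # Max only over actions observed in the batch for s_next.
                target = r + gamma * max(
                    q[(s_next, a2)] for a2 in seen_actions[s_next]
                )
            q[(s, a)] += alpha * (target - q[(s, a)])
    return dict(q), seen_actions


def greedy_action(q, seen_actions, s):
    """Pick the best action among those the batch contains for state s."""
    return max(seen_actions[s], key=lambda a: q[(s, a)])


# Hypothetical toy batch: two states, actions "left" and "right".
batch = [
    (0, "left", 0.0, 0, False),
    (0, "right", 0.0, 1, False),
    (1, "right", 1.0, 1, True),
]
q, seen = batch_constrained_q_learning(batch)
best = greedy_action(q, seen, 0)
print(best, q)
```

Because the pair (1, "left") never occurs in the batch, it never receives a value and never enters a target, which is the point of the constraint.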

If you are interested in reproducing some of the results from the paper, a behavioral policy (DDPG) needs to be trained by running

main.py --train_behavioral

This will save the PyTorch model. A new buffer can then be collected by running

main.py --generate_buffer

Finally, train BCQ by running

main.py

Settings can be adjusted with different arguments to main.py.
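
The three steps above can be chained in a small driver script. The sketch below is not part of the repository; it assumes main.py is run with the current Python interpreter from the repository root, and it only uses the two flags named above. By default it prints the commands without running them.

```python
import subprocess
import sys

# Stages in the order described above; flags are the ones named in this README.
PIPELINE = [
    ("train behavioral policy (DDPG)", ["main.py", "--train_behavioral"]),
    ("generate buffer", ["main.py", "--generate_buffer"]),
    ("train BCQ", ["main.py"]),
]


def run_pipeline(dry_run=True, python=sys.executable):
    """Print each stage's command and, unless dry_run is set, execute it."""
    commands = []
    for label, args in PIPELINE:
        cmd = [python, *args]
        commands.append(cmd)
        print(f"[{label}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the pipeline if a stage exits with an error.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: shows what would be executed without starting any training.
commands = run_pipeline(dry_run=True)
```

Calling run_pipeline(dry_run=False) would execute the stages one after another; each later stage relies on the files saved by the one before it.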

DDPG was updated to learn more consistently. Additionally, with version updates to Python, PyTorch and environments, results may not correspond exactly to the paper.

Bibtex

@inproceedings{fujimoto2019off,
  title={Off-Policy Deep Reinforcement Learning without Exploration},
  author={Fujimoto, Scott and Meger, David and Precup, Doina},
  booktitle={International Conference on Machine Learning},
  pages={2052--2062},
  year={2019}
}

