forked from sfujim/BCQ

Author's PyTorch implementation of BCQ for "Off-Policy Deep Reinforcement Learning without Exploration"


c0derzer0/BCQ

 
 


Off-Policy Deep Reinforcement Learning without Exploration

Code for Batch-Constrained deep Q-learning (BCQ). If you use our code, please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using PyTorch 1.4 and Python 3.6.

Overview

Batch-Constrained deep Q-learning (BCQ) is the first batch deep reinforcement learning algorithm. It aims to learn offline, from a fixed set of data, without interacting with the environment.
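To make the "batch-constrained" idea concrete, here is a toy sketch of the candidate-based action selection described in the paper: sample a few actions that resemble those in the batch, nudge each by a small bounded amount, and keep the one the value estimate rates highest. This is not the repository's code. The function names, the 1-D action space, and the toy stand-ins for the generative model, perturbation model, and Q-function are all assumptions made for illustration.

```python
import random


def select_action(state, generator, perturb, q_value, n_candidates=10, rng=None):
    """Pick the highest-valued action among candidates that resemble the batch."""
    rng = rng or random.Random(0)
    # Candidates come from a model of the batch's actions,
    # not from the full action space.
    candidates = [generator(state, rng) for _ in range(n_candidates)]
    # Each candidate may be adjusted by a small, bounded perturbation.
    candidates = [a + perturb(state, a) for a in candidates]
    # Keep the candidate with the highest estimated value.
    return max(candidates, key=lambda a: q_value(state, a))


# Hypothetical stand-ins: batch actions lie in [0.3, 0.7],
# while the (possibly over-optimistic) value estimate peaks at 1.0.
def toy_generator(state, rng):
    return rng.uniform(0.3, 0.7)


def toy_perturb(state, action, limit=0.05):
    # Move toward the value peak, clipped to [-limit, limit].
    return max(-limit, min(limit, 1.0 - action))


def toy_q_value(state, action):
    return -(action - 1.0) ** 2


action = select_action(0.0, toy_generator, toy_perturb, toy_q_value)
```

In this toy setup the chosen action stays within the batch's action range plus the perturbation limit, even though the value estimate prefers an action the batch never contains.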

To reproduce some of the results from the paper, first train a behavioral policy (DDPG) by running

python main.py --train_behavioral

This saves the PyTorch model. Next, collect a new buffer by running

python main.py --generate_buffer

Finally, train BCQ by running

python main.py

Settings can be adjusted with different arguments to main.py.
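The three stages above have to run in order, so they can be chained from a small driver script. The following is a minimal sketch: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes `main.py` is in the current working directory. Only the `--train_behavioral` and `--generate_buffer` flags shown above are used.

```python
import subprocess
import sys

# Flags for each stage, in the order described above.
STAGES = [
    ("train behavioral policy (DDPG)", ["--train_behavioral"]),
    ("generate buffer", ["--generate_buffer"]),
    ("train BCQ", []),
]


def build_commands(script="main.py", python=sys.executable):
    """Return one command line (as a list of arguments) per stage."""
    return [[python, script, *flags] for _, flags in STAGES]


def run_pipeline(script="main.py"):
    """Run the stages in order, stopping at the first failure."""
    for (name, _), cmd in zip(STAGES, build_commands(script)):
        print(f"[{name}] {' '.join(cmd)}")
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would run the full sequence. Because `check=True` raises on a non-zero exit code, a later stage does not start if an earlier one fails.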

DDPG was updated to learn more consistently. Additionally, with version updates to Python, PyTorch and environments, results may not correspond exactly to the paper.

BibTeX

@inproceedings{fujimoto2019off,
  title={Off-Policy Deep Reinforcement Learning without Exploration},
  author={Fujimoto, Scott and Meger, David and Precup, Doina},
  booktitle={International Conference on Machine Learning},
  pages={2052--2062},
  year={2019}
}
