Fully Parameterized Quantile Function (FQF)

TensorFlow implementation of the paper

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu

If you use this code in your research, please cite

@inproceedings{yang2019fully,
  title={Fully Parameterized Quantile Function for Distributional Reinforcement Learning},
  author={Yang, Derek and Zhao, Li and Lin, Zichuan and Qin, Tao and Bian, Jiang and Liu, Tie-Yan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={6190--6199},
  year={2019}
}

Requirements

  • python==3.6
  • tensorflow
  • gym
  • absl-py
  • atari-py
  • gin-config
  • opencv-python

Installation on Ubuntu

sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config==0.1.4 gym opencv-python tensorflow-gpu==1.12.0
cd FQF
pip install -e .

Experiments

  • Our experiments and hyper-parameter search can be run as follows:
cd FQF/dopamine/discrete_domains
bash run-fqf.sh

Bug Fix

  • We recommend using an L2 loss on the gradient for the probability proposal network, or clipping the largest proposed probability to 0.98. The reason is as follows: in the quantile function, as the probability approaches 1, the quantile value approaches infinity (or a very large number). Although a very large quantile value is reasonable for a probability such as 0.9999999, the limited approximation ability of the neural network means that the quantile values for other probabilities also grow quickly, leading to a performance drop.
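As a concrete illustration of the second workaround, below is a minimal TensorFlow 1.x sketch of a fraction-proposal head that clips the largest cumulative probability at 0.98 before the quantile network is queried. The function and variable names (propose_taus, state_embedding, num_taus) are illustrative assumptions, not the repository's actual code.

import tensorflow as tf

def propose_taus(state_embedding, num_taus=32, clip_max=0.98):
    # Hypothetical fraction-proposal head: logits -> softmax -> cumulative sum.
    logits = tf.layers.dense(state_embedding, num_taus, name='fraction_proposal')
    probs = tf.nn.softmax(logits, axis=-1)   # positive, sums to 1
    taus = tf.cumsum(probs, axis=-1)         # increasing probabilities; last entry is 1
    taus = tf.minimum(taus, clip_max)        # keep the largest proposed probability away from 1
    # Midpoints are what the quantile value network is evaluated at.
    taus_shifted = tf.concat([tf.zeros_like(taus[:, :1]), taus[:, :-1]], axis=-1)
    tau_hats = (taus + taus_shifted) / 2.0
    return taus, tau_hats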

Acknowledgement

  • Our code is built on top of dopamine.

Code of Conduct