This repository contains the source code for the ICLR 2022 paper "Variational oracle guiding for reinforcement learning" (https://openreview.net/forum?id=pjqqxepwoMy) by Dongqi Han, Tadashi Kozuno, Xufang Luo, Zhao-Yun Chen, Kenji Doya, Yuqing Yang and Dongsheng Li. If you use this code, please cite:
```bibtex
@inproceedings{han2022variational,
  title={Variational oracle guiding for reinforcement learning},
  author={Dongqi Han and Tadashi Kozuno and Xufang Luo and Zhao-Yun Chen and Kenji Doya and Yuqing Yang and Dongsheng Li},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=pjqqxepwoMy}
}
```
The code requires the following packages:
- torch
- numpy
- scipy
- gym
- gym-maze (https://github.com/MattChanTK/gym-maze)
- minatar (https://github.com/kenjyoung/MinAtar)
For the experiments on Mahjong, see below.
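A minimal setup sketch (the exact install commands are assumptions; check each repository's README for the authoritative instructions):

```bash
# core dependencies from PyPI
pip install torch numpy scipy gym
# gym-maze and MinAtar are installed from their GitHub repositories
pip install git+https://github.com/MattChanTK/gym-maze
pip install git+https://github.com/kenjyoung/MinAtar
```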
For example, to run the VLOG experiment in the Maze task:
```bash
python run_vlog.py --env 0 --type_id 1
```
where the arguments are as follows:
| `--env` | Environment |
|---|---|
| 0 | Maze |
| 1 | Breakout |
| 2 | Seaquest |
| 3 | Space Invaders |
| 4 | Freeway |
| 5 | Asterix |
| `--type_id` | Method |
|---|---|
| 1 | VLOG |
| 2 | Baseline |
| 3 | Oracle |
| 4 | VLOG-no oracle |
| 5 | Suphx-style |
| 6 | OPD-style |
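For instance, to train the baseline method on Breakout, combine the two arguments from the tables above:

```bash
python run_vlog.py --env 1 --type_id 2
```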
After an experiment finishes, the data (including performance curves, etc.) and the trained model are saved to `./data/`.
To load a model, one may run

```python
import torch

model = torch.load("./data/xxx.model")
```
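If the model was saved on a GPU machine and is loaded on a CPU-only one, PyTorch's standard `map_location` option may be needed (a sketch, not specific to this repository):

```python
model = torch.load("./data/xxx.model", map_location=torch.device("cpu"))
```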
To load the data, one may run

```python
import scipy.io

data = scipy.io.loadmat("./data/xxx.mat")
```

which returns a Python dictionary `data`.
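To see which fields were saved (the exact keys depend on the experiment), one can list the dictionary keys:

```python
# skip the metadata entries that scipy.io.loadmat adds (e.g., __header__)
print([k for k in data.keys() if not k.startswith("__")])
```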
Then one can plot the performance curve, e.g.,

```python
import matplotlib.pyplot as plt

plt.plot(data["global_steps"], data["performance_greedy_action_wrt_step"])
plt.xlabel("global steps")
plt.ylabel("return")
plt.show()
```
To use the code for training a Mahjong agent, you will need https://github.com/pymahjong/pymahjong.
However, we highly recommend using the newer version of our Mahjong environment (https://github.com/Agony5757/mahjong). The newer version is much faster and provides more convenient APIs for decision AI, including playing with pre-trained VLOG agents. It differs from the older version in some rules and includes several bug fixes.
The following explains how to use the code with the older version (https://github.com/pymahjong/pymahjong).
To train the model, one may run

```bash
python run_vlog_mahjong.py --type_id 1 --cql 1
```
Conservative Q-learning (CQL) is used if the argument `--cql` is 1, and behavior cloning (BC) is used if it is 0.
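For example, to train VLOG with BC instead of CQL:

```bash
python run_vlog_mahjong.py --type_id 1 --cql 0
```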
To evaluate agents, one should first have trained models from run_vlog_mahjong.py (we provide an example of a trained VLOG model using CQL in the supplementary material of https://openreview.net/forum?id=pjqqxepwoMy).
To evaluate the agents, one may run

```bash
python eval_vlog_mahjong.py --times 8 --model_dir_0 [model_dir_0] --model_dir_1 [model_dir_1] --model_dir_2 [model_dir_2] --model_dir_3 [model_dir_3]
```
where "model_dir_i" (i=0,1,2,3) is the directory of .model file for player i on the table. The agents will play for 8 games (1 match) in this example. The result will also be saved in to a .mat file in ./data/