MuZero Reanalyze Implementation And Investigation

This is a reimplementation of MuZero Reanalyze built on MuZero General, the open-source implementation of MuZero by Werner Duvaud, Aurèle Hainaut and Paul Lenoir (see README_MuZeroGeneral).

The implementation was tested on CartPole-v1 from OpenAI Gym and on the Tic Tac Toe implementation by the authors of MuZero General. Two variants were tested:

A synchronous one that uses multiple workers to push batches onto a queue while updating the target values and policies. The trainer process pulls one batch at a time for training. This implementation stays true to the description in Appendix H of the original MuZero paper.
A completely asynchronous one that updates samples directly in the replay buffer. This is much faster but does not faithfully reproduce the process described in the original paper.
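
The difference between the two variants can be illustrated with a minimal sketch. It is not the code from this repository: refresh_targets, queue_based_reanalyze and in_place_reanalyze are hypothetical names, threads stand in for worker processes, and the MCTS re-run that would produce new targets is replaced by a placeholder that only tags each sample with the weights version used.

```python
import queue
import threading


def refresh_targets(sample, weights_version):
    """Placeholder for recomputing value/policy targets (hypothetical)."""
    return {**sample, "targets_from": weights_version}


def queue_based_reanalyze(buffer, num_workers, batch_size, num_batches, weights_version):
    """Workers push refreshed batches onto a queue; the trainer pulls one at a time."""
    batch_queue = queue.Queue(maxsize=num_workers)
    tasks = queue.Queue()
    for i in range(num_batches):
        start = (i * batch_size) % len(buffer)
        tasks.put([(start + j) % len(buffer) for j in range(batch_size)])

    def worker():
        while True:
            try:
                indices = tasks.get_nowait()
            except queue.Empty:
                return
            # Refreshed copies go on the queue; the buffer itself is untouched.
            batch_queue.put([refresh_targets(buffer[k], weights_version) for k in indices])

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    # Trainer loop: consume exactly one batch per training step.
    consumed = [batch_queue.get() for _ in range(num_batches)]
    for t in threads:
        t.join()
    return consumed


def in_place_reanalyze(buffer, weights_version):
    """Overwrite each stored sample directly in the replay buffer."""
    for k, sample in enumerate(buffer):
        buffer[k] = refresh_targets(sample, weights_version)


buffer = [{"id": i, "targets_from": 0} for i in range(6)]
batches = queue_based_reanalyze(buffer, num_workers=2, batch_size=3, num_batches=4, weights_version=1)
in_place_reanalyze(buffer, weights_version=2)
```

In the queue-based version the trainer only ever sees batches whose targets were refreshed just before being queued, at the cost of waiting on the workers. The in-place version removes that coupling, which is why it is faster but departs from the procedure in the paper.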

Installation

git clone https://github.com/PhilippeMarcotte/muzero-general.git
cd muzero-general

pip install -r requirements.txt

Command for reproducing the results

python muzero.py --game_name <configuration name> --action "Train" --logger tensorboard --seed <seed>

The configurations are located in the games folder, but only the configuration name needs to be passed to --game_name. These are all the configurations used for the experiments:

basic_tictactoe_ratio_0_5
true_reanalyze_tictactoe_ratio_0_5
fast_reanalyze_tictactoe_ratio_0_5
basic_cartpole_75_ratio_0_25 (seed=[0,10,20,30,40])
basic_cartpole_75_ratio_0_5 (seed=[0,10,20,30,40])
basic_cartpole_75_ratio_1 (seed=[0,10,20,30,40])
basic_cartpole_75_ratio_2 (seed=[0,10,20,30,40])
true_reanalyze_cartpole_75_ratio_0_25 (seed=[0,10,20,50,60])
true_reanalyzebasic_cartpole_75_ratio_0_5 (seed=[0,10,20,50,60])
true_reanalyze_cartpole_75_ratio_1 (seed=[0,10,20,50,60])
true_reanalyze_cartpole_75_ratio_2 (seed=[0,10,20,50,60])
fast_reanalyze_cartpole_75_ratio_0_25 (seed=[0,10,20,30,40])
fast_reanalyze_cartpole_75_ratio_0_5 (seed=[0,10,20,30,40])
fast_reanalyze_cartpole_75_ratio_1 (seed=[0,10,20,30,40])
fast_reanalyze_cartpole_75_ratio_2 (seed=[0,10,20,30,40])

Example

python muzero.py --game_name basic_cartpole_75_ratio_0_25 --action "Train" --logger tensorboard --seed 0
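
To reproduce every seed of a configuration, the command lines can be generated rather than typed by hand. The sketch below only builds and prints the commands using the flags shown above; build_command and commands_for are hypothetical helper names, not part of this repository, and nothing is executed.

```python
import shlex


def build_command(config_name, seed):
    """Return the training command as an argument list."""
    return [
        "python", "muzero.py",
        "--game_name", config_name,
        "--action", "Train",
        "--logger", "tensorboard",
        "--seed", str(seed),
    ]


def commands_for(config_name, seeds):
    """Return one shell-ready command line per seed."""
    return [shlex.join(build_command(config_name, seed)) for seed in seeds]


# Seeds listed above for basic_cartpole_75_ratio_0_25.
for line in commands_for("basic_cartpole_75_ratio_0_25", [0, 10, 20, 30, 40]):
    print(line)
```

Each printed line can be pasted into a shell, or the argument list from build_command can be passed to subprocess.run to launch the runs one after another.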

For further information, please see the original README (README_MuZeroGeneral).
