PhilippeMarcotte/muzero-general

 
 

MuZero Reanalyze Implementation And Investigation

This is a reimplementation of MuZero Reanalyze built on MuZero General, the open-source version of MuZero by Werner Duvaud, Aurèle Hainaut, and Paul Lenoir (see README_MuZeroGeneral).

The implementation was tested on CartPole-v1 from OpenAI Gym and on the Tic Tac Toe implementation by the authors of MuZero General. Two variants were tested:

  • A synchronous one that uses multiple workers to update the target values and policies and push the resulting batches onto a queue. The trainer process pulls one batch at a time for training. This implementation stays true to the original description in Appendix H of the original MuZero paper.
  • A completely asynchronous one that updates samples directly in the replay buffer. This is much faster but does not faithfully reproduce the process described in the original paper.
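The difference between the two variants can be illustrated with a minimal sketch. It is not the code from this repository: threads stand in for the worker processes, `refresh_targets` is a hypothetical placeholder for re-running the search with the latest network, and all function names and parameters are assumptions made for illustration.

```python
import queue
import threading


def refresh_targets(sample, network_version):
    """Hypothetical placeholder for recomputing value/policy targets."""
    return {**sample, "value": float(network_version), "version": network_version}


def queue_worker(replay_buffer, batch_queue, network_version, batch_size, num_batches):
    """Variant 1: build batches with refreshed targets and push them on a queue."""
    for b in range(num_batches):
        start = b * batch_size
        batch = [
            refresh_targets(replay_buffer[(start + i) % len(replay_buffer)], network_version)
            for i in range(batch_size)
        ]
        batch_queue.put(batch)  # blocks while the queue is full


def in_place_worker(replay_buffer, lock, network_version):
    """Variant 2: overwrite stored targets directly in the replay buffer."""
    for idx in range(len(replay_buffer)):
        with lock:
            replay_buffer[idx] = refresh_targets(replay_buffer[idx], network_version)


def run_queue_variant(replay_buffer, num_workers=2, batch_size=4, num_batches=3):
    """Start the workers, then let the trainer pull one batch at a time."""
    batch_queue = queue.Queue(maxsize=2)
    workers = [
        threading.Thread(
            target=queue_worker,
            args=(replay_buffer, batch_queue, 1, batch_size, num_batches),
        )
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    batches = [batch_queue.get() for _ in range(num_workers * num_batches)]
    for worker in workers:
        worker.join()
    return batches
```

In this sketch the queue variant leaves the stored samples untouched and only the batches handed to the trainer carry refreshed targets, whereas the in-place variant mutates the buffer itself, so any later sampling sees the updated targets.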

Installation

git clone https://github.com/PhilippeMarcotte/muzero-general.git
cd muzero-general

pip install -r requirements.txt

Command for reproducing the results

python muzero.py --game_name <configuration name> --action "Train" --logger tensorboard --seed <seed>

The configurations used are located in the games folder. However, only the configuration name is required. Here are all the configurations used for the experiments:

  • basic_tictactoe_ratio_0_5

  • true_reanalyze_tictactoe_ratio_0_5

  • fast_reanalyze_tictactoe_ratio_0_5

  • basic_cartpole_75_ratio_0_25 (seed=[0,10,20,30,40])

  • basic_cartpole_75_ratio_0_5 (seed=[0,10,20,30,40])

  • basic_cartpole_75_ratio_1 (seed=[0,10,20,30,40])

  • basic_cartpole_75_ratio_2 (seed=[0,10,20,30,40])

  • true_reanalyze_cartpole_75_ratio_0_25 (seed=[0,10,20,50,60])

  • true_reanalyze_cartpole_75_ratio_0_5 (seed=[0,10,20,50,60])

  • true_reanalyze_cartpole_75_ratio_1 (seed=[0,10,20,50,60])

  • true_reanalyze_cartpole_75_ratio_2 (seed=[0,10,20,50,60])

  • fast_reanalyze_cartpole_75_ratio_0_25 (seed=[0,10,20,30,40])

  • fast_reanalyze_cartpole_75_ratio_0_5 (seed=[0,10,20,30,40])

  • fast_reanalyze_cartpole_75_ratio_1 (seed=[0,10,20,30,40])

  • fast_reanalyze_cartpole_75_ratio_2 (seed=[0,10,20,30,40])

Example

python muzero.py --game_name basic_cartpole_75_ratio_0_25 --action "Train" --logger tensorboard --seed 0
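Repeating a configuration over its listed seeds can be scripted. The helper below is a sketch, not part of the repository: `build_command` and `run_sweep` are hypothetical names, the flags are the ones shown in the command above, and only a subset of the configurations is included. It assumes the script is launched from the repository root so that `muzero.py` resolves.

```python
import subprocess

# Seeds copied from the configuration list above (a subset, for brevity).
SEEDS = {
    "basic_cartpole_75_ratio_0_25": [0, 10, 20, 30, 40],
    "true_reanalyze_cartpole_75_ratio_0_25": [0, 10, 20, 50, 60],
    "fast_reanalyze_cartpole_75_ratio_0_25": [0, 10, 20, 30, 40],
}


def build_command(config_name, seed):
    """Return the training command for one (configuration, seed) pair."""
    return [
        "python", "muzero.py",
        "--game_name", config_name,
        "--action", "Train",
        "--logger", "tensorboard",
        "--seed", str(seed),
    ]


def run_sweep(seeds_by_config, dry_run=True):
    """Collect every command; execute them sequentially unless dry_run is set."""
    commands = [
        build_command(name, seed)
        for name, seeds in seeds_by_config.items()
        for seed in seeds
    ]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failing run
    return commands
```

Calling `run_sweep(SEEDS)` only builds the command lists, which is useful for checking them before launching; `run_sweep(SEEDS, dry_run=False)` runs the trainings one after another.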

For further information please see the original README.
