Experimenting with Minari by importing older D4RL datasets (tested with Python 3.10.11).
Download D4RL and Minari:
- clone https://github.com/Farama-Foundation/Minari . As Minari is under active development, this conversion may need to be modified at some point. To use the version of Minari tested here, use https://github.com/daniellawson9999/Minari.
- clone D4RL https://github.com/Farama-Foundation/D4RL
- Set up separate dependencies, e.g., a conda environment for each repo
cd d4rl_mujoco_minari
Activate the D4RL environment and run:
python mujoco_d4rl_to_pkl.py --dir={save_dir}
where save_dir is the directory to store D4RL .pkl files.
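For reference, a minimal sketch of what exporting a single D4RL dataset to a .pkl file can look like (this assumes D4RL's env.get_dataset() API; the actual mujoco_d4rl_to_pkl.py script may differ):
import pickle
import gym
import d4rl  # registers the offline MuJoCo datasets with gym

env = gym.make('halfcheetah-expert-v2')
data = env.get_dataset()  # dict with observations, actions, rewards, terminals, timeouts
with open('halfcheetah-expert-v2.pkl', 'wb') as f:
    pickle.dump(data, f)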
Activate the Minari environment and run:
python mujoco_pkl_to_minari.py --dir={save_dir}
where author, author_email, and code_permalink can optionally be provided.
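For example (the exact flag names are an assumption based on the options above; check the script's arguments):
python mujoco_pkl_to_minari.py --dir={save_dir} --author="Your Name" --author_email=you@example.com --code_permalink=https://github.com/daniellawson9999/Minari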
Test loading the new dataset and recovering its environment:
import minari
dataset = minari.load_dataset('d4rl_halfcheetah-expert-v2')
env = dataset.recover_environment() # Deprecated HalfCheetah-v3 environment
# Sample an episode
episode = dataset.sample_episodes(n_episodes=1)[0]
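A quick way to sanity-check the converted data is to inspect the sampled episode (the attribute names below follow Minari's EpisodeData and are assumed for the tested version):
print(episode.observations.shape)  # (T + 1, obs_dim): one extra final observation
print(episode.actions.shape)       # (T, act_dim)
print(episode.rewards.sum())       # episodic return
print(episode.terminations[-1], episode.truncations[-1])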
New: port 1% of the 1st seed of dqn-replay to Minari.
Minari:
- pip install minari==0.4.1
Run:
python download_convert.py --convert
The dataset name follows the convention {game}-top1-s{index}-v0. The seed is set by passing --index, which defaults to 1, matching the seed used by work such as Scaled QL. Example of loading a dataset, where Breakout can be replaced with any game listed in ./atari_minari/atari_games.py:
import minari
from atari_minari.utils import create_atari_env
dataset = minari.load_dataset('Breakout-top1-s1-v0')
base_env = dataset.recover_environment() # Recommended to instead build env, as follows:
env = create_atari_env('ALE/Breakout-v5', repeat_action_probability=0.25, clip_rewards=False)
# some works disable sticky actions (repeat_action_probability=0.0) for evaluation
env = create_atari_env('ALE/Breakout-v5', repeat_action_probability=0.0, clip_rewards=False)
# Sample an episode
episode = dataset.sample_episodes(n_episodes=1)[0]
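A minimal rollout sketch for the recreated environment, assuming create_atari_env returns a standard Gymnasium environment (replace the random action with a trained policy):
obs, info = env.reset(seed=0)
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # placeholder for a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print('episode return:', total_reward)
env.close()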
OLD procedure
Download D4RL and Minari:
- clone https://github.com/Farama-Foundation/Minari (no need to clone again if you already followed the setup in the previous step)
- clone d4rl-atari https://github.com/takuseno/d4rl-atari
- Set up separate dependencies, e.g., a conda environment for each repo
- in the Atari environment, run:
pip install gym[atari]
pip install gym[accept-rom-license]
Activate the Atari environment and run:
python atari_to_pkl.py --dir={save_dir}
where save_dir is the directory to store the d4rl-atari .npz files.
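For reference, a rough sketch of exporting a single d4rl-atari dataset (this assumes d4rl-atari's env.get_dataset() API; the actual atari_to_pkl.py script may store the data differently):
import numpy as np
import gym
import d4rl_atari  # registers the offline Atari datasets with gym

env = gym.make('breakout-expert-v0')  # example id: {game}-{dataset_type}-v{agent}
data = env.get_dataset()  # dict with observations, actions, rewards, terminals
np.savez_compressed('breakout-expert-v0.npz', **data)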
To convert the datasets, activate the Minari environment and run:
python atari_pkl_to_minari.py --dir={save_dir}
This will create dataset(s) with the name {env_name}-{dataset_type}_s{seed}-v0, where env_name is the name of the environment, e.g. Breakout. seed and dataset_type follow https://github.com/takuseno/d4rl-atari; we test with expert, which consists of the last 1M steps of training. _s{seed} specifies which trained agent to use; this is referred to as -v in the d4rl-atari repository, but is renamed to seed (_s) here because -v specifies the dataset version in Minari.
Example of loading a dataset:
import minari
from atari_minari.utils import create_atari_env
dataset = minari.load_dataset('Breakout-expert_s0-v0')
base_env = dataset.recover_environment() # Recommended to instead build env, as follows:
env = create_atari_env('ALE/Breakout-v5', repeat_action_probability=0.25, clip_rewards=True)
# some works disable sticky actions (repeat_action_probability=0.0) for evaluation
env = create_atari_env('ALE/Breakout-v5', repeat_action_probability=0.0, clip_rewards=True)
# Sample an episode
episode = dataset.sample_episodes(n_episodes=1)[0]
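As a quick check of the converted data, one can look at the returns of a few sampled episodes (the rewards attribute follows Minari's EpisodeData and is assumed for the tested version):
episodes = dataset.sample_episodes(n_episodes=10)
returns = [ep.rewards.sum() for ep in episodes]
print('mean return:', sum(returns) / len(returns))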
There are several things to note:
- dataset.recover_environment() will return the environment without reward clipping due to issues serializing TransformReward(). To get reward clipping, recreate the environment with create_atari_env() and pass clip_rewards=True (see the sketch after these notes).
- While the dataset from An Optimistic Perspective on Offline Reinforcement Learning is collected with repeat_action_probability=0.25, two recent papers that aim to create generalist Atari agents, Multi-Game Decision Transformers and Scaled QL, train on this dataset but set repeat_action_probability=0.0 during evaluation.
- Both the dataset and the environment return unscaled 84x84 observations with values ranging from 0 to 255. Normalize these values before network input, e.g. by dividing observations by 255 to scale them to [0, 1], or use another normalization scheme (see the sketch after these notes).
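The sketch below illustrates the last two notes, clipping rewards on a recovered environment (base_env from the example above) with Gymnasium's TransformReward wrapper and scaling observations before network input. It is a sketch under these assumptions, not part of the conversion scripts:
import numpy as np
import gymnasium as gym

# Sign-clip rewards to {-1, 0, 1}, the usual Atari choice
clipped_env = gym.wrappers.TransformReward(base_env, lambda r: float(np.sign(r)))

# Scale raw 84x84 uint8 observations to [0, 1] before feeding them to a network
def preprocess(obs):
    return np.asarray(obs, dtype=np.float32) / 255.0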