Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice
Code for the MBPO experiments in Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice. This repository is a clone of MBRL-Lib.
Please see the original README of MBRL-Lib for installing all the requirements of mbrl-lib. PyTorch>=1.7 is required by mbrl-lib, and the experiments for this paper were run with PyTorch==1.9.1.
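If PyTorch is not already installed, a minimal sketch of installing the version used for these experiments is below; the exact wheel depends on your CUDA version and platform (see pytorch.org for the matching command).
# Install the PyTorch version used for these experiments (adjust for your CUDA setup)
pip install torch==1.9.1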
Additionally, this repository uses Weights and Biases for tracking experiments on top of mbrl-lib. After installing PyTorch and mbrl-lib, run the following commands to install the extra dependencies.
# Patchelf required for mujoco-py==2.1.2.14
conda install patchelf=0.12
pip install -r requirements/va_mbpo_requirements.txt
Alternatively, the file requirements/conda_va_mbpo.yaml is provided to reproduce the conda environment for this code.
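For instance, the environment can be recreated directly from that file with a standard conda command (the environment name comes from the yaml file unless overridden with -n):
# Recreate the conda environment from the provided specification
conda env create -f requirements/conda_va_mbpo.yaml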
In order to run MBPO with a value-aware model learning objective, the following example command may be called (in this case, with the mbrl/examples/conf/overrides/mbpo_halfcheetah config file). Two new arguments select among the value-aware objectives and MLE: overrides.model_loss_type can be either va for value-aware or mle for the default maximum-likelihood model learning objective, and dynamics_model.va_norm can be set to l1 for the MA-L1 objective or l2 for the VAML objective.
python -m mbrl.examples.main algorithm=mbpo action_optimizer=cem overrides=mbpo_halfcheetah dynamics_model=gaussian_mlp_ensemble +overrides.model_loss_type=va experiment=halfcheetah_va_l1
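For reference, analogous commands select the VAML (l2) objective or the default MLE objective. These are sketches: the experiment names are illustrative, and whether dynamics_model.va_norm needs the + prefix depends on whether that key already appears in the default dynamics model config.
# VAML objective (l2 norm); experiment name is illustrative
python -m mbrl.examples.main algorithm=mbpo action_optimizer=cem overrides=mbpo_halfcheetah dynamics_model=gaussian_mlp_ensemble +overrides.model_loss_type=va +dynamics_model.va_norm=l2 experiment=halfcheetah_va_l2
# Default maximum-likelihood (MLE) model learning objective
python -m mbrl.examples.main algorithm=mbpo action_optimizer=cem overrides=mbpo_halfcheetah dynamics_model=gaussian_mlp_ensemble +overrides.model_loss_type=mle experiment=halfcheetah_mle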
If tracking experiments with Weights and Biases (wandb), edit mbrl/examples/conf/main.yaml to set the wandb_project_name. For example, the following command activates wandb for tracking with the use_wandb=1 option.
python -m mbrl.examples.main algorithm=mbpo action_optimizer=cem overrides=mbpo_halfcheetah dynamics_model=gaussian_mlp_ensemble +overrides.model_loss_type=va experiment=halfcheetah_va_l1 use_wandb=1 wandb_group_name=mbpo_mle_true_l1_0.01
The following additional hyper-parameters control value-aware model learning:
- dynamics_model.va_loss_coeff (default 0.01): Sets the scaling coefficient for the value-aware model learning losses.
- overrides.value_update_interval (default 5): Sets the number of model updates between value function refits in the model learning loop. Setting this to 0 deactivates value network refitting within model learning.
- overrides.num_v_updates_in_model (default 5): Hyper-parameter for value refitting. Sets the number of value updates during refitting.
- overrides.v_update_batch_size (default 5): Hyper-parameter for value refitting. Sets the batch size per update during value network refitting.
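These options can be overridden on the command line like the other arguments above (keys not present in the default config may need the + prefix). A minimal sketch, with illustrative rather than recommended values:
# Illustrative values only; the defaults are listed above
python -m mbrl.examples.main algorithm=mbpo action_optimizer=cem overrides=mbpo_halfcheetah dynamics_model=gaussian_mlp_ensemble +overrides.model_loss_type=va experiment=halfcheetah_va_l1 dynamics_model.va_loss_coeff=0.05 overrides.value_update_interval=10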