The code for the Centralized Reward Agent (CenRA) framework for multi-task reinforcement learning (MTRL).
CenRA consists of two components: one centralized reward agent (CRA) and multiple distributed policy agents, one per task. The CRA learns a reward model that shares and transfers task-relevant knowledge to the policy agents.
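The two-component layout above can be sketched as follows. This is a minimal illustration of the structure only; all class and method names here are assumptions, not the repository's actual API:

```python
class CentralizedRewardAgent:
    """Learns a shared reward model used to transfer task-relevant
    knowledge to the policy agents (illustrative sketch)."""

    def __init__(self):
        self.reward_model = {}  # stand-in for a learned reward model

    def suggest_reward(self, obs, action) -> float:
        # A real CRA would query its learned reward model here.
        return self.reward_model.get((obs, action), 0.0)


class PolicyAgent:
    """Learns the policy for a single task, guided by the shared CRA."""

    def __init__(self, task_name: str, cra: CentralizedRewardAgent):
        self.task_name = task_name
        self.cra = cra  # every policy agent shares the same CRA instance


# One CRA shared by multiple task-specific policy agents.
cra = CentralizedRewardAgent()
agents = [PolicyAgent(task, cra) for task in ("task_1", "task_2", "task_3")]
```

The key design point is that the CRA is a single shared object: each distributed policy agent holds a reference to it rather than maintaining its own reward model.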
- The code supports Python 3.6 to 3.10 only. (Due to the PyBullet rendering package, Python 3.11 and higher are not supported.)
- This code has been tested with `pytorch==2.0.1+cu117`.
- Install all dependent packages:

  ```shell
  pip3 install -r requirements.txt
  ```
- For the MujocoCar environment, refer to this instruction for detailed installation.
Run the following command to train CenRA on the environment specified by `<Environment>`:

```shell
python run-<Environment>.py
```

All available environments and their corresponding `<Environment>` values are listed below:
- 2DMaze environment: `2dmaze`, running script `run-2dmaze.py`.
- 3DPickup environment: `3dpickup`, running script `run-3dpickup.py`.
- MujocoCar environment: `mujococar`, running script `run-mujococar.py`.
All hyper-parameters are set to default values in the code. You can change them by adding arguments to the command line. Selected arguments are listed below; for the full list, please refer to the running scripts `run-<Environment>.py`.
- `--exp-name`: the name of the experiment, used for TensorBoard logging and model saving.
- `--suggested-reward-scale`: the scale of the knowledge reward, default is 1.
- `--lamb`: the weight of the knowledge reward, default is 0.5.
- `--total-timesteps`: the total timesteps to train the agent.
- `--pa-learning-starts`: the burn-in steps of the distributed policy agents.
- `--ra-learning-starts`: the burn-in steps of the centralized reward agent.
- `--pa-buffer-size`: the buffer size of the policy agent.
- `--pa-batch-size`: the batch size of the policy agent.
- `--ra-batch-size`: the batch size of the reward agent.
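To make the roles of `--lamb` and `--suggested-reward-scale` concrete, here is a hedged sketch of how a policy agent might combine the environment reward with the CRA's knowledge reward. The function name and the exact combination rule are assumptions for illustration, not the repository's actual code:

```python
def shaped_reward(env_reward: float,
                  knowledge_reward: float,
                  lamb: float = 0.5,
                  suggested_reward_scale: float = 1.0) -> float:
    """Combine a task's environment reward with the CRA's knowledge reward.

    `lamb` weights the knowledge term (--lamb, default 0.5), and
    `suggested_reward_scale` rescales the CRA's suggestion
    (--suggested-reward-scale, default 1). Illustrative only.
    """
    return env_reward + lamb * (suggested_reward_scale * knowledge_reward)


# Environment reward 1.0, CRA suggests 0.4: 1.0 + 0.5 * (1.0 * 0.4) = 1.2
print(shaped_reward(1.0, 0.4))
```

Setting `--lamb 0` would recover plain single-task training under this sketch, since the knowledge term vanishes.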
We compare CenRA with several baselines, including the backbone algorithms DQN (Mnih et al. 2015) for discrete control and SAC (Haarnoja et al. 2018) for continuous control, as well as ReLara (Ma et al. 2024), PiCor (Bai et al. 2023), and MCAL (Mysore et al. 2022).