
The Centralized Reward Agent (CenRA) based multi-task reinforcement learning framework.


mahaozhe/CenRA


Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning

The code for the Centralized Reward Agent (CenRA) framework for multi-task reinforcement learning (MTRL).

[Paper link]

CenRA consists of two components: one centralized reward agent (CRA) and multiple distributed policy agents for their corresponding tasks. The CRA is responsible for learning a reward model to share and transfer task-relevant knowledge to the policy agents.
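The interaction between the CRA and the policy agents can be sketched in a few lines. This is an illustrative outline only: the class and method names (`CentralizedRewardAgent`, `suggest_reward`, `shaped_reward`) are placeholders rather than the repository's actual API, and the weighted-sum shaping below is an assumption based on the description of `--lamb` as "the weight of the knowledge reward".

```python
class CentralizedRewardAgent:
    """Sketch of a CRA that shares a learned reward model across tasks."""

    def __init__(self, lamb=0.5, scale=1.0):
        self.lamb = lamb    # weight of the knowledge reward (--lamb)
        self.scale = scale  # scale of the knowledge reward (--suggested-reward-scale)

    def suggest_reward(self, state, action):
        # A trained reward model would be queried here; a zero stub stands in.
        return 0.0

    def shaped_reward(self, env_reward, state, action):
        # Each policy agent trains on the environment reward plus the
        # weighted knowledge reward suggested by the CRA.
        return env_reward + self.lamb * self.scale * self.suggest_reward(state, action)


cra = CentralizedRewardAgent(lamb=0.5, scale=1.0)
print(cra.shaped_reward(1.0, state=None, action=None))  # 1.0 with the stub model
```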

Framework of CenRA for MTRL

Requirements

  • The code supports Python 3.6 to 3.10 only. (Due to the PyBullet rendering package, the code does not support Python 3.11 or higher.)
  • This code has been tested on:
pytorch==2.0.1+cu117
  • Install all dependent packages:
pip3 install -r requirements.txt
  • For the MujocoCar environment, refer to this instruction for detailed installation.
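The interpreter version constraint above can be checked up front. The helper below is illustrative and not part of the repository:

```python
import sys


def python_supported(version_info=sys.version_info):
    """Return True if the Python version is within the supported 3.6-3.10 range."""
    return (3, 6) <= tuple(version_info[:2]) <= (3, 10)


if not python_supported():
    print("Warning: this Python version is outside the supported range.")
```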

Run CenRA Algorithm

Run the following command to train CenRA on different environments specified by <Environment>:

python run-<Environment>.py

All available environments with corresponding <Environment> are listed below:

All tasks in our experiments

All hyper-parameters are set to default values in the code. You can change them by adding arguments to the command line. Selected arguments are listed below; for the full list, please refer to the running scripts run-<Environment>.py.

--exp-name: the name of the experiment, used for tensorboard logging and model saving.

--suggested-reward-scale: the scale of the knowledge reward, default is 1.
--lamb: the weight of the knowledge reward, default is 0.5.

--total-timesteps: the total timesteps to train the agent.
--pa-learning-starts: the burn-in steps of the distributed policy agent.
--ra-learning-starts: the burn-in steps of the centralized reward agent.

--pa-buffer-size: the buffer size of the policy agent.
--pa-batch-size: the batch size of the policy agent.
--ra-batch-size: the batch size of the reward agent.
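The arguments above might be wired up with `argparse` roughly as follows. The defaults for `--lamb` and `--suggested-reward-scale` follow the values quoted in this README; the remaining defaults here are illustrative placeholders, not values taken from the repository:

```python
import argparse


def build_parser():
    # Sketch of a CenRA training-script argument parser (hypothetical).
    p = argparse.ArgumentParser(description="Train CenRA (sketch)")
    p.add_argument("--exp-name", type=str, default="cenra-exp")
    p.add_argument("--suggested-reward-scale", type=float, default=1.0)
    p.add_argument("--lamb", type=float, default=0.5)
    p.add_argument("--total-timesteps", type=int, default=1_000_000)
    p.add_argument("--pa-learning-starts", type=int, default=5_000)
    p.add_argument("--ra-learning-starts", type=int, default=5_000)
    p.add_argument("--pa-buffer-size", type=int, default=100_000)
    p.add_argument("--pa-batch-size", type=int, default=256)
    p.add_argument("--ra-batch-size", type=int, default=256)
    return p


args = build_parser().parse_args(["--lamb", "0.8", "--exp-name", "maze-run"])
print(args.lamb, args.exp_name)  # 0.8 maze-run
```

Note that argparse maps dashed flag names to underscored attributes, so `--suggested-reward-scale` is read as `args.suggested_reward_scale`.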

Comparative Evaluation

We compare CenRA with several baselines, including the backbone algorithms DQN (Mnih et al. 2015) for discrete control and SAC (Haarnoja et al. 2018) for continuous control, as well as ReLara (Ma et al. 2024), PiCor (Bai et al. 2023), and MCAL (Mysore et al. 2022).

Comparison of the learning performance of CenRA with the baselines.
