Repo for "Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning" [ICLR 2025]
CCAC is an offline safe RL method that models the relationship between state-action distributions and safety constraints in offline datasets. It leverages this relationship to regularize both critic and policy learning, enabling zero-shot adaptation to varying constraint thresholds. These thresholds can differ across rollouts or change dynamically over time during deployment.
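As a rough illustration of the core idea (a minimal sketch, not the actual CCAC implementation; all class and variable names below are hypothetical), a constraint-conditioned policy takes the target cost threshold as an extra input, so a single trained network can be queried with different safety budgets at deployment time:

```python
import torch
import torch.nn as nn

class ConstraintConditionedPolicy(nn.Module):
    """Toy constraint-conditioned policy (hypothetical sketch, not the repo's code)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        # The target cost threshold is appended to the state, so one set of
        # weights can represent behavior under many different safety budgets.
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor, target_cost: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, target_cost], dim=-1))

# Zero-shot adaptation: the same weights, queried with different thresholds.
policy = ConstraintConditionedPolicy(state_dim=8, action_dim=2)
s = torch.randn(1, 8)
a_strict = policy(s, torch.tensor([[10.0]]))  # tight safety budget
a_loose = policy(s, torch.tensor([[40.0]]))   # looser safety budget
```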
To install the required packages, first create a Python 3.8 environment, then run:
```bash
cd OSRL
pip install -e .
cd ../DSRL
pip install -e .
```
To train a CCAC agent, simply run:
```bash
cd OSRL/examples/train
python train_ccac.py --task <env_name> --param1 <args1> ...
```
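For example (the task name below follows the DSRL naming convention and is only illustrative; see the config file referenced below for the exact task names and arguments):

```bash
# Illustrative invocation; the task name and flags are assumptions, not verified defaults.
python train_ccac.py --task OfflineCarCircle-v0 --seed 0
```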
By default, the config file and training logs are written to the `logs/` folder, and the training curves can be viewed online using WandB.
The default parameters can be found in `OSRL/examples/configs/ccac_configs.py`.
The pre-trained models are available here. To evaluate a trained CCAC agent, simply run:
```bash
cd OSRL/examples/eval
python eval_ccac.py --path <path_to_model> --eval_episodes <number_of_episodes> --target_costs <list_of_target_cost_thresholds>
```
It will load the config file from `path_to_model/config.yaml` and the model file from `path_to_model/checkpoints/model.pt`, run the specified number of episodes for each target cost threshold, and print the average normalized reward and cost.
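For example (the model path and threshold values below are placeholders for illustration):

```bash
# Illustrative invocation; substitute the directory containing your trained model.
python eval_ccac.py --path logs/OfflineCarCircle-v0-ccac --eval_episodes 20 --target_costs 10 20 40
```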
If you find our code and paper helpful, please consider citing our paper:
```bibtex
@inproceedings{guoconstraint,
  title={Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning},
  author={Guo, Zijian and Zhou, Weichao and Wang, Shengao and Li, Wenchao},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```