Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang

Princeton University

NeurIPS 2023
This repo contains the code for replicating the experiments in our paper.
pip install -r requirements.txt
- Randomly generate a ground-truth reward model and the corresponding reward labels for the CIFAR-10 dataset.
python3 fake_dataset.py
The ground-truth reward model (a ResNet18 model with the final layer replaced by a randomly initialized linear layer) is saved at reward_model.pth, and the reward labels are saved at cifar10_outputs_with_noise.npy.
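For reference, a minimal sketch of what this step does (the backbone initialization, batch size, and noise scale below are assumptions, not taken from fake_dataset.py):

```python
# Hypothetical sketch of fake_dataset.py; exact noise scale, data split,
# and backbone initialization are assumptions.
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

torch.manual_seed(0)

# Ground-truth reward model: ResNet18 whose final layer is replaced by a
# randomly initialized linear head that outputs a scalar reward.
reward_model = torchvision.models.resnet18(weights=None)
reward_model.fc = nn.Linear(reward_model.fc.in_features, 1)
reward_model.eval()
torch.save(reward_model.state_dict(), "reward_model.pth")

dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
loader = torch.utils.data.DataLoader(dataset, batch_size=256)

labels = []
with torch.no_grad():
    for images, _ in loader:
        rewards = reward_model(images).squeeze(-1)
        # Add Gaussian label noise (the 0.1 scale is an assumption).
        rewards = rewards + 0.1 * torch.randn_like(rewards)
        labels.append(rewards.numpy())

np.save("cifar10_outputs_with_noise.npy", np.concatenate(labels))
```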
- Train a 3-layer ConvNet (on top of the frozen Stable Diffusion v1.5 VAE embedding space) to predict the rewards.
python3 train.py
The default config is lr = 0.001, num_data = 50000, num_epochs = 100, and can be modified in train.py.
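A rough sketch of the training setup, assuming the diffusers AutoencoderKL API for the frozen Stable Diffusion v1.5 VAE (the ConvNet architecture and latent scaling below are illustrative, not the repo's exact code):

```python
# Illustrative sketch of the reward-predictor setup in train.py.
import torch
import torch.nn as nn
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen Stable Diffusion v1.5 VAE encoder.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to(device)
vae.requires_grad_(False)

# 3-layer ConvNet operating on the 4-channel VAE latent space
# (layer widths here are assumptions).
reward_net = nn.Sequential(
    nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 1, 3, padding=1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
).to(device)

optimizer = torch.optim.Adam(reward_net.parameters(), lr=0.001)

def train_step(images, rewards):
    """One regression step: encode images with the frozen VAE, then fit
    the ConvNet to the noisy reward labels with an MSE loss."""
    with torch.no_grad():
        # Images are expected in [-1, 1]; 0.18215 is SD's latent scaling.
        latents = vae.encode(images).latent_dist.sample() * 0.18215
    loss = nn.functional.mse_loss(reward_net(latents).squeeze(-1), rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```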
- Perform Reward-Directed Conditional Diffusion using
python3 inference.py --target 1 --guidance 100 --num_images 100
At each denoising step, a reward-guidance term (scaled by --guidance, steering samples toward the target reward value --target) is added to the diffusion model's update.
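The exact expression is not reproduced here, but conceptually the guidance resembles classifier guidance: the gradient of the squared reward-prediction error with respect to the current latent is added to the model output at every step. A hedged sketch (function names and scaling are illustrative; see inference.py for the actual implementation):

```python
# Illustrative reward-guided denoising step in the style of classifier
# guidance; the exact scaling used in inference.py may differ.
import torch

def guided_step(unet, scheduler, reward_net, x_t, t, target, guidance):
    """One reverse-diffusion step with a reward-guidance term added."""
    with torch.no_grad():
        noise_pred = unet(x_t, t).sample

    # Gradient of the squared reward error w.r.t. the current latent;
    # it points in the direction of increasing error.
    x_t = x_t.detach().requires_grad_(True)
    reward_err = ((reward_net(x_t).squeeze(-1) - target) ** 2).sum()
    grad = torch.autograd.grad(reward_err, x_t)[0]

    # Adding `guidance * grad` to the predicted noise steers the update
    # toward lower reward error, i.e. predicted reward close to --target.
    noise_pred = noise_pred + guidance * grad

    return scheduler.step(noise_pred, t, x_t.detach()).prev_sample
```

Larger --guidance values push samples harder toward the target reward, generally at the cost of sample fidelity.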
If you find this useful in your research, please consider citing our paper:
@article{yuan2024reward,
title={Reward-directed conditional diffusion: Provable distribution estimation and reward improvement},
author={Yuan, Hui and Huang, Kaixuan and Ni, Chengzhuo and Chen, Minshuo and Wang, Mengdi},
journal={Advances in Neural Information Processing Systems},
volume={36},
year={2024}
}