Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang

Princeton University

NeurIPS 2023
This repo contains the code for replicating the experiments in our paper.
pip install -r requirements.txt
- Randomly generate a ground-truth reward model and the corresponding reward labels for the CIFAR-10 dataset.
python3 fake_dataset.py
The ground-truth reward model (a ResNet18 model with the final layer replaced by a randomly initialized linear layer) is saved at reward_model.pth, and the reward labels are saved at cifar10_outputs_with_noise.npy.
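For reference, a minimal sketch of what this step does (the backbone initialization, batch size, and noise scale below are assumptions, not taken from fake_dataset.py):

```python
# Hypothetical sketch of fake_dataset.py; exact noise scale, data split,
# and backbone initialization are assumptions.
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

torch.manual_seed(0)

# Ground-truth reward model: ResNet18 whose final layer is replaced by a
# randomly initialized linear head that outputs a scalar reward.
reward_model = torchvision.models.resnet18(weights=None)
reward_model.fc = nn.Linear(reward_model.fc.in_features, 1)
reward_model.eval()
torch.save(reward_model.state_dict(), "reward_model.pth")

dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
loader = torch.utils.data.DataLoader(dataset, batch_size=256)

labels = []
with torch.no_grad():
    for images, _ in loader:
        rewards = reward_model(images).squeeze(-1)
        # Add Gaussian label noise (the 0.1 scale is an assumption).
        rewards = rewards + 0.1 * torch.randn_like(rewards)
        labels.append(rewards.numpy())

np.save("cifar10_outputs_with_noise.npy", np.concatenate(labels))
```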
- Train a 3-layer ConvNet (on top of the frozen Stable Diffusion v1.5 VAE embedding space) to predict the rewards.
python3 train.py
The default config is lr = 0.001, num_data = 50000, num_epochs = 100, and can be modified in train.py.
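A rough sketch of the training setup, assuming the diffusers AutoencoderKL API for the frozen Stable Diffusion v1.5 VAE (the ConvNet architecture and latent scaling below are illustrative, not the repo's exact code):

```python
# Illustrative sketch of the reward-predictor setup in train.py.
import torch
import torch.nn as nn
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen Stable Diffusion v1.5 VAE encoder.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to(device)
vae.requires_grad_(False)

# 3-layer ConvNet operating on the 4-channel VAE latent space
# (layer widths here are assumptions).
reward_net = nn.Sequential(
    nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 1, 3, padding=1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
).to(device)

optimizer = torch.optim.Adam(reward_net.parameters(), lr=0.001)

def train_step(images, rewards):
    """One regression step: encode images with the frozen VAE, then fit
    the ConvNet to the noisy reward labels with an MSE loss."""
    with torch.no_grad():
        # Images are expected in [-1, 1]; 0.18215 is SD's latent scaling.
        latents = vae.encode(images).latent_dist.sample() * 0.18215
    loss = nn.functional.mse_loss(reward_net(latents).squeeze(-1), rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```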
- Perform Reward-Directed Conditional Diffusion using
python3 inference.py --target 1 --guidance 100 --num_images 100
At each denoising step, a reward-guidance term (scaled by --guidance, steering samples toward the target reward value --target) is added to the diffusion model's update.
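The exact expression is not reproduced here, but conceptually the guidance resembles classifier guidance: the gradient of the squared reward-prediction error with respect to the current latent is added to the model output at every step. A hedged sketch (function names and scaling are illustrative; see inference.py for the actual implementation):

```python
# Illustrative reward-guided denoising step in the style of classifier
# guidance; the exact scaling used in inference.py may differ.
import torch

def guided_step(unet, scheduler, reward_net, x_t, t, target, guidance):
    """One reverse-diffusion step with a reward-guidance term added."""
    with torch.no_grad():
        noise_pred = unet(x_t, t).sample

    # Gradient of the squared reward error w.r.t. the current latent;
    # it points in the direction of increasing error.
    x_t = x_t.detach().requires_grad_(True)
    reward_err = ((reward_net(x_t).squeeze(-1) - target) ** 2).sum()
    grad = torch.autograd.grad(reward_err, x_t)[0]

    # Adding `guidance * grad` to the predicted noise steers the update
    # toward lower reward error, i.e. predicted reward close to --target.
    noise_pred = noise_pred + guidance * grad

    return scheduler.step(noise_pred, t, x_t.detach()).prev_sample
```

Larger --guidance values push samples harder toward the target reward, generally at the cost of sample fidelity.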
If you find this useful in your research, please consider citing our paper:
@article{yuan2024reward,
title={Reward-directed conditional diffusion: Provable distribution estimation and reward improvement},
author={Yuan, Hui and Huang, Kaixuan and Ni, Chengzhuo and Chen, Minshuo and Wang, Mengdi},
journal={Advances in Neural Information Processing Systems},
volume={36},
year={2024}
}