# FG-DM

**Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis**

Deepak Sridhar, Abhishek Peri, Rohith Rachala, Nuno Vasconcelos

NeurIPS '24 | GitHub | arXiv | Project page

*(Figure: FG-DM overview)*

## Cloning

Use `--recursive` to also clone the segmentation editor app:

```shell
git clone --recursive https://github.com/DeepakSridhar/fgdm.git
```

## Requirements

A suitable conda environment named `ldm` can be created and activated with:

```shell
conda env create -f fgdm.yaml
conda activate ldm
```

## Dataset

We used the COCO 2017 dataset for training FG-DMs.

1. **Download** the COCO 2017 dataset from the official COCO dataset website. You will need the following components:
   - Annotations: caption and instance annotations.
   - Images: `train2017`, `val2017`, and `test2017`.
2. **Extract** all downloaded files into the `/data/coco` directory, or to a location of your choice. Place the annotation files in the `annotations/` folder and the image folders in the `images/` folder.
3. **Verify** that your directory structure matches the layout outlined below.

```
coco/
|---- annotations/
|------- captions_train2017.json
|------- captions_val2017.json
|------- instances_train2017.json
|------- instances_val2017.json
|------- train2017/
|------- val2017/
|---- images/
|------- train2017/
|------- val2017/
```
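The verification step above can be automated with a short script. This is a minimal sketch (not part of the repo) that checks for the expected files and folders, assuming the default `/data/coco` root; pass a different path if you extracted elsewhere:

```python
from pathlib import Path

# Files and folders the FG-DM training setup expects, relative to the COCO root.
EXPECTED = [
    "annotations/captions_train2017.json",
    "annotations/captions_val2017.json",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "images/train2017",
    "images/val2017",
]

def verify_coco_layout(root="/data/coco"):
    """Return the list of missing entries; an empty list means the layout is OK."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    missing = verify_coco_layout()
    if missing:
        print("Missing entries under /data/coco:")
        for rel in missing:
            print("  -", rel)
    else:
        print("COCO layout looks good.")
```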

## FG-DM Pretrained Weights

The segmentation FG-DM weights are available on Google Drive. Place them under the `models` directory.

## Inference: Text-to-Image with FG-DM

```shell
bash run_inference.sh
```

Training: FG-DM Seg from scratch

  • We used sdv1.4 weights for training FG-DM conditions but sdv1.5 is also compatible:

  • The original SD weights are available via the CompVis organization at Hugging Face. The license terms are identical to the original weights.

  • sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

  • Download the condition weights from ControlNet and place them in the models folder to train depth and normal FG-DMs.

  • Alternatively download all these models by running download_models.sh file under scripts directory.

python main.py --base configs/stable-diffusion/nautilus_coco_adapter_semantic_map_gt_captions_distill_loss.yaml -t --gpus 0,

## Acknowledgements

Our codebase for the diffusion models builds heavily on the LDM codebase and ControlNet. Thanks for open-sourcing!

## BibTeX

```bibtex
@inproceedings{neuripssridhar24,
  author    = {Sridhar, Deepak and Peri, Abhishek and Rachala, Rohith and Vasconcelos, Nuno},
  title     = {Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
}
```