# FG-DM

**Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis**

Deepak Sridhar, Abhishek Peri, Rohith Rachala, Nuno Vasconcelos

NeurIPS '24 | GitHub | arXiv | Project page

*(Figure: FG-DM overview)*

## Cloning

Use `--recursive` to also clone the segmentation editor app:

```shell
git clone --recursive https://github.com/DeepakSridhar/fgdm.git
```

## Requirements

A suitable conda environment named `ldm` can be created and activated with:

```shell
conda env create -f fgdm.yaml
conda activate ldm
```

## Dataset

We used the COCO 2017 dataset for training FG-DMs.

1. **Download** the COCO 2017 dataset from the official COCO dataset website. You will need the following components:
   - Annotations: caption and instance annotations.
   - Images: `train2017`, `val2017`, and `test2017`.
2. **Extract** all downloaded files into the `/data/coco` directory, or to a location of your choice. Place the annotation files in the `annotations/` folder and the image folders in the `images/` folder.
3. **Verify** that your directory structure matches the layout outlined below.

```
coco/
|---- annotations/
|------- captions_train2017.json
|------- captions_val2017.json
|------- instances_train2017.json
|------- instances_val2017.json
|------- train2017/
|------- val2017/
|---- images/
|------- train2017/
|------- val2017/
```
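The verification step above can be automated with a short script. This is a minimal sketch (not part of the repo) that checks for the expected files and folders, assuming the default `/data/coco` root; pass a different path if you extracted elsewhere:

```python
from pathlib import Path

# Files and folders the FG-DM training setup expects, relative to the COCO root.
EXPECTED = [
    "annotations/captions_train2017.json",
    "annotations/captions_val2017.json",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "images/train2017",
    "images/val2017",
]

def verify_coco_layout(root="/data/coco"):
    """Return the list of missing entries; an empty list means the layout is OK."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    missing = verify_coco_layout()
    if missing:
        print("Missing entries under /data/coco:")
        for rel in missing:
            print("  -", rel)
    else:
        print("COCO layout looks good.")
```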

## FG-DM Pretrained Weights

The segmentation FG-DM weights are available on Google Drive. Place them under the `models` directory.

## Inference: Text-to-Image with FG-DM

```shell
bash run_inference.sh
```

Training: FG-DM Seg from scratch

  • We used sdv1.4 weights for training FG-DM conditions but sdv1.5 is also compatible:

  • The original SD weights are available via the CompVis organization at Hugging Face. The license terms are identical to the original weights.

  • sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

  • Download the condition weights from ControlNet and place them in the models folder to train depth and normal FG-DMs.

  • Alternatively download all these models by running download_models.sh file under scripts directory.

python main.py --base configs/stable-diffusion/nautilus_coco_adapter_semantic_map_gt_captions_distill_loss.yaml -t --gpus 0,

## Acknowledgements

Our codebase for the diffusion models builds heavily on the LDM codebase and ControlNet. Thanks for open-sourcing!

## BibTeX

```bibtex
@inproceedings{neuripssridhar24,
  author    = {Sridhar, Deepak and Peri, Abhishek and Rachala, Rohith and Vasconcelos, Nuno},
  title     = {Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
}
```