Scribble-Guided Diffusion for
Training-free Text-to-Image Generation


arXiv Project page

This is the official implementation of Scribble-Guided Diffusion (ScribbleDiff).


Abstract


Recent advancements in text-to-image diffusion models have demonstrated remarkable success, yet they often struggle to fully capture the user's intent. Existing approaches using textual inputs combined with bounding boxes or region masks fall short in providing precise spatial guidance, often leading to misaligned or unintended object orientation. To address these limitations, we propose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that utilizes simple user-provided scribbles as visual prompts to guide image generation. However, incorporating scribbles into diffusion models presents challenges due to their sparse and thin nature, making it difficult to ensure accurate orientation alignment. To overcome these challenges, we introduce moment alignment and scribble propagation, which allow for more effective and flexible alignment between generated images and scribble inputs. Experimental results on the PASCAL-Scribble dataset demonstrate significant improvements in spatial control and consistency, showcasing the effectiveness of scribble-based guidance in diffusion models. Please check the paper here: Scribble-Guided Diffusion for Training-free Text-to-Image Generation
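For intuition, moment alignment can be illustrated with classical image moments: the principal axis of a 2D map follows from its second-order central moments, and the orientation gap between an attention map and a scribble can then be penalized. The sketch below is our own illustration under that reading of the abstract, not the repository's actual implementation; attention_map and scribble_mask are assumed inputs.

import numpy as np

def principal_axis_angle(mask: np.ndarray) -> float:
    """Orientation (radians) of a non-negative 2D map via second-order central moments."""
    ys, xs = np.nonzero(mask > 0)
    w = mask[ys, xs].astype(np.float64)
    # First-order moments give the centroid.
    cx = np.average(xs, weights=w)
    cy = np.average(ys, weights=w)
    # Second-order central moments.
    mu20 = np.average((xs - cx) ** 2, weights=w)
    mu02 = np.average((ys - cy) ** 2, weights=w)
    mu11 = np.average((xs - cx) * (ys - cy), weights=w)
    # Standard image-moment orientation formula.
    return 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

# Hypothetical usage: penalize the orientation gap between a token's
# cross-attention map and its scribble mask during guidance.
# loss = 1.0 - np.cos(principal_axis_angle(attention_map) - principal_axis_angle(scribble_mask))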


News & Updates

  • [TBA] ✨ User-friendly scribble drawing tool will be released soon.

  • [TBA] ✨ Hugging Face-based code will be released soon.

  • [2024/09/13] 🌟 LDM-based code was released.


Architecture



Setup

First, create and activate a new conda environment:

conda create --name highlight-guided python==3.8.0
conda activate highlight-guided
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
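Before continuing, a quick sanity check that PyTorch was installed and sees the GPU (our own suggestion, not part of the repository's instructions):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"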

Next, install the necessary dependencies:

pip install -r environments/requirements_all.txt
# If the full install fails, try the minimal requirements instead:
pip install -r environments/requirements.txt

Install additional libraries:

pip install git+https://github.com/CompVis/taming-transformers.git
pip install git+https://github.com/openai/CLIP.git

Download the GLIGEN model trained with text-grounded box tokens (text-box) and place it in checkpoints/gligen.
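To confirm the checkpoint downloaded correctly, it can be loaded on CPU (a sanity check we suggest; the path matches the inference command below):

python -c "import torch; sd = torch.load('checkpoints/gligen/text-box/diffusion_pytorch_model.bin', map_location='cpu'); print(len(sd), 'entries')"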

Inference

To create scribbles for guidance:

python draw_scribble.py

We will explain how to draw and save scribbles in the future.

After drawing the scribbles, save the images in the */strokes directory, for example:

examples/example1/strokes

Ensure the directory structure matches the configuration file paths. For instance, in configs/config.json:


"stroke_dir": "examples/example1/strokes",
"save_scribble_dir": "examples/example1/scribbles",
"save_mask_dir": "examples/example1/masks",

To run with user-provided text prompts:

python inference.py --ckpt checkpoints/gligen/text-box/diffusion_pytorch_model.bin

To use the default configuration file:

python inference.py --config configs/config.json
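Before launching, it can help to check that the paths in the configuration file resolve (a small convenience snippet of ours, not part of the repository):

import json, os

with open("configs/config.json") as f:
    cfg = json.load(f)

for key in ("stroke_dir", "save_scribble_dir", "save_mask_dir"):
    path = cfg.get(key)
    print(key, path, os.path.isdir(str(path)))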

Scribble Tool

We will provide a more user-friendly and intuitive scribble drawing tool in the future.


Acknowledgments

This project is built on the following resources:

  • Attention Refocusing: This is the baseline model we used in our paper.

  • GLIGEN: Our code is built upon the foundational work provided by GLIGEN.


Related Works

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Dense Text-to-Image Generation with Attention Modulation
