Want to figure out what a good prompt might be to create new images like an existing one? The CLIP Interrogator is here to get you answers!
Run Version 2 on Colab, HuggingFace, and Replicate!
Version 1 is still available in Colab for comparing different CLIP models.
The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion on DreamStudio to create cool art!
Create and activate a Python virtual environment
python3 -m venv ci_env
(for linux) source ci_env/bin/activate
(for windows) .\ci_env\Scripts\activate
Install with PIP
# install torch with GPU support for example:
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu117
# install clip-interrogator
pip install clip-interrogator==0.5.1
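Before going further, it can be worth confirming that the torch build you installed actually sees a GPU. The following is a minimal sketch rather than part of the package: gpu_status is a hypothetical helper name, and it only relies on the standard importlib module and torch.cuda.is_available().

```python
import importlib.util


def gpu_status() -> str:
    """Report whether torch is installed and whether it can see a CUDA device.

    Hypothetical helper for illustration; not part of clip-interrogator.
    """
    # Avoid an ImportError if torch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    return "cuda available" if torch.cuda.is_available() else "cpu only"


print(gpu_status())
```

If this prints "cpu only" on a machine with a GPU, the likely cause is a CPU-only torch wheel, and reinstalling with the GPU index URL shown above may help.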
You can then use it in your script
from PIL import Image
from clip_interrogator import Config, Interrogator
image_path = "example.jpg"  # placeholder: path to the image you want to interrogate
image = Image.open(image_path).convert('RGB')
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
print(ci.interrogate(image))
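The same calls can be reused for several images by creating the Interrogator once and looping over files. This is a sketch under assumptions: find_images, interrogate_folder, and the IMAGE_SUFFIXES set are hypothetical names introduced here, and only Config, Interrogator, and interrogate come from the usage shown above.

```python
from pathlib import Path

# Assumed set of file extensions to treat as images; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def interrogate_folder(folder, clip_model_name="ViT-L-14/openai"):
    """Map each image file name in `folder` to its generated prompt."""
    # Imported lazily so find_images can be used without the heavy dependencies.
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    # Build the Interrogator once and reuse it for every image.
    ci = Interrogator(Config(clip_model_name=clip_model_name))
    prompts = {}
    for path in find_images(folder):
        image = Image.open(path).convert("RGB")
        prompts[path.name] = ci.interrogate(image)
    return prompts
```

Calling interrogate_folder("my_images") would then return a dictionary of file names to prompts, where "my_images" stands in for whatever folder you use.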
CLIP Interrogator uses OpenCLIP which supports many different pretrained CLIP models. For the best prompts for Stable Diffusion 1.X use ViT-L-14/openai for clip_model_name. For Stable Diffusion 2.0 use ViT-H-14/laion2b_s32b_b79k.
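The recommendation above can be captured in a small lookup so that scripts do not hard-code the model string. This is only a sketch: clip_model_for is a hypothetical helper, and it covers just the two cases named in the text (1.X and 2.0), raising an error for anything else instead of guessing.

```python
def clip_model_for(sd_version: str) -> str:
    """Return the clip_model_name recommended for a Stable Diffusion version.

    Hypothetical helper; only the two recommendations stated in the text are encoded.
    """
    version = sd_version.strip()
    if version.startswith("1."):
        # Stable Diffusion 1.X
        return "ViT-L-14/openai"
    if version == "2.0":
        # Stable Diffusion 2.0
        return "ViT-H-14/laion2b_s32b_b79k"
    raise ValueError(f"no recommendation recorded for Stable Diffusion {sd_version!r}")


print(clip_model_for("1.5"))
print(clip_model_for("2.0"))
```

The returned string can be passed directly as Config(clip_model_name=...).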
The Config object lets you configure CLIP Interrogator's processing.
clip_model_name: which of the OpenCLIP pretrained CLIP models to use
cache_path: path where to save precomputed text embeddings
download_cache: when True will download the precomputed embeddings from huggingface
chunk_size: batch size for CLIP, use smaller for lower VRAM
quiet: when True no progress bars or text output will be displayed
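As an illustration of how these options might be combined, the sketch below collects them in one place before building the Interrogator. It assumes each option is accepted by Config as a keyword argument in the same way clip_model_name is. The helper names config_kwargs and make_interrogator are hypothetical, and the specific values (the "cache" folder, a chunk size of 1024) are examples rather than the library's defaults.

```python
def config_kwargs(
    clip_model_name="ViT-L-14/openai",
    cache_path="cache",       # example folder for precomputed text embeddings
    download_cache=True,      # fetch precomputed embeddings instead of computing them
    chunk_size=1024,          # example value; lower it if you run out of VRAM
    quiet=False,              # True hides progress bars and text output
):
    """Collect the Config options described above into a dict (hypothetical helper)."""
    return {
        "clip_model_name": clip_model_name,
        "cache_path": cache_path,
        "download_cache": download_cache,
        "chunk_size": chunk_size,
        "quiet": quiet,
    }


def make_interrogator(**overrides):
    """Build an Interrogator from the options above, applying any overrides."""
    # Imported lazily so config_kwargs can be inspected without the package installed.
    from clip_interrogator import Config, Interrogator

    return Interrogator(Config(**config_kwargs(**overrides)))
```

For example, make_interrogator(chunk_size=512, quiet=True) would request a smaller batch size and suppress output, assuming the keyword form holds for those options.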
See run_cli.py and run_gradio.py for more examples of using the Config and Interrogator classes.