Use CLIP to create matching texts + embeddings for given images; useful for XAI, adversarial training

🚀🆙 CLIP-gradient-ascent-embeddings

  • ❗ Requires OpenAI/CLIP
  • Generates matching text embeddings, i.e. a 'CLIP opinion' about images
  • Uses gradient ascent to optimize text embeds for cosine similarity with image embeds (see the sketch after this list)
  • Saves the 'CLIP opinion' as .txt files [best tokens]
  • Saves text-embeds.pt with [batch_size] number of embeds
  • Can be used to create an adversarial text-image aligned dataset
  • For XAI, adversarial training, etc.; see the 'attack' folder for example images
  • Usage, single image: python gradient-ascent.py --use_image attack/024_attack.png
  • Usage, batch processing: python gradient-ascent.py --img_folder attack
  • 🆕 Load a custom model: python gradient-ascent-unproj_flux1.py --model_name "path/to/myCLIP.safetensors"
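
To make the bullets above concrete, here is a minimal sketch of the gradient-ascent idea. It is an illustration, not the repository's gradient-ascent.py: it optimizes a few free 'soft' token embeddings so the resulting CLIP text embedding has high cosine similarity with a given image embedding, then decodes the nearest vocabulary tokens as the 'CLIP opinion'. The token count, learning rate, and step count are arbitrary choices; 49406/49407 are CLIP's start/end-of-text ids.

import torch
import clip
from PIL import Image
from clip.simple_tokenizer import SimpleTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)
model = model.float()  # avoid fp16 issues when backpropagating on GPU
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the soft tokens are optimized

# target: the (frozen, normalized) image embedding
image = preprocess(Image.open("attack/024_attack.png")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)

# learnable "soft prompt": a handful of free token embeddings
n_tokens = 8
d_model = model.token_embedding.weight.shape[1]
soft_tokens = (0.02 * torch.randn(1, n_tokens, d_model, device=device)).requires_grad_(True)
sot_id, eot_id = 49406, 49407  # CLIP's start/end-of-text token ids

def encode_soft_prompt(soft):
    # rebuild CLIP's text forward pass around the learnable embeddings:
    # [SOT] + soft tokens + [EOT] + zero padding -> transformer -> ln_final -> projection at EOT
    sot = model.token_embedding(torch.tensor([[sot_id]], device=device))
    eot = model.token_embedding(torch.tensor([[eot_id]], device=device))
    x = torch.cat([sot, soft, eot], dim=1)
    pad = model.positional_embedding.shape[0] - x.shape[1]
    x = torch.cat([x, torch.zeros(1, pad, d_model, device=device)], dim=1)
    x = x + model.positional_embedding
    x = model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)  # causal mask is built into model.transformer
    x = model.ln_final(x)
    return x[:, soft.shape[1] + 1] @ model.text_projection  # features at the EOT position

optimizer = torch.optim.Adam([soft_tokens], lr=0.1)
for step in range(300):
    text_features = encode_soft_prompt(soft_tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    loss = -(text_features * image_features).sum(dim=-1).mean()  # maximize cosine similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# read out the 'CLIP opinion': nearest vocabulary tokens to the optimized embeddings
with torch.no_grad():
    best_ids = (soft_tokens.squeeze(0) @ model.token_embedding.weight.T).argmax(dim=-1)
    print(SimpleTokenizer().decode(best_ids.tolist()))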

Changes 07/DEC/2024

  • The --model_name argument now accepts a model name (default: ViT-L/14) OR a "/path/to/model.pt"
  • If the path ends in .safetensors, the script assumes 'ViT-L/14' (CLIP-L) and loads the state_dict (see the sketch after this list). ✅
  • ⚠️ The weights must nevertheless be in the original "OpenAI/CLIP" format; HuggingFace-converted models will NOT work.
  • My HF zer0int/CLIP-GmP-ViT-L-14 model.safetensors will NOT work (it's for diffusers / HF).
  • Instead, download the full model .safetensors [text encoder AND vision encoder]; direct link:
  • My GmP-BEST-smooth, GmP-Text-detail, and 🆕 SAE-GmP models will work with this code.
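
For reference, loading such a .safetensors state_dict into an OpenAI/CLIP ViT-L/14 could look roughly like this (a sketch; the path is a placeholder, and the keys must already use OpenAI/CLIP naming):

import torch
import clip
from safetensors.torch import load_file

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)   # build the ViT-L/14 architecture
state_dict = load_file("path/to/myCLIP.safetensors")        # plain tensors with OpenAI/CLIP key names
model.load_state_dict(state_dict)                           # a HuggingFace-converted file fails here (mismatched keys)
model.eval()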

  • 🆕 Added gradient-ascent-unproj_flux1.py. Usage is the same; however, in addition to the projected embeddings:
  • It saves pinv and inv versions of the pre-projection embeddings (see the sketch after this list).
  • 👉 Flux.1-dev uses these embeddings (pinv seems best for Flux.1-dev).
  • Recommended samplers: HEUN, Euler.
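
The 'pinv'/'inv' naming is not spelled out here, so the following is an assumption rather than the script's actual code: one plausible reading is mapping the projected text embeddings back to the transformer's pre-projection width via the (pseudo-)inverse of CLIP's text_projection matrix. The embedding path below is hypothetical.

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-L/14", device=device)
proj = model.text_projection.data.float()           # [768, 768] for ViT-L/14 (square)

# hypothetical path: the projected text embeddings saved by the script
projected = torch.load("embeds/embeds.pt", map_location=device).float()

# map the projected embeddings back to the transformer's (pre-projection) width
unproj_pinv = projected @ torch.linalg.pinv(proj)   # pseudo-inverse version ('pinv')
unproj_inv = projected @ torch.linalg.inv(proj)     # exact-inverse version ('inv'); requires proj to be invertible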

Example: "worst portrait ever", generated by Flux.1-dev with pure CLIP guidance (no T5!). CLIP apparently tried to encode the facial expression of the cat 😂, plus the usual CLIP text gibberish of something 'cat' and 'shoe' mashed up:

[image: "worst portrait ever" example]


gradient-ascent

Command-line arguments:

--batch_size, default=13, type=int, help="Reduce batch_size if you have OOM issues"
--model_name, default='ViT-L/14', help="CLIP model to use"
--tokens_to, default="texts", help="Save CLIP opinion texts path"
--embeds_to, default="embeds", help="Save CLIP embeddings path"
--use_best, default="True", help="If True, use best embeds (loss); if False, just saves last step (not recommended)"
--img_folder, default=None, help="Path to folder with images, for batch embeddings generation"
--use_image, default=None, help="Path to a single image"
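
An argparse setup matching the flags above would look roughly like this (a sketch based on the list, not necessarily the script's exact code):

import argparse

parser = argparse.ArgumentParser(description="CLIP gradient ascent: text embeddings for images")
parser.add_argument("--batch_size", default=13, type=int, help="Reduce batch_size if you have OOM issues")
parser.add_argument("--model_name", default="ViT-L/14", help="CLIP model to use")
parser.add_argument("--tokens_to", default="texts", help="Save CLIP opinion texts path")
parser.add_argument("--embeds_to", default="embeds", help="Save CLIP embeddings path")
parser.add_argument("--use_best", default="True", help="If True, use best embeds (loss); if False, just save the last step (not recommended)")
parser.add_argument("--img_folder", default=None, help="Path to folder with images, for batch embeddings generation")
parser.add_argument("--use_image", default=None, help="Path to a single image")
args = parser.parse_args()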

Further processing example code snippets:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load the saved text embeddings (shape: [batch_size, embed_dim])
text_embeddings = torch.load("path/to/embeds.pt").to(device)

# loop over all batches of embeds and do a thing
num_embeddings = text_embeddings.size(0) # e.g. batch_size 13 -> idx 0 to 12
for selected_embedding_idx in range(num_embeddings):
    print(f"Processing embedding index: {selected_embedding_idx}")
    # do your thing here!


# select a random embedding from the batch and do a thing
selected_embedding_idx = torch.randint(0, text_embeddings.size(0), (1,)).item()
selected_embedding = text_embeddings[selected_embedding_idx:selected_embedding_idx + 1]

# or just manually select one
selected_embedding_idx = 3
selected_embedding = text_embeddings[selected_embedding_idx:selected_embedding_idx + 1]
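
Continuing from the snippet above (reusing selected_embedding), one example "thing" to do with a selected embedding is to check how well it matches a given image via cosine similarity; the image path is just an example from the 'attack' folder:

import clip
from PIL import Image

model, preprocess = clip.load("ViT-L/14", device=device)

image = preprocess(Image.open("attack/024_attack.png")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image).float()
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = selected_embedding.to(device).float()
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    cos_sim = (image_features * text_features).sum(dim=-1)
print(f"cosine similarity: {cos_sim.item():.4f}")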
