Cones

Unofficial implementation of the Cones: Concept Neurons in Diffusion Models for Customized Generation paper.

Instructions for use:

1. Configure the provided Colab notebook with your DreamBooth dataset. Set an appropriate threshold (activation_thresh) for the number of steps in cones_call, then run cones_call to generate the concept neuron masks.
2. After the concept masks have been computed, reload the image pipeline with the pretrained weights (generating the masks alters the weights), then pass the dictionaries of masks to the image pipeline in cones_inference to apply the masks and generate images with the neuron masks.
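Step 2's mask application can be sketched in isolation. This is a toy illustration, not the notebook's actual code: the mask-dict key, the keep/zero convention, and the use of a plain Linear as a stand-in for the UNet's k/v projections are all assumptions.

```python
import torch
import torch.nn as nn

# Toy stand-in for one cross-attention key projection; in the real
# pipeline these layers live inside the UNet.
layer = nn.Linear(8, 4, bias=False)

# Hypothetical mask dict: layer name -> boolean mask over output neurons
# (True = keep the neuron, False = zero it out).
masks = {"attn.to_k": torch.tensor([True, False, True, True])}

# Zero the weight rows of the masked neurons in place.
with torch.no_grad():
    layer.weight *= masks["attn.to_k"].unsqueeze(1).float()

x = torch.randn(2, 8)
out = layer(x)  # column 1 of the output is now identically zero
```

Because the masking mutates the weights, the pipeline has to be reloaded from the pretrained checkpoint before a fresh set of masks can be applied, as step 2 notes.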

AFAIK the paper does not provide any threshold values, so this is open to experimentation. I have found that anything at or below roughly 1e-4 destroys the attention layer and produces only random noise, like below:

[Image: random noise output]
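To see why a low threshold can wipe out the layer, here is a toy illustration. The per-neuron importance scores are random stand-ins (the paper derives them from gradient statistics), and the masking direction — zeroing neurons whose score exceeds the threshold, so a lower threshold masks more of the layer — is an assumption matching the behavior observed above.

```python
import torch

torch.manual_seed(0)

# Hypothetical per-neuron importance scores for one k/v layer;
# random values here, purely to illustrate how the threshold behaves.
scores = torch.rand(320) * 1e-3

def masked_fraction(activation_thresh):
    # Assumed convention: neurons scoring above the threshold get zeroed.
    return (scores > activation_thresh).float().mean().item()

low = masked_fraction(1e-4)   # most of the layer is zeroed
high = masked_fraction(5e-2)  # nothing is zeroed
```

With scores on the order of 1e-3, a threshold around 1e-4 masks the vast majority of neurons, which is consistent with the attention layer collapsing into noise, while 5e-2 leaves the layer untouched.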

Additional Information

The learning rate (rho) is set by default to 2e-5, which is reportedly good for single-subject learning.
I have not been able to reproduce the paper's results, although I have only computed masks with around 50 passes through the dataset (the researchers use 1000). A few images from the training dataset (20 prior images of the class "dog" and 5 images of a specific dog as the concept) are below:

Class Images:

[Images: three class images]

Concept:

[Images: two concept images]

Here are a few generated images:

Generated (30 runs, I think):

[Images: two generated samples]

200 Runs (thresh 5e-2):

[Images: two generated samples]

Not much of the concept was learnt.
The implementation is slow; pointers on how to optimize it would be appreciated (this is my first paper implementation).
PRs are welcome! The method currently runs on Colab GPUs with around 8-9 GB of VRAM in fp16.

To do:

1. Restore the default attention weights after each cones_inference.
2. Get the attention weights directly instead of looping over all UNet modules to find the k/v layers, to reduce mask computation time.
3. Look into implementing Algorithm A2 from the paper (faster).
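For to-do item 2, one option is a single pass over named_modules with a name filter, which collects every k/v projection up front instead of re-searching the module tree per mask. The toy module below only mimics the to_k/to_v naming used by diffusers' cross-attention blocks; it is a sketch, not the repo's actual UNet.

```python
import torch.nn as nn

# Minimal stand-in for a UNet containing cross-attention blocks; in
# diffusers the cross-attention projections are named attn2.to_k / attn2.to_v.
class ToyAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.to_q = nn.Linear(8, 8, bias=False)
        self.to_k = nn.Linear(8, 8, bias=False)
        self.to_v = nn.Linear(8, 8, bias=False)

class ToyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([ToyAttention(), ToyAttention()])

unet = ToyUNet()

# One traversal builds a name -> module lookup for all k/v layers,
# so mask computation can index into it directly afterwards.
kv_layers = {
    name: module
    for name, module in unet.named_modules()
    if name.endswith(("to_k", "to_v"))
}
```

Building this dict once and reusing it for every mask avoids repeating the module-tree walk, which is where the quadratic cost in the current loop comes from.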
