Feat/inversion pp #104
Bit more commits coming... |
Will start to look at this today. Just to clarify, the target_imgs input:
should be a list of images matching the example prompts. I'm guessing these could be generated by the same pipe if necessary? |
Target images are the reference images, so they are like ground truth images |
Right, got it, but I imagine when we actually use this, we won't have ground truth images for all the prompts, so we can generate "ground truth" using the pipe (probably better to use the original model, rather than the trained). |
So these seemed to work very well, I'll add these with updated example runfile + example dataset. |
Oh, so I was thinking: we have a target subject X to train on, and we test on prompt Y to see how well it creates an image. The generated image Z should be faithful to prompt Y and to subject X. That measure, sim(Z, Y) and sim(Z, X), is what we are trying to get here. So our only source (ground truth) images are X, since Y is text and Z is generated with SD. |
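For readers following along, here is a minimal sketch of the two measures described above, assuming sim is cosine similarity between CLIP embeddings. The embeddings are taken as given; in practice they would come from a CLIP image encoder (for X and Z) and text encoder (for Y). The function names and the toy vectors are hypothetical and are not taken from this PR's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def alignment_scores(generated_emb, prompt_emb, subject_embs):
    """Return (text alignment, image alignment) for one generated image Z.

    generated_emb: embedding of generated image Z (assumed CLIP image embedding)
    prompt_emb:    embedding of prompt Y (assumed CLIP text embedding)
    subject_embs:  embeddings of the subject images X (assumed CLIP image embeddings)
    """
    # sim(Z, Y): how faithful the generated image is to the prompt.
    text_alignment = cosine_similarity(generated_emb, prompt_emb)
    # sim(Z, X): mean similarity to the reference images of the subject.
    image_alignment = sum(
        cosine_similarity(generated_emb, x) for x in subject_embs
    ) / len(subject_embs)
    return text_alignment, image_alignment


# Toy 2-D vectors standing in for real embeddings.
z = [1.0, 0.0]
y = [1.0, 1.0]
xs = [[1.0, 0.0], [0.0, 1.0]]
text_sim, image_sim = alignment_scores(z, y, xs)
```

Averaging over the reference images is one reasonable choice when several images of X are available; a single reference image reduces to a plain cosine similarity.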
I'll merge this I guess |
Ok this makes sense now. Thanks! |
Amazing! But you can't just drop the image without telling us what tricks you used? And what is high norm prior??? |
In this PR I made 5 changes to get it to work:
|
Thanks for the secret sauce. The multivector initialization is very clever. Does that mean your prompts include all the tokens together? |
This has gotten bigger than last time I looked :). I haven't had time to understand all the changes, but the results speak for themselves. Great work! |
They have |
A few utilities, including CLIP evaluation, CLIP evaluation preparation, and random initialization with sigma.
So I've implemented CLIP text alignment and image alignment in this PR (see #67 (comment)).
Expect to see results like the figure above, from Custom Diffusion:
https://arxiv.org/abs/2212.04488
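As an illustration of the "random initialization with sigma" utility mentioned above, here is a rough sketch that assumes it means drawing each component of a new token embedding from a zero-mean Gaussian with standard deviation sigma. The function name, the seed parameter, and the dimension of 768 are assumptions for the example, not the PR's actual interface.

```python
import random


def random_init_embedding(dim, sigma, seed=None):
    """Draw a dim-sized embedding with components ~ N(0, sigma^2).

    Hypothetical helper: the real utility in the PR may differ.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    return [rng.gauss(0.0, sigma) for _ in range(dim)]


# Example: a 768-dimensional embedding (assumed size) with a small sigma.
emb = random_init_embedding(dim=768, sigma=0.1, seed=0)
```

A smaller sigma keeps the initial embedding close to the origin, while a larger one starts it with a higher norm; which works better here is something only the PR's experiments can say.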