Adding out-of-the-box support for multilingual models like BAAI/AltDiffusion to diffusers #2135
```python
from diffusers import AltDiffusionPipeline, DPMSolverMultistepScheduler
import torch

pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion", torch_dtype=torch.float16, revision="fp16")
pipe = pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "黑暗精灵公主,非常详细,幻想,非常详细,数字绘画,概念艺术,敏锐的焦点,插图"
# or in English:
# prompt = "dark elf princess, highly detailed, d & d, fantasy, highly detailed, digital painting, trending on artstation, concept art, sharp focus, illustration, art by artgerm and greg rutkowski and fuji choko and viktoria gavrilenko and hoang lap"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("./alt.png")
```

If that didn't work, please feel free to describe a bit more what error was produced.
Would it be an option for you guys to add support for these models to the `StableDiffusionPipeline`? Almost all SD diffusers code I've seen uses the `StableDiffusionPipeline`. It's also more user friendly IMO to have one unified pipeline.
Hey @jslegers, Pipelines are not supposed to be "all-in-one" toolboxes, but rather examples of how to run a certain model. For more detailed information you can have a look at the docs here:
Pipelines are just examples? *confused*
If the only difference between Stable Diffusion & AltDiffusion is a different text encoder & tokenizer, they're pretty much the same thing from an end user perspective, as changing every Stable Diffusion model into an AltDiffusion model (or vice versa) is trivial and can be achieved with a few tweaks to JSON files & drag-and-dropping some files... Or am I missing something?
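For illustration, a rough, untested sketch of the kind of swap described above: grafting AltDiffusion's multilingual text encoder & tokenizer onto a Stable Diffusion checkpoint's UNet/VAE. This assumes both checkpoints are available locally or downloadable, and that the cross-attention dimensions match (both SD v1.x and AltDiffusion use 768):

```python
# Untested sketch: reuse a Stable Diffusion checkpoint's UNet/VAE with
# AltDiffusion's multilingual text encoder & tokenizer.
from diffusers import AltDiffusionPipeline, StableDiffusionPipeline

sd = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
alt = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion")

pipe = AltDiffusionPipeline(
    vae=sd.vae,
    text_encoder=alt.text_encoder,  # multilingual AltCLIP text stack
    tokenizer=alt.tokenizer,
    unet=sd.unet,
    scheduler=sd.scheduler,
    safety_checker=None,
    feature_extractor=None,
    requires_safety_checker=False,
)
```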
I think you can use the generic `DiffusionPipeline`:

```python
from diffusers import DiffusionPipeline

# model_id can be either a Stable Diffusion or AltDiffusion model
pipe = DiffusionPipeline.from_pretrained(model_id)
```

However, I gotta say I do understand and resonate with your desire of having a super powerful all-in-one pipeline that loads multiple models and contains all the latest features (incl. getting rid of the 77 token limit and maybe more). However, I think maintaining such a pipeline - also making sure all features are inter-compatible with one another - may be beyond the scope of what the maintainers aim for with the library, as you can see in the pipeline philosophy docs @patrickvonplaten has shared above. However, if you (or anyone in the community!) would be interested in maintaining an all-in-one interoperable pipeline with as many features as possible, feel free to add it as a community pipeline - community pipelines are maintained by the community and loadable from the library. The community pipeline …
That's definitely acceptable for personal use... as in code I write myself for e.g. demos or prototypes. But it's not a suitable approach for when I create a custom diffusion model (customized through Dreambooth, LoRA and/or merging checkpoints) that I want others to use. When you release something to the broader public, you want it to be as simple / dummy-proof as possible. For those with little to no experience with AI, learning how to use Stable Diffusion can already be a steep learning curve. Having to learn to code in Python, use Google Colab & become familiar with the `diffusers` library only adds to that. With so much demo code & tutorials out there using the `StableDiffusionPipeline`, …
Fair point. Having worked in R&D on a high-end JavaScript GIS library with a similar philosophy in the past, for a former employer, I very much appreciate and respect that you want to avoid over-engineering this library and turning it into yet another convoluted monolith that tries to do everything poorly. I appreciate that you want to keep things simple and that you want to empower the user of the library rather than give too much power to the library itself.

A good compromise, in my very humble opinion, would be to optimize error reporting and inform the user when they're using the wrong pipeline and which pipeline they should be using instead. So, for example, if they're using a `StableDiffusionPipeline` with a model that requires a different pipeline, the error message should name the pipeline they should be using instead. This way, users at least know what they're doing wrong and how to fix it without having to waste lots of precious time wading through GitHub issues or posts on the Hugging Face community.

Basically, what I'm saying is that errors associated with trying to load the wrong kind of diffusion model into a particular pipeline should be dummy-proof enough for the error message to be all the documentation needed for the users to figure out how to fix their mistake.
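One hypothetical shape such a check could take (a sketch, not the actual diffusers implementation): compare the pipeline class recorded in the checkpoint's `model_index.json` against the class being instantiated, and name the expected class in the error. The helper name and path argument below are made up for illustration:

```python
import json

# Hypothetical helper, not part of diffusers: model_index.json records the
# pipeline class a checkpoint was saved with under "_class_name".
def check_pipeline_class(model_index_path, requested_cls):
    with open(model_index_path) as f:
        expected = json.load(f).get("_class_name")
    if expected and expected != requested_cls.__name__:
        raise ValueError(
            f"This checkpoint was saved for `{expected}`, but you are loading it "
            f"with `{requested_cls.__name__}`. Please use `{expected}` (or the "
            f"generic `DiffusionPipeline`, which picks the right class) instead."
        )
```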
I'll consider it if and when I have the time for it. Right now, creating my own custom Stable Diffusion models, testing my models, creating demo apps for them, marketing them, experimenting with ChatGPT and its API, and trying to come up with a business model that allows me to monetize my experience with all this shit is already more than a full-time job. I'm already biting off way more than I can chew and struggling to prioritize my ongoing activities, all while still learning to find my way in what was a completely unknown ecosystem to me just a few months ago.

Also, I'm not convinced adding an additional community pipeline alongside existing pipelines, that supports all of the models supported by every other pipeline, is the way to go. IMO this would result in nothing but unnecessary redundancy & bloat. Based on my understanding of the current `diffusers` architecture, …

Additionally, ...

But hey...
Very much agree with all of your 4 points here:
Very good point! We could/should think about a nice system here. Note that when loading with the generic `DiffusionPipeline`, the correct pipeline class is picked automatically from the checkpoint's config. We could nevertheless make sure better error messages are thrown when something like:

```python
from diffusers import StableDiffusionInpaintPipeline

pipeline = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```

is run, which is probably not giving a good error message at the moment.
Models and pipelines are a bit on different levels, as models are sub-components of pipelines. See here for more explanation:
Yes, they abstract from the `DiffusionPipeline`, so they have to …
100% agree - we should probs change all docs to just always use `DiffusionPipeline`.
Think 1. and 4. are quite actionable, in case anybody wants to pick them up to open PRs.
Hi, new contributor here. I've investigated 1. a bit and found that if we have something like
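presumably the same snippet as in the comment above (reproduced here as an assumption):

```python
from diffusers import StableDiffusionInpaintPipeline

# Load a plain text-to-image checkpoint into the inpainting pipeline
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```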
this doesn't actually throw an error: it succeeds and gives us a `StableDiffusionInpaintPipeline`. However, if we try to generate an image from this pipeline via an inpainting call such as
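a hypothetical call along these lines (the image paths and prompt below are placeholders; any 512×512 base image and mask would do):

```python
from PIL import Image

# Placeholder inputs: any RGB base image plus a black-and-white mask
init_image = Image.open("input.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("L").resize((512, 512))

pipe = pipe.to("cuda")
image = pipe(prompt="a red sofa", image=init_image, mask_image=mask_image).images[0]
```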
we get an error on the last line:
which makes sense, because the inpainting pipeline expects the initial layer of the UNet to have channels for the mask and masked image as well as the latent base image, whereas the base stable diffusion model checkpoint's UNet only has the 4 latent channels. Perhaps I'm not familiar enough with the code, but it's unclear to me how we could generically catch errors of this sort from inside `from_pretrained`. As for the error message itself, perhaps adding something like "please verify that the model and pipeline are consistent" might be helpful? (I'm also working on 4. but haven't made enough progress to submit a PR.)
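For what it's worth, one concrete way such a check could look (a sketch, not the fix that was eventually merged): compare the UNet's configured input channels against what the inpainting pipeline needs (4 latent + 4 masked-image + 1 mask = 9). This continues from the `pipe` loaded above:

```python
# Sketch of a load-time sanity check; channel counts are the standard SD
# values (4 latent channels for text-to-image UNets, 9 for inpainting ones).
EXPECTED_INPAINT_IN_CHANNELS = 9

if pipe.unet.config.in_channels != EXPECTED_INPAINT_IN_CHANNELS:
    raise ValueError(
        f"The UNet has in_channels={pipe.unet.config.in_channels}, but the "
        f"inpainting pipeline expects {EXPECTED_INPAINT_IN_CHANNELS}. "
        "Please verify that the model and pipeline are consistent."
    )
```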
@dg845 could you open a new issue? Happy to answer there :-)
I have opened a new issue at #2799.
Since the merging of #2799 and #2809, is there anything else left to do here, or should this be closed @patrickvonplaten?
Good call, think we can close this one, no @jslegers?
Not just yet. I'm still struggling with the limit of 77 tokens, which is way too small for many of my prompts. Is there a way to combine the SDXL pipeline with a text encoder that accepts longer prompts?

I'd like to keep the image generation capabilities of whichever SDXL model I'm using, but add better prompt support. More in particular, I'm looking for a way to overcome the limit of 77 tokens without just ignoring most of the prompt (see my initial post). I vaguely remember trying to just swap text encoders in the past, with it producing errors.

See also #8977 (comment) for a comment where I just referenced this one.
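One common workaround (not an official diffusers feature): tokenize the full prompt without truncation, run the text encoder over 77-token windows, and concatenate the resulting embeddings before handing them to the pipeline via `prompt_embeds`. The sketch below is untested and uses the SD 1.5 pipeline for simplicity; SDXL would need the same treatment for both of its text encoders plus the pooled embeddings. The model id and prompt are placeholders, and the naive chunking ignores BOS/EOS alignment at window boundaries, which community pipelines handle more carefully:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder model; the same idea applies per text encoder for SDXL
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

window = pipe.tokenizer.model_max_length  # 77 for CLIP


def encode_long_prompt(text, total_len):
    """Pad `text` to `total_len` tokens, encode it in `window`-sized chunks,
    and concatenate the per-chunk embeddings along the sequence dimension."""
    ids = pipe.tokenizer(
        text,
        truncation=False,
        padding="max_length",
        max_length=total_len,
        return_tensors="pt",
    ).input_ids.to(pipe.device)
    chunks = [ids[:, i : i + window] for i in range(0, total_len, window)]
    with torch.no_grad():
        return torch.cat([pipe.text_encoder(chunk)[0] for chunk in chunks], dim=1)


prompt = "a very long prompt ..."  # placeholder for a prompt over 77 tokens

# Round the prompt length up to a multiple of the 77-token window so the
# positive and negative embeddings end up with matching sequence lengths
n_tokens = pipe.tokenizer(prompt, truncation=False, return_tensors="pt").input_ids.shape[1]
total_len = ((n_tokens + window - 1) // window) * window

prompt_embeds = encode_long_prompt(prompt, total_len)
negative_embeds = encode_long_prompt("", total_len)

image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    num_inference_steps=25,
).images[0]
image.save("long_prompt.png")
```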
Ah, nevermind, I just saw this. I'm going to give that approach a try. I'll close this ticket myself. |
Description of the problem

I'm unable to load multilingual models like BAAI/AltDiffusion with `diffusers`.

The solution I'd like

I would like `diffusers` to have out-of-the-box support for models like BAAI/AltDiffusion, that use AltCLIP or a different multilingual CLIP model.

Alternatives I've considered

I tried loading BAAI/AltDiffusion, but it told me that I was missing a library. After installing that library, it produced a different error. I kinda gave up after that.

Additional context

Since AltCLIP has a `max_position_embeddings` value of 514 for its text encoder instead of 77, I had hoped I could just replace the text encoder and tokenizer of my models with those of BAAI/AltDiffusion to overcome the 77 token limit.