Is it possible diffusers implement an official support on the increasing or decreasing weight of prompt with () & []? #2431
Comments
@haofanwang master could also have a check if you are interested in this topic haha. |
Do you think it might be possible with SEGA? https://huggingface.co/docs/diffusers/api/pipelines/semantic_stable_diffusion |
@sayakpaul Wow thanks a lot, let me have a try. |
You mean you wanted to use SEGA but conditioned on an input image? |
Yes, I have two use cases to implement: txt2img and img2img. |
For img2img inference I'm currently using StableDiffusionImg2ImgPipeline. |
Well, SEGA natively supports the first one i.e., text2image. For image2image, I believe you could:
As far as I know @manuelbrack might already have something regarding this. So, ccing them. |
Noted, thanks a lot! Let me have a try. |
Yes, there exists a preliminary version of Just call the |
That's already in the Long Prompt Weighting (LPW) community pipe, isn't it? That's the one I primarily use since it supports txt2img, img2img, and inpainting all in one. Are there any disadvantages to it versus the standard pipeline or the semantic proposal? I was kinda surprised LPW is still in the example community scripts rather than the diffusers collection, since it seemed like the most practical one. |
@Skquark I have tried the Long Prompt Weighting (LPW) community pipe. The results are good, but it's too unstable to be used in a live environment: it often gets stuck when I call the pipeline, and I need to restart the process to run it again. |
@garyhxfang It's been stable for me, I haven't noticed it getting stuck and I've been using it as my primary for months. I have made minor mods to it, but in general it's been solid and I've been searching for any downsides to it. I got it as the default pipeline in my https://stablediffusiondeluxe.com implementation, and WAS also uses it as primary in his Easy Diffusion. If anyone knows any disadvantages compared to the standard I'd like to know. |
Hi @sayakpaul, I tried SEGA yesterday, but it seems much slower than StableDiffusionPipeline: it takes almost 3x the processing time to generate an image of the same size, which makes it a poor alternative for use in an application. The Long Prompt Weighting (LPW) community pipe has similar speed to StableDiffusionPipeline; the only problem is that this community pipeline is quite unstable. I also get very weird results with the weight config, and I have a question about edit_weights:
In the example provided in the doc, editing_prompt is an array of length 4, but edit_weights has length 5. Why is there an extra element in edit_weights?
|
@sayakpaul
If so, is there any way to call the community pipeline without requesting GitHub? |
The way I did mine is to copy it as pipeline.py in my HuggingFace models, then while calling pretrained I set custom_pipeline="AlanB/lpw_stable_diffusion_mod" and it'll come from there instead. That might fix your issue, so long as HF works better than github in China.. |
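For anyone trying the approach described above, here is a minimal sketch of loading a custom pipeline from a Hub repository instead of GitHub. It relies on the `custom_pipeline` argument of `DiffusionPipeline.from_pretrained`; the helper name `load_lpw_pipeline` and the checkpoint id in the usage comment are assumptions for illustration, not something from this thread.

```python
def load_lpw_pipeline(model_id, custom_pipeline="AlanB/lpw_stable_diffusion_mod"):
    """Load a checkpoint with a custom pipeline whose pipeline.py lives in a Hub repo.

    `custom_pipeline` may also be a local directory containing pipeline.py,
    which avoids any network request for the pipeline code.
    """
    # Imported lazily so this helper can be defined without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(
        model_id,
        custom_pipeline=custom_pipeline,
        torch_dtype=torch.float16,
    )

# Usage (hypothetical checkpoint id, needs a GPU and network access):
# pipe = load_lpw_pipeline("runwayml/stable-diffusion-v1-5").to("cuda")
```

Pointing `custom_pipeline` at a local folder is the variant to try if neither GitHub nor the Hub is reliably reachable.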
Thanks a lot master! let me have a try. |
It should not as from what I see you're loading the pipeline from local files. |
There have been lots of issues about this now, so linking them here: As said a couple of times before we don't want to add too many high level features to @damian0815 made a very nice library that works well with I'll open a PR to add a doc page about it since it seems to be such an important feature, but I'd really like to rely on |
Opened two PRs to Compel to make them a bit more user-friendly for One can already use the library though very nicely as follows:
import sys
import torch
from compel import Compel
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
# Path or Hub id of the checkpoint, passed on the command line
path = sys.argv[1]
pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
# Compel turns weighted prompt strings into embedding tensors
compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
# "++" upweights "ball", "--" downweights it; the middle prompt is the unweighted baseline
prompts = ["a cat playing with a ball++ in the forest", "a cat playing with a ball in the forest", "a cat playing with a ball-- in the forest"]
prompt_embeds = torch.cat([compel.build_conditioning_tensor(prompt) for prompt in prompts])
# One generator per prompt, all with the same seed, so that only the weighting differs
generator = [torch.Generator(device="cuda").manual_seed(0) for _ in range(prompt_embeds.shape[0])]
images = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=15).images
Note the |
PR with docs opened here: #2574 |
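For readers unfamiliar with the `+`/`-` suffixes in the prompts above: as far as I understand Compel's syntax, each `+` scales the weight by 1.1 and each `-` by 0.9, and an explicit number can be given instead. A minimal sketch of that arithmetic follows. The helper `suffix_weight` is hypothetical and is not part of Compel.

```python
import re

def suffix_weight(suffix: str) -> float:
    """Return the multiplier implied by a Compel-style weight suffix.

    Assumption: each '+' multiplies by 1.1 and each '-' by 0.9;
    a plain number such as '1.4' is used directly.
    """
    if suffix == "":
        return 1.0
    if re.fullmatch(r"\++", suffix):
        return 1.1 ** len(suffix)
    if re.fullmatch(r"-+", suffix):
        return 0.9 ** len(suffix)
    return float(suffix)

for s in ["", "+", "++", "--", "1.4"]:
    print(repr(s), round(suffix_weight(s), 3))
```

Under that assumption `ball++` is weighted by about 1.21 and `ball--` by about 0.81, which is why A1111-style prompts with weights of 1.2 and 0.8 are only approximate equivalents.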
@patrickvonplaten The concern I have is that by using a third party library like Compel it's sort of pushing the problem further down the line. Right now, diffusers recommends using a community pipeline maintained by a third-party / the community. Now diffusers is switching it for a library maintained by another third-party. It's essentially the same approach, but with a library instead of a pipeline. I'd argue this still poses an issue because of the instability of this approach. Let me elaborate below.
The problem the community pipeline faced is that it became unstable because it wasn't properly maintained / improved by the third-party (or community in this case). It's possible that Compel will run into this exact same problem. Right now, it seems Damian is maintaining the library mostly by themselves. If Damian gets busy or if they decide to move on then the library will go unsupported. I do know Damian is a contributor to InvokeAI and that Invoke recently added references to Damian's library in their code. So it seems unlikely that Invoke will just let the library die given their dependency. However, the risk is still there. Damian owns the repo for Compel and if Damian stops monitoring the repo, then the library dies and Invoke will have to move off of it (albeit maybe there's another owner of the repo that I missed). As such, there's a bit of an inherent risk here by relying on this library.
I think the solution to this problem is the following: Right now, we already have the long prompt weighting (LPW) community pipeline. But it's not maintained. So why not just transition it to become an official pipeline? It would still be a separate pipeline from the base one. So, it wouldn't be breaking with the philosophy of diffusers by making the base one more complicated. The only difference would be that LPW has more support. I know I've made this case before in the other thread.
But the only alternative would be to just hope that Compel stays well supported, which isn't guaranteed. Really, the fundamental issue here is lack of support, and it won't be fixed until diffusers makes a support commitment to important features like this. It can make that commitment however it likes so that it doesn't break the philosophy of diffusers = toolkit. Either that or it needs to absolutely make sure that the third party tools it recommends are well supported. But regardless, there needs to be some sort of support. |
Hey @Ephil012, Thanks for voicing your concerns here, I understand where you're coming from. Besides not being in line with our philosophy, the big problem here is maintainability. We don't have the time and people to maintain higher-level use cases. If we add prompt weighting as a core functionality, we open the door to add more and more UI/UX features. Now since this is a highly requested feature, I think adding both:
We have a very high level of stability. If damian decides to stop maintaining the library, we still have a solid compel==0.18.0 version. Just adding a new pipeline is not an option because then we're closing the door for all use cases of prompt weighting for other pipelines. Instead we now have a robust system of:
I don't see a problem here for the community at all tbh - as you can see in the doc here: https://github.com/huggingface/diffusers/blob/176d85cb55d6908c003dff12ef4e2d077aafd1c7/docs/source/en/using-diffusers/weighted_prompts.mdx it's now a three-liner of code to do prompt weighting with |
Also note that |
fwiw @Ephil012 Compel also supports long prompts as of v0.1.10 (released yesterday) which i'd expect makes the LPW pipeline pretty much redundant. as the maintainer of Compel i'm closely involved with the development of InvokeAI, which uses Compel for prompt weighting, so you've got the benefit of two professional business orgs backing Compel-driven code. |
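Some background on what "long prompts" means here: the CLIP text encoder used by Stable Diffusion accepts 77 tokens (75 plus start and end markers). A common way around that limit is to split the token ids into 75-token windows, encode each window separately, and concatenate the embeddings. The sketch below shows only the splitting step. It is an illustration of the general idea, not the code used by Compel or the LPW pipeline, and the function name `chunk_token_ids` is made up for this example.

```python
def chunk_token_ids(token_ids, bos_id, eos_id, chunk_len=75):
    """Split token ids into windows of `chunk_len`, each wrapped in BOS/EOS.

    Short windows are padded with EOS so every chunk has chunk_len + 2 ids.
    """
    chunks = []
    for start in range(0, max(len(token_ids), 1), chunk_len):
        body = token_ids[start:start + chunk_len]
        padded = body + [eos_id] * (chunk_len - len(body))
        chunks.append([bos_id] + padded + [eos_id])
    return chunks

# 100 dummy token ids -> two chunks of 77 ids each
# (49406 / 49407 are the start / end ids of the CLIP tokenizer)
chunks = chunk_token_ids(list(range(100)), bos_id=49406, eos_id=49407)
print(len(chunks), [len(c) for c in chunks])
```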
I'll comment here echoing confidence in compel. @damian0815 has done an excellent job building a flexible and streamlined prompt syntax, and I've been able to watch it develop first-hand. Invoke is building our platform to become a foundation for professional usage/development in the ecosystem, with a more sustainable codebase supported by commercial offerings - As @patrickvonplaten has noted, compel will have multiple orgs using it at this point, and I'm confident that its criticality in the ecosystem will keep it well maintained. |
FWIW, as of today we've started using We do a decent number of installs per day, with a fairly active user and developer community |
Ah okay, if a bunch of people are vouching for it then it eases my concerns. I was just a bit concerned about being dependent on a third party lib. But if a bunch of people are using it already then I think that helps ease a lot of the worries around it. |
Hi @patrickvonplaten @damian0815. Thank you for your hard work to bring weighted prompt into diffusers! [Update 7/4] I'm sorry for using the porny prompts below. I tested compel with Realistic Vision 2.0 and got inconsistent results between diffusers and A1111 webui. Specifically, using the prompt below (from civitai) I got these oversaturated, bad outputs.
from compel import Compel
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
import torch
path = "SG161222/Realistic_Vision_V2.0"
pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16, safety_checker=None).to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder, truncate_long_prompts=False)
prompt = "highly detailed RAW Instagram (elegant sitting full body pose)1.4 photo of beautiful mature 26 years old (French medieval period nobility)1.4 woman, (highly detailed very long beautiful wavy hair)1.3, inside monastery's corridor background, (look at viewer)1.4, (skin pores, skin imperfections)++, (beautiful moles, freckles)--, highly detailed body, highly detailed face, (realistic sun lighting)0.4, shadows, 8k high definition, insanely detailed, intricate, masterpiece, highest quality, (angular face, slightly masculine feature face)++"
negative_prompt = "(panties)++, (bra)++, (pierced belly button)++, (body piercing)++, (3d)1.6, (3d render)1.6, (3dcg)1.6, (cropped head)+, (deformed, deformed body, deformed glasses, deformed legs)1.3, bad nipples, ugly nipples, draft, drawing, duplicate, error, extra arms, extra breasts, extra calf, extra digit, extra ears, extra eyes, extra feet, extra heads, extra knee, extra legs, extra limb, extra limbs, extra shoes, extra thighs, extra limb, failure, fake, fake face, fewer digits, floating limbs, grainy, gross, gross proportions, short arm, head out of frame, illustration, image corruption, irregular, jpeg artifacts, long body, long face, long neck, long teeth, long feet, lopsided, low, low quality, low res, low resolution, low res, lowres, malformed, messy drawing, misshapen, monochrome, more than 1 left hand, more than 1 right hand, more than 2 legs, more than 2 nipples, more than 2 thighs, more than two shoes, mosaic, multiple, multiple breasts, mutated, mutation, mutilated, no color, normal quality, (out of focus)++, (out of frame)++, oversaturated, surreal, twisted, , unappealing, uncoordinated body, uneven, unnatural, unnatural body, unprofessional, weird colors, worst, worst quality, (penis, dick, penetration)1.3, (fake skin, porcelain skin)1.3, (bad feet, wrong feet)1.3, (bad hands, wrong hands)1.3, (deformed iris, deformed pupils, semi-realistic, CGI, 3d, render, sketch, cartoon, drawing, blur, anime)1.6, (:blurry background)1.6"
# The step that builds the embeddings is missing from the post as captured; presumably it follows the usual Compel usage:
prompt_embeds = compel.build_conditioning_tensor(prompt)
negative_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)
# With truncate_long_prompts=False the two tensors can differ in length, so pad them to match
[prompt_embeds, negative_prompt_embeds] = compel.pad_conditioning_tensors_to_same_length([prompt_embeds, negative_prompt_embeds])
num_images = 4
generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(num_images)]
images = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds, generator=generator, num_images_per_prompt=4, num_inference_steps=20).images
Meanwhile, I got expected results with the A1111 webui using the converted equivalent prompts.
prompt = "highly detailed RAW Instagram (elegant sitting full body pose:1.4) photo of beautiful mature 26 years old (French medieval period nobility:1.4) woman, (highly detailed very long beautiful wavy hair:1.3), inside monastery's corridor background, (look at viewer:1.4), (skin pores, skin imperfections:1.2), (beautiful moles, freckles:0.8), highly detailed body, highly detailed face, (realistic sun lighting:0.4), shadows, 8k high definition, insanely detailed, intricate, masterpiece, highest quality, (angular face, slightly masculine feature face:1.2)"
negative_prompt = "(panties:1.2), (bra:1.2), (pierced belly button:1.2), (body piercing:1.2), (3d:1.6), (3d render:1.6), (3dcg:1.6), (cropped head), (deformed, deformed body, deformed glasses, deformed legs:1.3), bad nipples, ugly nipples, draft, drawing, duplicate, error, extra arms, extra breasts, extra calf, extra digit, extra ears, extra eyes, extra feet, extra heads, extra knee, extra legs, extra limb, extra limbs, extra shoes, extra thighs, extra limb, failure, fake, fake face, fewer digits, floating limbs, grainy, gross, gross proportions, short arm, head out of frame, illustration, image corruption, irregular, jpeg artifacts, long body, long face, long neck, long teeth, long feet, lopsided, low, low quality, low res, low resolution, low res, lowres, malformed, messy drawing, misshapen, monochrome, more than 1 left hand, more than 1 right hand, more than 2 legs, more than 2 nipples, more than 2 thighs, more than two shoes, mosaic, multiple, multiple breasts, mutated, mutation, mutilated, no color, normal quality, (out of focus:1.2), (out of frame:1.2), oversaturated, surreal, twisted, , unappealing, uncoordinated body, uneven, unnatural, unnatural body, unprofessional, weird colors, worst, worst quality, (penis, dick, penetration:1.3), (fake skin, porcelain skin:1.3), (bad feet, wrong feet:1.3), (bad hands, wrong hands:1.3), (deformed iris, deformed pupils, semi-realistic, CGI, 3d, render, sketch, cartoon, drawing, blur, anime:1.6), (:blurry background:1.6)"
When I removed the weights (parentheses and coefficients) from the prompts, diffusers and A1111 webui gave similarly good results. Do you have any idea on the difference between diffusers and A1111 in case of weighted prompts? [Update April 6]: I tested with |
@duongna21 Maybe better to open an issue on compel's repository? https://github.com/damian0815/compel/issues Thanks |
@duongna21 can you try Euler-a with 50 steps, and also DDIM, and compare with diffusers? please make sure you have karras scheduling disabled (i don't know how auto111 works, i assume you can do this) |
We've very recently also added support for Karras sigmas here:
|
Will try to debug this this week |
ahhha brilliant |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
As a note on compel, Vlad's fork of Auto1111 has adopted compel as an option. What are open items in consideration here? |
Still need to test some things here |
My opinion is that the A1111-like AI generation community has grown to the point that it cannot be ignored, and neither can its users' habits. |
Hmm, yeah we really quite heavily want to rely on Compel here... @damian0815 do you think it could make sense to parse A1111-like syntax to Compel syntax to make it easier for the community? It shouldn't be too difficult I guess no? |
Related: #3980 |
There are prompt converter utilities out there, but (personally) I think attempting to maintain a A1111 converter is outside the scope of what Compel is intended to do. There are different functions, implementation perspectives, etc. - Not everything Compel does can be mapped to A1111 syntax, and vice versa. I'm not sure the notion of "I need to be able to replicate everything across apps 100%" is realistic or desirable. |
It's more about user habit: there are millions of users (who are not strong developers) out there who are quite familiar with the A1111 webui and its way of prompting. And since there is neither official nor third-party support for prompting like that, these users have no choice but to stick with A1111 and leave diffusers, I think. |
i do not have the bandwidth to maintain a convertor from auto to compel syntax, but if someone wants to contribute one and offer to maintain it i'd have no objection to a converter being a pre-processing step built into the compel library. note however @krNeko9t it would be misleading because using diffusers+compel with |
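To make the converter idea discussed above concrete, here is a rough sketch of a pre-processing step that rewrites flat A1111-style weights into the Compel-style `(text)weight` form used earlier in this thread. It is not part of Compel or diffusers, the function name `a1111_to_compel` is made up, and nested groups such as `((happy))` are not handled correctly. As noted above, matching syntax does not mean the two tools will produce matching images.

```python
import re

def a1111_to_compel(prompt: str) -> str:
    """Convert flat (non-nested) A1111 weights to Compel-style syntax."""
    # (text:1.3) -> (text)1.3
    prompt = re.sub(r"\(([^()\[\]:]+):\s*(\d*\.?\d+)\s*\)", r"(\1)\2", prompt)
    # (text) -> (text)1.1, skipping groups that already carry a weight suffix
    prompt = re.sub(r"\(([^()\[\]:]+)\)(?![\d.+\-])", r"(\1)1.1", prompt)
    # [text] -> (text)0.91, since 1 / 1.1 is roughly 0.91
    prompt = re.sub(r"\[([^()\[\]:]+)\]", r"(\1)0.91", prompt)
    return prompt

print(a1111_to_compel("(worst quality:2), (low quality:2)"))
print(a1111_to_compel("(long_sleeves) red_bow [bow]"))
```

Explicit numbers are emitted rather than `+`/`-` so that the A1111 factor of 1.1 per bracket level is kept as written.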
Is your feature request related to a problem? Please describe.
Currently, AUTOMATIC1111/stable-diffusion-web-ui supports increasing or decreasing the weight of part of a prompt with () & [], which is not supported by diffusers.
(e.g. "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes ((happy)) hood japanese_clothes kimono (long_sleeves) red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms")
I am requesting this feature because I found that for many models on civitai, negative prompts with certain weights are very important for generating a good result, for example (worst quality:2), (low quality:2).
I tried for a long time and found it almost impossible to generate results of similar quality without increasing or decreasing the weights in the negative prompt. (I tried duplicating "worst quality" a different number of times (2, 3, or 4 times) in my negative prompt, but they all generated results of much worse quality than (worst quality:2).)
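For reference, the webui convention is that `(text)` multiplies the attention weight by 1.1, `[text]` divides it by 1.1, `(text:1.3)` sets an explicit multiplier, and nesting multiplies the factors. The sketch below is a simplified parser that computes the effective weight of each text span under those rules. It is an illustration only, not the webui's own parser (escapes and other features are omitted), and the name `parse_weights` is made up.

```python
import re

# Tokens: '(' or '[', an explicit ':<number>)' closer, ')' or ']', or plain text
_TOKEN = re.compile(r"\(|\[|:\s*([+-]?\d*\.?\d+)\s*\)|\)|\]|[^()\[\]:]+|:")

def parse_weights(prompt: str):
    """Return [(text, weight), ...] for an A1111-style prompt."""
    result = []         # [text, weight] pairs in order of appearance
    round_starts = []   # positions in `result` where each '(' opened
    square_starts = []  # positions in `result` where each '[' opened

    def scale(start, factor):
        for item in result[start:]:
            item[1] *= factor

    for m in _TOKEN.finditer(prompt):
        tok = m.group(0)
        if tok == "(":
            round_starts.append(len(result))
        elif tok == "[":
            square_starts.append(len(result))
        elif m.group(1) is not None and round_starts:
            scale(round_starts.pop(), float(m.group(1)))  # (text:1.3)
        elif tok == ")" and round_starts:
            scale(round_starts.pop(), 1.1)                # (text)
        elif tok == "]" and square_starts:
            scale(square_starts.pop(), 1 / 1.1)           # [text]
        else:
            result.append([tok, 1.0])
    return [(text, round(weight, 4)) for text, weight in result]

print(parse_weights("(worst quality:2), (low quality:2)"))
print(parse_weights("((happy)) hood (long_sleeves)"))
```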
Describe alternatives you've considered
When investigating a solution, I found a community pipeline, Long Prompt Weighting Stable Diffusion, which supports this feature.
But after trying it, I found it quite unstable: it often gets stuck for a long time during inference, which means it cannot be used in a production environment.
So I think a better alternative is to support it directly in the official StableDiffusionPipeline.
Describe the solution you'd like
An example of what I would like is described below.
I do hope that @patrickvonplaten could take a look at this request; it would be very helpful for us developers in generating images of the same or even better quality than the ones users generate with AUTOMATIC1111/stable-diffusion-web-ui.