Apply applicable quantization_config to model components when loading a model #10327

Open
vladmandic opened this issue Dec 20, 2024 · 11 comments

@vladmandic
Contributor

vladmandic commented Dec 20, 2024

With the recent improvements to quantization_config, the memory requirements of models such as SD3.5 and FLUX.1 are much lower.
However, the user must manually load each model component they want quantized and then assemble the pipeline.

For example:

from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig, T5EncoderModel

quantization_config = BitsAndBytesConfig(...)
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
text_encoder = T5EncoderModel.from_pretrained(repo_id, subfolder="text_encoder_3", quantization_config=TransformersBitsAndBytesConfig(...))
pipe = StableDiffusion3Pipeline.from_pretrained(repo_id, transformer=transformer, text_encoder_3=text_encoder)

The ask is to allow the pipeline loader itself to accept quantization_config and, if present, automatically apply it to the applicable modules.
That would allow much simpler usage without the user needing to know the exact internal components of each model:

quantization_config = BitsAndBytesConfig(...)
pipe = StableDiffusion3Pipeline.from_pretrained(repo_id, quantization_config=quantization_config)

This is a generic ask that should work for pretty much all models, although the primary use case is the most popular ones such as SD3.5 and FLUX.1.
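
For illustration, the dispatch the loader would need is roughly the following (a sketch only, not actual diffusers internals; supports_quantization is a hypothetical helper):

import torch

def supports_quantization(cls) -> bool:
    # Hypothetical check: only torch modules (transformer, text encoders, VAE)
    # would receive the quantization_config; schedulers, tokenizers, etc. would not.
    return issubclass(cls, torch.nn.Module)

def load_component(cls, repo_id, subfolder, quantization_config=None):
    # Forward quantization_config to components that can accept it,
    # and load everything else exactly as today.
    kwargs = {}
    if quantization_config is not None and supports_quantization(cls):
        kwargs["quantization_config"] = quantization_config
    return cls.from_pretrained(repo_id, subfolder=subfolder, **kwargs)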

@yiyixuxu @sayakpaul @DN6 @asomoza

@sayakpaul
Member

Yeah, this is planned. I thought we had created an issue to track it, but clearly it had slipped through the cracks.

We should also have something like exclude_modules to let users specify the names of the components not to quantize (typically the CLIP text encoder, the VAE, or any model that doesn't have enough linear layers to benefit from the classic quantization techniques).

@vladmandic
Contributor Author

We should also have something like exclude_modules to let users specify the names of the components not to quantize

Yup! And it can have a default value with exactly the ones you've mentioned.
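
For illustration, a call under that proposal could look like this (exclude_modules and its default are assumptions here, not an existing diffusers argument):

from diffusers import BitsAndBytesConfig, StableDiffusion3Pipeline

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
pipe = StableDiffusion3Pipeline.from_pretrained(
    repo_id,
    quantization_config=quantization_config,
    # hypothetical default: components that rarely benefit from quantization
    exclude_modules=["text_encoder", "text_encoder_2", "vae"],
)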

Contributor

github-actions bot commented Jan 20, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 20, 2025
@vladmandic
Contributor Author

ping to remove stale

@github-actions github-actions bot removed the stale Issues that haven't received updates label Jan 21, 2025
@vladmandic
Contributor Author

vladmandic commented Feb 12, 2025

Any updates on this one?
I just added Lumina2 support and quantization works.
But instead of simply using Lumina2Text2ImgPipeline.from_pretrained (or even AutoPipeline), I need to manually pre-load the transformer using Lumina2Transformer2DModel and the text encoder using transformers.AutoModel to assemble the pipeline.
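
For reference, the manual workflow looks roughly like this (a sketch; the repo id, subfolder names, and 4-bit settings are assumptions for illustration):

import torch
from diffusers import BitsAndBytesConfig, Lumina2Transformer2DModel, Lumina2Text2ImgPipeline
from transformers import AutoModel, BitsAndBytesConfig as TransformersBitsAndBytesConfig

repo_id = "Alpha-VLLM/Lumina-Image-2.0"  # assumed repo id
transformer = Lumina2Transformer2DModel.from_pretrained(
    repo_id, subfolder="transformer",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.bfloat16,
)
text_encoder = AutoModel.from_pretrained(
    repo_id, subfolder="text_encoder",
    quantization_config=TransformersBitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.bfloat16,
)
pipe = Lumina2Text2ImgPipeline.from_pretrained(
    repo_id, transformer=transformer, text_encoder=text_encoder, torch_dtype=torch.bfloat16
)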

@SunMarc
Member

SunMarc commented Feb 12, 2025

@sayakpaul is planning on adding this soon. Sorry for the delay.

Contributor

github-actions bot commented Mar 9, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 9, 2025
@vladmandic
Contributor Author

ping to remove stale

@sayakpaul
Member

Definitely not stale. Will be prioritised soon.

@sayakpaul sayakpaul removed the stale Issues that haven't received updates label Mar 10, 2025
@sayakpaul
Member

@vladmandic

from diffusers.quantizers import PipelineQuantizationConfig
from diffusers import DiffusionPipeline
import torch

quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16
    },
    exclude_modules=["text_encoder", "vae"]
)
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16
).to("cuda")

https://github.com/huggingface/diffusers/compare/feat/pipeline-quant-config?expand=1
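
As a quick sanity check after loading, something like the following (a sketch, assuming bitsandbytes is installed and pipe comes from the snippet above) shows which components actually ended up quantized:

import bitsandbytes as bnb
import torch

def is_bnb_quantized(component) -> bool:
    # True if any submodule is a bitsandbytes 4-bit linear layer
    return any(isinstance(m, bnb.nn.Linear4bit) for m in component.modules())

for name, component in pipe.components.items():
    if isinstance(component, torch.nn.Module):  # skip scheduler, tokenizers, etc.
        print(name, "quantized:", is_bnb_quantized(component))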

@sayakpaul
Member

@vladmandic since you reacted to the above message, do feel free to provide feedback here.
