
"Clarification on Multimodal Model Quantization and Default Calibration Dataset" #714

donghong1 opened this issue Feb 17, 2025 · 1 comment


@donghong1

Hello,

I have a few questions regarding the quantization of multimodal models:

1. Does the current version of AutoAWQ quantize only the language model, or does it also include the vision component for quantization?
2. What is the default calibration dataset used for quantization?
3. I noticed that the example code for Qwen2-VL uses a custom multimodal dataset. Is this dataset required for all multimodal model quantizations, or can we use the default dataset?
Thank you for your clarification!

@seungwoos

Hi, @donghong1

  1. As far as I know, AWQ itself quantizes only the language model, not the vision encoder. I recall a paper that proposed quantizing vision encoders as well, but I'm not sure which one it was.
  2. It seems the authors use the pile-val-backup dataset in their paper (see the sketch below).
  3. I am not 100% sure, but I guess we should use a multimodal calibration dataset for VLMs.
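
For reference, here is a minimal sketch of the usual AutoAWQ quantization flow for a text-only model, assuming the standard `AutoAWQForCausalLM` API. The model path, output path, and quant config below are placeholders rather than anything from this thread; when `calib_data` is not overridden, AutoAWQ falls back to its built-in text calibration set (reportedly pile-val-backup), and only the language-model weights get quantized. The Qwen2-VL example in the repo differs in that it passes its own multimodal calibration samples.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder paths -- substitute your own model and output directory.
model_path = "Qwen/Qwen2-7B-Instruct"
quant_path = "qwen2-7b-instruct-awq"

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize. Leaving calib_data at its default uses the built-in text
# calibration set; only the language-model weights are quantized here.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized weights and tokenizer.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```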
