
"Clarification on Multimodal Model Quantization and Default Calibration Dataset" #714

donghong1 opened this issue Feb 17, 2025 · 1 comment


@donghong1

Hello,

I have a few questions regarding the quantization of multimodal models:

1. Does the current version of AutoAWQ quantize only the language model, or does it also include the vision component for quantization?
2. What is the default calibration dataset used for quantization?
3. I noticed that the example code for Qwen2-VL uses a custom multimodal dataset. Is this dataset required for all multimodal model quantizations, or can we use the default dataset?
Thank you for your clarification!

@seungwoos

Hi, @donghong1

  1. As far as I know, AWQ itself quantizes only the language model, not the vision encoder. I recall a paper that proposed quantizing vision encoders as well, but I'm not sure which one it was.
  2. It seems the authors use the pile-val-backup dataset in their paper (see the sketch below).
  3. I am not 100% sure, but I guess we should use a multimodal calibration dataset for VLMs.
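
For reference, here is a minimal sketch of the usual AutoAWQ quantization flow for a text-only model, assuming the standard `AutoAWQForCausalLM` API. The model path, output path, and quant config below are placeholders rather than anything from this thread; when `calib_data` is not overridden, AutoAWQ falls back to its built-in text calibration set (reportedly pile-val-backup), and only the language-model weights get quantized. The Qwen2-VL example in the repo differs in that it passes its own multimodal calibration samples.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder paths -- substitute your own model and output directory.
model_path = "Qwen/Qwen2-7B-Instruct"
quant_path = "qwen2-7b-instruct-awq"

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize. Leaving calib_data at its default uses the built-in text
# calibration set; only the language-model weights are quantized here.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized weights and tokenizer.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```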
