Closed
Description
Hi, I saw the AWQ model of gemma-2-9b-it on HF (solidrust/gemma-2-9b-it-AWQ), and it works well on vLLM. However, when I tried to quantize the same model (google/gemma-2-9b-it) with the code you uploaded on GitHub, the output doesn't work well. I get this error when validating the quantized model:
'Gemma2LikeModel' object has no attribute '_prepare_4d_causal_attention_mask_with_cache_position'
How can I solve it?