[Bug]: Unable to run meta-llama/Llama-Guard-3-8B-INT8 #6756

Closed
xfalcox opened this issue Jul 24, 2024 · 5 comments · Fixed by #7445
Labels
bug Something isn't working

Comments

xfalcox commented Jul 24, 2024

Your current environment

Latest Docker image, RTX 4090

🐛 Describe the bug

```
docker run --gpus all vllm/vllm-openai:latest --model meta-llama/Llama-Guard-3-8B-INT8
...
[rank0]:     raise ValueError(f"Cannot find any of {keys} in the model's "
[rank0]: ValueError: Cannot find any of ['adapter_name_or_path'] in the model's quantization config.
```
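
For reference, the same failure reproduces through the Python API; a minimal sketch, assuming vLLM is installed locally (the error comes from vLLM's parsing of the checkpoint's bitsandbytes quantization config):

```python
from vllm import LLM

# Minimal sketch of the same failure path as the Docker command above.
# vLLM auto-detects the bitsandbytes quantization_config in the checkpoint
# and, before the fix, expected an 'adapter_name_or_path' key that this
# 8-bit checkpoint does not carry.
llm = LLM(model="meta-llama/Llama-Guard-3-8B-INT8")
# -> ValueError: Cannot find any of ['adapter_name_or_path'] in the
#    model's quantization config.
```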
xfalcox added the bug label Jul 24, 2024
mgoin (Collaborator) commented Jul 24, 2024

@thesues @chenqianfzh It looks like this is an 8bit BNB model. Would it be easy to add support for these checkpoints as well?

chenqianfzh (Contributor) commented

> @thesues @chenqianfzh It looks like this is an 8bit BNB model. Would it be easy to add support for these checkpoints as well?

It won't be difficult. I will work on it as a high priority.

meihui commented Aug 6, 2024

It seems version 0.5.4+cu124 works with the bnb 4-bit model.

But it says:

```
WARNING 08-06 06:27:07 config.py:254] bitsandbytes quantization is not fully optimized yet. The speed can be slower than non-quantized models.
```

Will that be an easy fix too?
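
For context, here is a sketch of the 4-bit path described above, assuming vLLM 0.5.4; the checkpoint name is illustrative, and any bitsandbytes 4-bit model should follow the same pattern:

```python
from vllm import LLM, SamplingParams

# Sketch of the working 4-bit bitsandbytes path on vLLM 0.5.4.
# "unsloth/llama-3-8b-bnb-4bit" is an illustrative pre-quantized
# checkpoint; substitute any bnb 4-bit model.
llm = LLM(
    model="unsloth/llama-3-8b-bnb-4bit",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```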

chenqianfzh (Contributor) commented
With this PR (#7445), meta-llama/Llama-Guard-3-8B-INT8 is supported.
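
Once that PR is in a build, the original model should load; a sketch under that assumption (the explicit flags may be redundant if auto-detection handles 8-bit configs, but they make the bitsandbytes path unambiguous):

```python
from vllm import LLM

# Sketch assuming a vLLM build that includes #7445, so the 8-bit
# bitsandbytes checkpoint loads instead of raising on the missing
# 'adapter_name_or_path' key.
llm = LLM(
    model="meta-llama/Llama-Guard-3-8B-INT8",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```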

chenqianfzh (Contributor) commented

> The speed can be slower than non-quantized models.

A lot of quantization methods here are not in the optimized list yet. Optimizing speed is not our top priority right now, as we are working to support more bnb quantization features first.
