[Bug]: The Offline Inference Embedding Example Fails #5181
Comments
I just ran the example and did not see this issue. What model are you using? This error can occur if you call
Interestingly enough, the example is working fine for me, and I actually see the example results (a list of numbers) in my CLI. Moreover, your error message already states what the problem is. Hope this helps you somehow.
Thanks for all of your help! Interestingly, the script works well with
```json
{
"architectures": [
"MistralForCausalLM" # << this tells us its a generation model
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.36.0",
"use_cache": true,
"vocab_size": 32000
}
```
```json
{
"_name_or_path": "mistralai/Mistral-7B-v0.1",
"architectures": [
"MistralModel" # <<< this tells us its an embedding model
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pad_token_id": 2,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.34.0",
"use_cache": false,
"vocab_size": 32000
}
```

We automatically detect whether a model is an embedding model or a generation model based on the `architectures` field in its config. Supporting embedding models is a new feature. Thank you for bringing this bad UX to my attention. I am going to update this.
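For illustration, the detection amounts to something like the sketch below (a simplified illustration, not vLLM's actual code; the helper name `is_embedding_model` is hypothetical):

```python
import json

def is_embedding_model(config_path: str) -> bool:
    # Hypothetical helper: vLLM's real detection logic is more involved.
    with open(config_path) as f:
        config = json.load(f)
    # A bare-backbone architecture such as "MistralModel" suggests an
    # embedding model; "MistralForCausalLM" indicates a generation model.
    return any(arch.endswith("Model") for arch in config.get("architectures", []))
```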
I get it. Thanks for explaining this!
For what it's worth, I think people might want to use a causal LM to generate embeddings of just the prompt; at least that's the use case I currently have.
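Outside vLLM, one way to get prompt embeddings from a causal LM today is to load the bare backbone with Hugging Face Transformers and pool its hidden states yourself. A minimal sketch, assuming last-token pooling (the model name and pooling choice are just examples, not a vLLM API):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Example checkpoint; any causal-LM backbone works the same way.
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# AutoModel loads the bare backbone (MistralModel), i.e. without the LM head.
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

inputs = tokenizer(["The capital of France is"], return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)

# Last-token pooling: take the final token's hidden state as the prompt embedding.
embedding = hidden[:, -1, :]
print(embedding.shape)  # torch.Size([1, 4096])
```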
Your current environment
🐛 Describe the bug
With the latest vllm-0.4.3 installed, when running the official offline inference embedding example (https://docs.vllm.ai/en/stable/getting_started/examples/offline_inference_embedding.html), the line

outputs = model.encode(prompts)

raises the following errors:
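For reference, the linked example boils down to the following (paraphrased from the docs; the model name is the one the example used at the time):

```python
from vllm import LLM

prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# The docs use an embedding checkpoint (architectures: ["MistralModel"]).
model = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)

# encode() returns one EmbeddingRequestOutput per prompt.
outputs = model.encode(prompts)
for output in outputs:
    print(output.outputs.embedding)  # a list of hidden_size floats
```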