Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model #1186
Conversation
@Ying1123 I added gte in the generation model test. Note that I changed the prefill tolerance accordingly and added the rouge-l metric instead of asserting that output_strs match exactly.
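For reference, a ROUGE-L based check along those lines could look like the minimal sketch below. It assumes the rouge-score package; the package choice, helper name, and threshold are illustrative assumptions, not the exact code in the test.

from rouge_score import rouge_scorer

def rouge_l_fmeasure(reference, prediction):
    # ROUGE-L F-measure between a reference string and a model output.
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(reference, prediction)["rougeL"].fmeasure

# Accept outputs that are close to the reference instead of byte-identical.
assert rouge_l_fmeasure("Paris is the capital of France.",
                        "Paris is the capital of France") > 0.9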
import multiprocessing as mp

try:
    mp.set_start_method("spawn")
except RuntimeError:
    # Completion assumed; the diff is truncated here. set_start_method raises
    # RuntimeError if the start method has already been set.
    pass
Why would this be needed?
@zhaochenyang20
For SGLang:
prompt = "hello world"
response = client.embeddings.create(
For transformers:
max_length = 8192
batch_dict = tokenizer(prompt, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
embeddings = F.normalize(embeddings, p=2, dim=1)
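For context, a complete version of the SGLang side of that truncated snippet might look roughly like the following. The base URL, port, and model name are illustrative assumptions, not values taken from the PR.

import openai

# Assumes an SGLang server exposing an OpenAI-compatible API at this address.
client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

prompt = "hello world"
response = client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    input=[prompt],
)
sglang_embedding = response.data[0].embedding  # plain list of floats
print(len(sglang_embedding))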
@llmforever Hello, sorry I hadn't noticed this before. Do you still need this fixed? Actually, we have a unit test for this in test/srt/models/test_embedding_models.py. Also, I don't understand what you meant by "perform not so well". Could you provide your running scripts and your serving command for SGLang? And does e5-mistral also have this problem, or only gte?
Yeah, the embedding could be different for a lot of reasons. @llmforever You can check this unit test: https://github.com/sgl-project/sglang/blob/main/test/srt/models/test_embedding_models.py. We set a tolerance value for the embedding difference. Also, please try the e5-mistral model and give us the embedding difference: https://huggingface.co/intfloat/e5-mistral-7b-instruct @Ying1123 Do you think the difference provided is tolerable?
I tested about 10 cases; accuracy drops from 80% to less than 10%, so I think the difference is not tolerable. The result for the e5-mistral-7b-instruct model is the same. Can you please help me look into this? Here is the code I use to generate the embeddings:

For transformers:
import torch
from torch import Tensor
input_texts = ['hello']
max_length = 8192
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
embeddings = F.normalize(embeddings, p=2, dim=1)

For SGLang:
input_texts = ['hello']
queries = client.embeddings.create(
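To quantify the difference under discussion, a small helper like the one below could compare the two vectors directly, in the spirit of the tolerance check in the unit test linked above. This is a minimal sketch: the function name is made up here, and the example inputs are dummy values rather than real model outputs.

import torch
import torch.nn.functional as F

def embedding_difference(hf_embedding, sglang_embedding):
    # L2-normalize both vectors, then report cosine similarity and max abs diff.
    a = F.normalize(torch.tensor(hf_embedding, dtype=torch.float32), p=2, dim=-1)
    b = F.normalize(torch.tensor(sglang_embedding, dtype=torch.float32), p=2, dim=-1)
    cosine_sim = torch.dot(a, b).item()
    max_abs_diff = torch.max(torch.abs(a - b)).item()
    return cosine_sim, max_abs_diff

# Dummy vectors for illustration; in practice pass the transformers and SGLang
# embeddings produced by the snippets above.
sim, diff = embedding_difference([0.1, 0.2, 0.3], [0.1, 0.21, 0.29])
print(f"cosine similarity: {sim:.4f}, max abs diff: {diff:.4f}")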
@Ying1123 I think the difference he provides is intolerable, hmm? I will check it in the coming days.
Motivation
Currently, SGLang only supports the e5-mistral embedding model. This PR adds the Alibaba-NLP/gte-Qwen2-7B-instruct model.
Also, SGLang previously determined whether a model is an embedding model through its hf_config.architectures. But the gte model has the same architecture as a CausalLM, so I added a new parameter to server_args and changed the forward function of Qwen2ForCausalLM.

Modifications
- Changed the forward function of Qwen2ForCausalLM (see the sketch after this list).
- Added is_embedding in server_args.
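As an illustration of the kind of embedding forward pass involved, and not SGLang's actual changed Qwen2ForCausalLM code, last-token pooling with Hugging Face transformers looks roughly like the sketch below. The pooling helper assumes right padding and is simplified; the model name is the one added by this PR.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "Alibaba-NLP/gte-Qwen2-7B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def last_token_pool(last_hidden_state, attention_mask):
    # Take the hidden state of the last non-padding token of each sequence
    # (simplified: assumes right padding).
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_indices = torch.arange(last_hidden_state.shape[0])
    return last_hidden_state[batch_indices, sequence_lengths]

batch = tokenizer(["hello world"], max_length=8192, padding=True,
                  truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)
embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)

Serving the same model through SGLang would then go through the new server argument, presumably something like passing --is-embedding to python -m sglang.launch_server (the exact flag spelling is an assumption based on the is_embedding parameter above).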
Checklist