
Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model #1186

Merged
2 commits merged into sgl-project:main from support_qwn2 on Aug 25, 2024

Conversation

zhaochenyang20
Collaborator

@zhaochenyang20 zhaochenyang20 commented Aug 22, 2024

Motivation

Currently, SGLang supports only the e5-mistral embedding model. This PR adds the Alibaba-NLP/gte-Qwen2-7B-instruct model.

Also, SGLang previously determined whether a model is an embedding model from its hf_config.architectures. However, the gte model shares its architecture with a CausalLM model, so I added a new parameter to server_args and changed the forward function of Qwen2ForCausalLM.
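
Below is a minimal sketch of the model-type decision this change affects (my own illustration, not the actual SGLang code; the architecture set and function name are made up): the architecture list alone can no longer distinguish the gte model from a generative Qwen2 model, so an explicit is_embedding server argument overrides the check.

def is_embedding_model(hf_architectures, is_embedding_flag=False):
    # Illustrative set of architectures that unambiguously imply embedding models.
    embedding_archs = {"MistralModel", "LlamaEmbeddingModel"}
    if is_embedding_flag:
        # New explicit override coming from server_args.
        return True
    return any(arch in embedding_archs for arch in hf_architectures)

# gte-Qwen2-7B-instruct reports "Qwen2ForCausalLM", so only the flag marks it as an embedding model:
print(is_embedding_model(["Qwen2ForCausalLM"]))                          # False
print(is_embedding_model(["Qwen2ForCausalLM"], is_embedding_flag=True))  # True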

Modifications

  1. Changed the forward function of Qwen2ForCausalLM.
  2. Added a new parameter, is_embedding, to server_args.
  3. Made some related changes.
  4. Added unit tests for the gte model (in both the generation and embedding tests; the generation tests use the ROUGE-L score).
  5. Updated the README.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Files with review threads:

  • README.md
  • python/sglang/srt/server_args.py
  • python/sglang/srt/managers/tokenizer_manager.py
  • python/sglang/srt/model_executor/model_runner.py
  • python/sglang/srt/models/qwen2.py
  • python/sglang/srt/server.py
  • test/srt/models/test_generation_models.py
@zhaochenyang20
Collaborator Author

@Ying1123 I added gte to the generation model test. Note that I changed the prefill tolerance accordingly and switched to the ROUGE-L metric instead of asserting that output_strs match exactly.
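
For reference, here is a hedged sketch (not the PR's actual test code; the function name and threshold are illustrative) of comparing generated outputs against a ROUGE-L threshold instead of exact string equality, using the rouge_score package.

from rouge_score import rouge_scorer

def rouge_l_close(reference: str, hypothesis: str, threshold: float = 0.9) -> bool:
    # Compute the ROUGE-L F-measure between the reference and the hypothesis.
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    score = scorer.score(reference, hypothesis)["rougeL"].fmeasure
    return score >= threshold

# Minor wording differences still pass, unlike an exact-match assertion.
print(rouge_l_close("The capital of France is Paris.",
                    "The capital of France is Paris"))  # True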

@merrymercy merrymercy changed the title Support Alibaba-NLP/gte-Qwen2-7B-instruct Model Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model Aug 23, 2024
@Ying1123 Ying1123 enabled auto-merge (squash) August 25, 2024 07:33
@Ying1123 Ying1123 force-pushed the support_qwn2 branch 3 times, most recently from 930e83d to efb207b on August 25, 2024 07:45
@Ying1123 Ying1123 disabled auto-merge August 25, 2024 17:26
@Ying1123 Ying1123 merged commit 30b4f77 into sgl-project:main Aug 25, 2024
0 of 4 checks passed
@Ying1123 Ying1123 mentioned this pull request Aug 25, 2024
import multiprocessing as mp

try:
    mp.set_start_method("spawn")
except RuntimeError:
    # Assumed completion of the truncated snippet: the start method may already be set.
    pass
Review comment from a collaborator:

Why would this be needed?

@llmforever

llmforever commented Aug 28, 2024

@zhaochenyang20
Hello. Using the method from this PR, the embeddings differ greatly from those produced by the original transformers and sentence-transformers backends, and the quality is poor. Could you help take a look? I have tested both the 7B and 1.5B models.
Why are the results different from the original transformers backend?

prompt = "hello world"

sglang:

import openai

client = openai.Client(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="default",
    input=prompt,
)

transformer:

import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# last_token_pool is the last-token pooling helper from the gte-Qwen2 model card (definition omitted here).
tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/gte-Qwen2-7B-instruct', trust_remote_code=True)
model = AutoModel.from_pretrained('Alibaba-NLP/gte-Qwen2-7B-instruct', trust_remote_code=True)

max_length = 8192

batch_dict = tokenizer(prompt, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)

@zhaochenyang20 zhaochenyang20 deleted the support_qwn2 branch September 1, 2024 12:42
@zhaochenyang20
Collaborator Author

@llmforever Hello. Sorry, I hadn't noticed this before. Do you still need this fixed? We actually have a unit test for this in test/srt/models/test_embedding_models.py, and the embeddings there are indeed close.

Also, I don't understand what you mean by "perform not so well". Could you provide your test script and your serving command for SGLang?

And does e5-mistral also have this problem, or only gte?

@thomZ1

thomZ1 commented Sep 2, 2024

I have the same problem: I tried the SGLang OpenAI API and SentenceTransformer with the same prompt, but the output embeddings were different.

@zhaochenyang20
Collaborator Author

Yeah, the embeddings could differ for many reasons. @llmforever

You can check this unit test: https://github.com/sgl-project/sglang/blob/main/test/srt/models/test_embedding_models.py

We set a tolerance value for the embedding difference.

Also, please try the e5-mistral model and give us the embedding difference.

https://huggingface.co/intfloat/e5-mistral-7b-instruct

@Ying1123 Do you think the difference provided is tolerable?
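
As a reference for quantifying "the embedding difference", here is a minimal sketch (my own illustration, not the code in test_embedding_models.py; the function name and tolerance values are made up) that compares two embedding vectors after L2 normalization:

import torch
import torch.nn.functional as F

def embeddings_close(emb_a, emb_b, cos_tol=0.99, abs_tol=1e-2):
    # Normalize both vectors, then check cosine similarity and the max element-wise difference.
    a = F.normalize(torch.as_tensor(emb_a, dtype=torch.float32), p=2, dim=-1)
    b = F.normalize(torch.as_tensor(emb_b, dtype=torch.float32), p=2, dim=-1)
    cos_sim = torch.sum(a * b).item()
    max_abs_diff = (a - b).abs().max().item()
    print(f"cosine similarity: {cos_sim:.6f}, max abs diff: {max_abs_diff:.6f}")
    return cos_sim >= cos_tol and max_abs_diff <= abs_tol

# Example usage: pass the transformers embedding and the SGLang embedding from the snippets above.
# embeddings_close(hf_embedding, sglang_embedding)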

@llmforever

llmforever commented Sep 4, 2024


I tested about 10 cases, and accuracy dropped from 80% to less than 10% in each, so I think the difference is not tolerable; but the result of the e5-mistral-7b-instruct model is the same. Can you please help look into this? Here is the code I use to generate the embeddings:

for transformer:

import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

# Last-token pooling helper, as given in the gte-Qwen2 / e5-mistral model cards.
def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

input_texts = ['hello']
tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/gte-Qwen2-7B-instruct', trust_remote_code=True)
model = AutoModel.from_pretrained('Alibaba-NLP/gte-Qwen2-7B-instruct', trust_remote_code=True)

max_length = 8192

batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)

for sglang:

import openai
import torch

client = openai.Client(base_url="http://localhost:30000/v1", api_key="EMPTY")

input_texts = ['hello']

response = client.embeddings.create(
    model="default",
    input=input_texts,
)
embeddings = torch.tensor(response.data[0].embedding)

@zhaochenyang20
Collaborator Author

@Ying1123 I think the difference he reports is intolerable, hmm? I'm going to look into it in the next few days.
