Support qwen2 vl model #1546
Conversation
Can this run correctly now without modifying or updating vLLM? If so, we can remove "WIP" from the PR title and merge this soon!
In #1632, I merged a small change from your PR to make you a contributor to this project. This allows your future commits to automatically trigger the CI.
I think not; there are still some dependencies on the latest vLLM version when loading vLLM's ModelConfig: sglang/python/sglang/srt/model_executor/model_runner.py, lines 233 to 242 in 56503d9
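One defensive pattern for such a dependency (a hedged sketch, not from this PR; the helper name and any version bound are illustrative) is to check the installed package version before importing vLLM's config classes:

```python
from importlib.metadata import PackageNotFoundError, version


def meets_min_version(pkg: str, minimum: tuple) -> bool:
    # Naive comparison of the first three release components;
    # pre-release tags are ignored in this sketch.
    try:
        parts = tuple(int(p) for p in version(pkg).split(".")[:3])
    except (PackageNotFoundError, ValueError):
        return False
    return parts >= minimum


# Illustrative guard (the exact bound is an assumption):
# if not meets_min_version("vllm", (0, 6, 0)):
#     raise ImportError("Qwen2-VL support requires a newer vllm")
```

This fails fast with a clear message instead of surfacing an obscure ImportError deep inside model loading.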
The CI unit-test issue is caused by the old version of transformers. We need to upgrade transformers to 4.45.2.
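For local reproduction, the equivalent one-off upgrade (a hedged example; the CI itself pins the version in its workflow file) would be:

```shell
# Pin transformers to the version the Qwen2-VL processor/config classes need
pip install "transformers==4.45.2"
```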
@ispobock We can update the transformers version in the CI workflow: sglang/.github/workflows/pr-test.yml, line 32 in 869f1c0
BTW, the merge conflicts need to be resolved.
python/sglang/srt/layers/attention/triton_ops/prefill_attention.py
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange, repeat
from vllm.config import CacheConfig, MultiModalConfig
Remove CacheConfig (see #1658).
from vllm.config import CacheConfig, MultiModalConfig
from vllm.distributed import parallel_state
from vllm.distributed import utils as dist_utils
from vllm.logger import init_logger
Use the `init_logger` from SGLang instead of the one from vLLM.
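A minimal sketch of the suggested swap, assuming SGLang's modules ultimately obtain loggers through Python's stdlib `logging` (the exact helper in SGLang may differ):

```python
import logging

# Instead of `from vllm.logger import init_logger`, take a stdlib logger,
# which keeps the vision model free of a vLLM logging dependency.
logger = logging.getLogger(__name__)
logger.debug("Qwen2-VL vision tower loaded")
```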
from vllm.distributed import parallel_state
from vllm.distributed import utils as dist_utils
from vllm.logger import init_logger
from vllm.model_executor.layers.activation import QuickGELU
TODO: It's not difficult to implement this in FlashInfer; ref https://github.com/vllm-project/vllm/blob/81ede99ca44a5b3518932a07ea4a76a719e7416e/csrc/cpu/activation.cpp#L62-L67. It can be implemented in a subsequent PR.
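The activation itself is a one-liner, `x * sigmoid(1.702 * x)`; a pure-Python sketch of what a kernel would need to reproduce (tensor and kernel plumbing omitted):

```python
import math


def quick_gelu(x: float) -> float:
    # QuickGELU approximation used by CLIP-style vision encoders:
    # x * sigmoid(1.702 * x)
    return x * (1.0 / (1.0 + math.exp(-1.702 * x)))
```

For large positive x the sigmoid saturates to 1, so the function approaches the identity; for large negative x it decays to 0.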
BTW, remember to run the nightly eval after upgrading vLLM: https://github.com/sgl-project/sglang/actions/workflows/nightly-eval.yml cc @ispobock
In order to run the nightly eval before merging into main, I changed the base branch to qwen2vl. Once you resolve the minor issues mentioned above and fix the conflicts with main, we can consider merging and then run some compatibility tests.
Co-authored-by: Yineng Zhang <yineng.zhang@baseten.co>
https://github.com/sgl-project/sglang/actions/runs/11395749589
It seems this PR was merged into yizhang2077:support-qwen2-vl by accident? Should we open a new one?
It seems this PR was merged into the qwen2vl branch; once PR #1711 is merged into main, this PR can be merged into main.
Motivation
This PR adds support for the Qwen2-VL model, which is also supported by vLLM (here) and LMDeploy (here).
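Once merged, serving the model should follow SGLang's usual launch pattern (a hedged example; the flags and model path shown are assumptions based on SGLang's standard CLI, not taken from this PR):

```shell
# Launch an SGLang server with the Qwen2-VL checkpoint from Hugging Face
python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
```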
Modifications
Checklist
Others
Notice