Support qwen2 vl model #1546
Conversation
Can this run correctly now without modifying or updating vLLM? If so, we can remove "WIP" from the PR title and merge this soon!
In #1632, I merged a small change from your PR to make you a contributor to this project. This allows your future commits to automatically trigger the CI.
I think not; there are still some dependencies on the latest vLLM version when loading vLLM's ModelConfig: sglang/python/sglang/srt/model_executor/model_runner.py, lines 233 to 242 in 56503d9
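One defensive pattern for such a dependency (a hedged sketch, not from this PR; the helper name and any version bound are illustrative) is to check the installed package version before importing vLLM's config classes:

```python
from importlib.metadata import PackageNotFoundError, version


def meets_min_version(pkg: str, minimum: tuple) -> bool:
    # Naive comparison of the first three release components;
    # pre-release tags are ignored in this sketch.
    try:
        parts = tuple(int(p) for p in version(pkg).split(".")[:3])
    except (PackageNotFoundError, ValueError):
        return False
    return parts >= minimum


# Illustrative guard (the exact bound is an assumption):
# if not meets_min_version("vllm", (0, 6, 0)):
#     raise ImportError("Qwen2-VL support requires a newer vllm")
```

This fails fast with a clear message instead of surfacing an obscure ImportError deep inside model loading.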
The CI unit-test issue is caused by the old version of transformers. We need to upgrade transformers to 4.45.2.
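For local reproduction, the equivalent one-off upgrade (a hedged example; the CI itself pins the version in its workflow file) would be:

```shell
# Pin transformers to the version the Qwen2-VL processor/config classes need
pip install "transformers==4.45.2"
```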
@ispobock We can update the transformers version in the CI workflow: sglang/.github/workflows/pr-test.yml, line 32 in 869f1c0
BTW, the merge conflicts need to be resolved.
python/sglang/srt/layers/attention/triton_ops/prefill_attention.py
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange, repeat
from vllm.config import CacheConfig, MultiModalConfig
Remove CacheConfig (see #1658).
from vllm.config import CacheConfig, MultiModalConfig
from vllm.distributed import parallel_state
from vllm.distributed import utils as dist_utils
from vllm.logger import init_logger
Use the `init_logger` from SGLang instead of the one from vLLM.
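A minimal sketch of the suggested swap, assuming SGLang's modules ultimately obtain loggers through Python's stdlib `logging` (the exact helper in SGLang may differ):

```python
import logging

# Instead of `from vllm.logger import init_logger`, take a stdlib logger,
# which keeps the vision model free of a vLLM logging dependency.
logger = logging.getLogger(__name__)
logger.debug("Qwen2-VL vision tower loaded")
```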
from vllm.distributed import parallel_state
from vllm.distributed import utils as dist_utils
from vllm.logger import init_logger
from vllm.model_executor.layers.activation import QuickGELU
TODO: It's not difficult to implement this in FlashInfer; ref https://github.com/vllm-project/vllm/blob/81ede99ca44a5b3518932a07ea4a76a719e7416e/csrc/cpu/activation.cpp#L62-L67. It can be implemented in a subsequent PR.
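The activation itself is a one-liner, `x * sigmoid(1.702 * x)`; a pure-Python sketch of what a kernel would need to reproduce (tensor and kernel plumbing omitted):

```python
import math


def quick_gelu(x: float) -> float:
    # QuickGELU approximation used by CLIP-style vision encoders:
    # x * sigmoid(1.702 * x)
    return x * (1.0 / (1.0 + math.exp(-1.702 * x)))
```

For large positive x the sigmoid saturates to 1, so the function approaches the identity; for large negative x it decays to 0.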
BTW, remember to run the nightly eval after upgrading vLLM: https://github.com/sgl-project/sglang/actions/workflows/nightly-eval.yml cc @ispobock
In order to run the nightly eval before merging into main, I changed the base branch to qwen2vl. Once you resolve the minor issues mentioned above and fix the conflicts with main, we can consider merging and then run some compatibility tests.
Co-authored-by: Yineng Zhang <yineng.zhang@baseten.co>
https://github.com/sgl-project/sglang/actions/runs/11395749589
It seems this PR was merged into yizhang2077:support-qwen2-vl by accident? Should we open a new one?
It seems this PR was merged into the qwen2vl branch; once PR #1711 is merged into main, this PR can be merged into main.
Motivation
This PR adds support for the Qwen2-VL model, which is also supported by vLLM (here) and LMDeploy (here).
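Once merged, serving the model should follow SGLang's usual launch pattern (a hedged example; the flags and model path shown are assumptions based on SGLang's standard CLI, not taken from this PR):

```shell
# Launch an SGLang server with the Qwen2-VL checkpoint from Hugging Face
python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
```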
Modifications
Checklist
Others
Notice