Support qwen2 vl model #1546

Merged: 23 commits into sgl-project:qwen2vl on Oct 18, 2024

Conversation

@yizhang2077 (Collaborator) commented Sep 30, 2024

Motivation

This PR adds support for the Qwen2-VL model, which is also supported by vLLM (here) and LMDeploy (here).

Modifications

  1. Add a conversation template and chat template for Qwen2-VL, referenced from here.
  2. Add an image processor for Qwen2-VL, since its processor outputs differ (for example, the Qwen2-VL image processor outputs image_grid_thws), and add an image_grid_thws member to ImageInputs.
  3. Compute M-RoPE positions for each request using MRotaryEmbedding from vLLM and record the result in InputMeta; the Qwen2-VL model uses them as the real positions.
  4. Copy qwen2_vl.py from vLLM and adapt it, with some modifications to the pad_input_ids function, which replaces the image_token_id entries in input_ids with a unique image hash (a conceptual sketch follows this list).
  5. Add a test for Qwen2-VL, largely copied from test_vision_openai_server.py.
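Below is a minimal, hypothetical sketch of the pad_input_ids idea from item 4: image placeholder tokens in input_ids are swapped for values derived from a per-image hash so that cached prefixes can distinguish different images. The function name follows the PR description, but the token-id constant and the hashing scheme are illustrative assumptions, not the actual SGLang implementation (which also expands the placeholder region based on image_grid_thws).

from typing import List

# Illustrative placeholder only; the real image token id comes from the tokenizer config.
IMAGE_TOKEN_ID = 151655

def pad_input_ids(input_ids: List[int], image_hash: int,
                  image_token_id: int = IMAGE_TOKEN_ID) -> List[int]:
    """Replace every image placeholder token with an id derived from the image hash.

    Sketch of the idea described in item 4 above, not the real qwen2_vl.py code.
    """
    pad_value = image_hash % (1 << 30)  # keep the substituted id in a bounded range
    return [pad_value if tid == image_token_id else tid for tid in input_ids]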

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or examples.

Others

  • There is a bug in transformers in Qwen2VLConfig (here). vLLM fixed it (here), but the fix has not yet been released to pip. SGLang depends on vLLM, so unless we use the latest vLLM or patch transformers, we cannot run the Qwen2-VL model correctly.
  • M-RoPE relies on MRotaryEmbedding in vLLM, which is not available in vLLM 0.5.5, so we may need to update the vLLM version (a simplified sketch of the M-RoPE position scheme follows this list).
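For context, here is a simplified, toy sketch of the M-RoPE position scheme (not vLLM's MRotaryEmbedding; the helper name and arguments are hypothetical): every token gets three position indices (temporal, height, width). Text tokens share one running counter on all three axes, vision tokens enumerate their (t, h, w) grid, and text after the image resumes from one past the largest index used.

import torch

def toy_mrope_positions(num_prefix_text: int, grid_t: int, grid_h: int, grid_w: int,
                        num_suffix_text: int) -> torch.Tensor:
    # Returns a (3, seq_len) tensor of (temporal, height, width) position ids.
    t_pos, h_pos, w_pos = [], [], []

    # Text before the image: all three axes advance together.
    for i in range(num_prefix_text):
        t_pos.append(i); h_pos.append(i); w_pos.append(i)

    # Vision tokens: enumerate the (t, h, w) grid, offset by the current text position.
    offset = num_prefix_text
    for t in range(grid_t):
        for h in range(grid_h):
            for w in range(grid_w):
                t_pos.append(offset + t); h_pos.append(offset + h); w_pos.append(offset + w)

    # Text after the image resumes from one past the largest index used so far.
    nxt = max(t_pos + h_pos + w_pos) + 1
    for i in range(num_suffix_text):
        t_pos.append(nxt + i); h_pos.append(nxt + i); w_pos.append(nxt + i)

    return torch.tensor([t_pos, h_pos, w_pos])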

Notice

  • This PR updates the dependencies on vllm (0.5.5 -> 0.6.3) and transformers (4.44 -> 4.45.2).

@merrymercy (Contributor) left a comment

Thanks for the contributions. I left a few comments.

We also did some refactoring recently (#1541, #1538). Could you rebase?

Review threads (resolved): python/sglang/srt/conversation.py, python/sglang/srt/model_executor/forward_batch_info.py, test/srt/test_qwen2_vl_openai_server.py
@yizhang2077 (Collaborator, Author) commented

Thanks for the contributions. I left a few comments.

We also did some refactoring recently (#1541, #1538). Could you rebase?

Sorry for the late reply; I am on vacation abroad these days. Of course I will resolve these conversations and rebase the code.

@zhyncs mentioned this pull request Oct 7, 2024
@yizhang2077 (Collaborator, Author) commented Oct 8, 2024

Thanks for the contributions. I left a few comments.

We also did some refactoring recently (#1541, #1538). Could you rebase?

OK, I have rebased the code onto the latest version.

@merrymercy (Contributor) commented Oct 11, 2024

Can this run correctly now without the modification/update of vllm? If so, we can remove "WIP" in the PR title and merge this soon!

@merrymercy (Contributor) commented Oct 11, 2024

In #1632, I merged a small change from your PR to make you a contributor of this project. This allows your future commits to automatically trigger the CI.

@yizhang2077 (Collaborator, Author) commented Oct 11, 2024

Can this run correctly now without the modification/update of vllm? If so, we can remove "WIP" in the PR title and merge this soon!

I don't think so; loading VllmModelConfig still depends on the latest vLLM version:

self.vllm_model_config = VllmModelConfig(
    model=self.server_args.model_path,
    quantization=self.server_args.quantization,
    tokenizer=None,
    tokenizer_mode=None,
    trust_remote_code=self.server_args.trust_remote_code,
    dtype=self.server_args.dtype,
    seed=self.server_args.random_seed,
    skip_tokenizer_init=True,
)

@ispobock (Collaborator) commented

The CI unit test issue is caused by the old version of transformers. We need to upgrade transformers to 4.45.2.

@merrymercy (Contributor) commented

@ispobock We can update the transformers version in the CI, which currently runs pip install transformers==4.44. There are multiple places.

Review thread (resolved): python/pyproject.toml
@zhyncs (Member) commented Oct 15, 2024

BTW conflicts need to be resolved.

@yizhang2077 force-pushed the support-qwen2-vl branch 3 times, most recently from 1d3f971 to abe760f on October 16, 2024 16:25
Review threads: .github/workflows/pr-test.yml (resolved), test/srt/test_vision_openai_server.py (resolved)
@zhyncs mentioned this pull request Oct 17, 2024
@zhyncs changed the title from "[WIP] Support qwen2 vl model" to "Support qwen2 vl model" on Oct 17, 2024
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange, repeat
from vllm.config import CacheConfig, MultiModalConfig
Inline review comment (Member): remove CacheConfig #1658

from vllm.config import CacheConfig, MultiModalConfig
from vllm.distributed import parallel_state
from vllm.distributed import utils as dist_utils
from vllm.logger import init_logger
Inline review comment (Member): use the init_logger from SGLang

from vllm.distributed import parallel_state
from vllm.distributed import utils as dist_utils
from vllm.logger import init_logger
from vllm.model_executor.layers.activation import QuickGELU
Inline review comment by @zhyncs (Member), Oct 17, 2024:
TODO: It's not difficult to implement QuickGELU in FlashInfer; ref https://github.com/vllm-project/vllm/blob/81ede99ca44a5b3518932a07ea4a76a719e7416e/csrc/cpu/activation.cpp#L62-L67. It can be implemented in a subsequent PR.
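For reference, QuickGELU is the sigmoid approximation x * sigmoid(1.702 * x); a minimal PyTorch sketch of the same operation (not the vLLM CUDA kernel or a FlashInfer implementation) would be:

import torch
import torch.nn as nn

class QuickGELU(nn.Module):
    # Sigmoid approximation of GELU, used by the Qwen2-VL vision tower.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(1.702 * x)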

@zhyncs (Member) commented Oct 17, 2024

BTW, remember to run the nightly eval after upgrading vLLM: https://github.com/sgl-project/sglang/actions/workflows/nightly-eval.yml cc @ispobock

@zhyncs changed the base branch from main to qwen2vl on October 17, 2024 17:23
@zhyncs (Member) commented Oct 17, 2024

In order to run nightly eval before merging into main, I changed the base branch to qwen2vl. Once you resolve some of the minor issues mentioned above and fix the conflicts with main, we can consider merging and then conduct some compatibility tests.

Review thread (resolved): python/pyproject.toml
Co-authored-by: Yineng Zhang <yineng.zhang@baseten.co>
@zhyncs merged commit 92376c0 into sgl-project:qwen2vl on Oct 18, 2024
1 check passed
@zhyncs (Member) commented Oct 18, 2024

https://github.com/sgl-project/sglang/actions/runs/11395749589
This evaluation failed. Could you help fix it and submit another separate PR?

@ispobock mentioned this pull request Oct 19, 2024
@merrymercy (Contributor) commented Oct 19, 2024

It seems this PR is merged to yizhang2077:support-qwen2-vl by accident? Should we open a new one?

@yizhang2077 (Collaborator, Author) commented

It seems this PR is merged to yizhang2077:support-qwen2-vl by accident? Should we open a new one?

This PR was merged into the qwen2vl branch; once PR #1711 is merged into main, these changes will land in main.
