[Core] Dynamic image size support for VLMs #5276

Merged Jul 3, 2024 (242 commits)

@DarkLight1337 DarkLight1337 commented Jun 5, 2024

This PR uses the input registry introduced by #5214 to implement an input processor that automatically inserts image tokens at the LLMEngine level, so that it applies to LLM.generate.
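
As a rough illustration of the idea (not the code in this PR), an input processor can be thought of as a function that runs on each request before the model sees it and expands a single image placeholder token into one copy per image feature. The names `expand_image_tokens`, `IMAGE_TOKEN_ID`, and `feature_size` below are hypothetical and are not part of vLLM's API.

```python
from typing import List

# Hypothetical placeholder token ID, chosen only for this sketch.
IMAGE_TOKEN_ID = 32000


def expand_image_tokens(token_ids: List[int], feature_size: int,
                        image_token_id: int = IMAGE_TOKEN_ID) -> List[int]:
    """Replace each image placeholder with `feature_size` copies of itself.

    Prompts without a placeholder (text-only input) come back unchanged.
    """
    expanded: List[int] = []
    for tok in token_ids:
        if tok == image_token_id:
            expanded.extend([image_token_id] * feature_size)
        else:
            expanded.append(tok)
    return expanded


prompt_ids = [1, IMAGE_TOKEN_ID, 15, 16]
print(expand_image_tokens(prompt_ids, feature_size=4))
# [1, 32000, 32000, 32000, 32000, 15, 16]
```

Doing the expansion in one place inside the engine means every entry point that goes through it sees the same token layout.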

Accordingly, I have updated LLaVA-NeXT and Phi-3-Vision to support dynamic image size. Along the way, I have expanded the VLM tests to consider text-only and multiscale-image input in addition to the current single-scale image input.
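
To make "dynamic image size" concrete: with a patch-based vision encoder, the number of image tokens depends on the input resolution, so it cannot be hard-coded into the prompt. The sketch below uses a deliberately simplified rule (one token per patch on a rounded-up grid). The real formulas for LLaVA-NeXT and Phi-3-Vision are more involved and are not reproduced here; `image_feature_size` and the default `patch_size` are assumptions for illustration only.

```python
def image_feature_size(width: int, height: int, patch_size: int = 14) -> int:
    """Simplified token count: one token per patch, rounding partial patches up."""
    if width <= 0 or height <= 0 or patch_size <= 0:
        raise ValueError("width, height, and patch_size must be positive")
    patches_wide = (width + patch_size - 1) // patch_size   # integer ceil
    patches_high = (height + patch_size - 1) // patch_size  # integer ceil
    return patches_wide * patches_high


# Different image sizes yield different token counts under this rule.
for size in [(336, 336), (672, 336), (100, 50)]:
    print(size, image_feature_size(*size))
# (336, 336) 576
# (672, 336) 1152
# (100, 50) 32
```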

Based on this, I have written a detailed guide on how to implement multimodal vLLM models.

Please note that this is a breaking change for users. Instead of manually repeating image tokens in the prompt, use the prompt format described in the corresponding HuggingFace repo, regardless of the model.
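
For example, assuming a LLaVA-1.5-style model, the change in the prompt looks roughly like this. The prompt strings are illustrative only; check the model's HuggingFace repo for the exact template. The count 576 corresponds to a 24 x 24 patch grid (336-pixel image, 14-pixel patches).

```python
# Before: the caller repeated the image token once per image feature.
old_prompt = "<image>" * 576 + "\nUSER: What is the content of this image?\nASSISTANT:"

# After: a single placeholder in the HuggingFace prompt format;
# the repetition is handled during input processing.
new_prompt = "USER: <image>\nWhat is the content of this image?\nASSISTANT:"

print(old_prompt.count("<image>"))  # 576
print(new_prompt.count("<image>"))  # 1
```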

Related contributions

Follow-up to #5214.

This PR conflicts with #5237, which inserts image tokens at the OpenAIServing level. This PR removes that logic from the server to avoid double insertion.

@DarkLight1337 DarkLight1337 marked this pull request as draft June 5, 2024 11:24
@DarkLight1337 DarkLight1337 changed the title from "[Core][Docs] Use input processor to insert image tokens" to "[Core][Doc] Use input processor to insert image tokens" Jun 5, 2024

@ywang96 ywang96 left a comment

Overall LGTM - I'll just need to run some testing on my end before finally approving this!

examples/llava_next_example.py (outdated, resolved)
vllm/multimodal/base.py (outdated, resolved)
vllm/multimodal/image.py (resolved)

@ywang96 ywang96 left a comment

LGTM! Thank you for the work and glad we resolved all the issues!

@youkaichao youkaichao merged commit 9831aec into vllm-project:main Jul 3, 2024
68 of 70 checks passed
@DarkLight1337 DarkLight1337 deleted the mm-image-tokenizer-2 branch July 3, 2024 04:13
prashantgupta24 pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 3, 2024
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jul 7, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
5 participants