[Feature]: phi-3.5 is a strong model for its size, including vision support. Has multi-image support, but vllm does not support #7740

Closed
pseudotensor opened this issue Aug 21, 2024 · 2 comments · Fixed by #7783

Comments

@pseudotensor

🚀 The feature, motivation and pitch

Phi-3.5 is a strong model for its size and includes strong multi-image vision support, but vLLM does not support the multi-image case. The current code only logs a warning and treats the extra image tokens as plain text:

    elif len(re.findall(r"(<\|image_\d+\|>)+", prompt)) > 1:
        logger.warning("Multiple image input is not supported yet, "
                       "so any extra image tokens will be treated "
                       "as plain text.")

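For anyone who wants to see what the check quoted above actually counts, here is a minimal standalone sketch. The helper name `count_image_tag_runs` and the sample prompts are hypothetical and not part of vLLM; only the regular expression is taken from the snippet. Because of the trailing `+`, adjacent placeholders collapse into a single match, so the warning branch would only trigger when placeholders are separated by other text.

```python
import re

# Same pattern as in the snippet quoted above.
IMAGE_TAG_RUN = re.compile(r"(<\|image_\d+\|>)+")


def count_image_tag_runs(prompt: str) -> int:
    """Hypothetical helper: count separate runs of image placeholders."""
    # With one capture group, findall returns one entry per overall match,
    # so a run of adjacent placeholders contributes a single entry.
    return len(IMAGE_TAG_RUN.findall(prompt))


# Illustrative prompts (assumed, not from the issue).
separated = "<|image_1|>\n<|image_2|>\nCompare the two images."
adjacent = "<|image_1|><|image_2|>\nCompare the two images."

print(count_image_tag_runs(separated))  # 2
print(count_image_tag_runs(adjacent))   # 1
```

Under these assumptions, only the `separated` prompt would reach the warning branch, since its count is greater than 1.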
Alternatives

The only alternative is to use other models.

Additional context

No response

@DarkLight1337
Member

@Isotr0py are you interested in implementing this?

@Isotr0py
Collaborator

Of course, I'm already working on implementing this feature.
I will create a PR once it's nearly finished.

