[New Model]: Florence-2 #5934
@DarkLight1337 Is anyone working on this?
No, but please wait for #5852 and #5276 to land first, as they involve significant API changes for developers. In the meantime, you can take a look at this guide to get an idea of how to implement a new model.
Thanks, I'm going through the guide and the previous PRs that added Phi-3-vision, as well as #5276.
Both #5852 and #5276 have been merged. Do you still plan to work on this, @chandeldivyam?
@fcakyon Thanks for the reminder; it actually slipped my mind. Yes, I need Florence-2 for a project I'm working on. As a stopgap for quick prototyping, I created a Flask server, but it is not an ideal solution. I will pick this up next week. Thanks! Are you working on something that needs it?
@chandeldivyam Yes, I also need such a solution for my work. I'm trying https://github.com/Lightning-AI/LitServe, since I have only a little experience with the vllm-project.
@fcakyon Have you done any benchmarking of LitServe? Also, I think using vLLM would make sense if there are lots of parallel requests, right?
@chandeldivyam It would be great to see Florence-2 in vLLM.
Hey @chandeldivyam,
Since there's been no update on this issue, this week I followed the guide here and looked at how Phi-3-vision was added to vLLM. I implemented the registry, but I ran into the following issue:
This error indicates that the Florence2 configuration has
If only the language part of the model uses an encoder-decoder architecture (i.e. there is no cross-attention between text and visual features), you can try implementing only the language part in vLLM first.
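A minimal sketch of why this split can work, assuming Florence-2 feeds vision into the language model by prepending the projected image features to the text token embeddings before the encoder (my reading of the Hugging Face remote code; treat it as an assumption). Under that assumption the encoder-decoder only ever sees one sequence of embeddings, and the decoder cross-attends to the encoder output rather than to the image directly. The function name `merge_image_features` and the shapes are hypothetical, and NumPy arrays stand in for tensors.

```python
import numpy as np


def merge_image_features(image_features, text_embeds, text_mask):
    """Hypothetical helper: prepend projected image tokens to text embeddings.

    image_features: (batch, num_image_tokens, hidden), already projected
    text_embeds:    (batch, num_text_tokens, hidden)
    text_mask:      (batch, num_text_tokens), 1 = real token, 0 = padding
    Returns the encoder input embeddings and the matching attention mask.
    """
    if image_features.shape[0] != text_embeds.shape[0]:
        raise ValueError("batch sizes differ")
    if image_features.shape[-1] != text_embeds.shape[-1]:
        raise ValueError("hidden sizes differ; project image features first")
    batch, num_image_tokens, _ = image_features.shape
    # Image tokens are never padding, so their mask entries are all ones.
    image_mask = np.ones((batch, num_image_tokens), dtype=text_mask.dtype)
    # Concatenate along the sequence axis: image tokens first, then text.
    inputs_embeds = np.concatenate([image_features, text_embeds], axis=1)
    attention_mask = np.concatenate([image_mask, text_mask], axis=1)
    return inputs_embeds, attention_mask
```

For example, 4 image tokens and 3 text tokens with hidden size 8 give encoder inputs of shape `(batch, 7, 8)`; the text padding positions keep their 0 in the merged mask.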
@DarkLight1337, thanks for your comment. I think I understand, and it seems feasible. Since Florence-2 only uses the encoder-decoder for the language part, specifically in the Florence2LanguageModel class, I can implement the language part and the vision part (DaViT) separately and then combine them. I just need to properly organize the roughly 2800 lines of the original modeling_florence.py file.
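One way to keep the two halves independent while porting is to wire them together through plain callables, so a DaViT port and the encoder-decoder port can each be tested against stubs before being combined. This is a sketch under assumptions: the `ComposedVLM` class and its field names are hypothetical (not vLLM's model interface), and the language model is assumed to consume one sequence with the image tokens placed before the text tokens.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Lists of float vectors stand in for tensors in this sketch.
Matrix = list[list[float]]


@dataclass
class ComposedVLM:
    # Hypothetical pluggable stages; each can be developed and tested alone.
    vision_encoder: Callable[[object], Matrix]     # image -> image tokens (e.g. a DaViT port)
    projector: Callable[[Matrix], Matrix]          # map image tokens to the text hidden size
    embed_text: Callable[[Sequence[int]], Matrix]  # prompt token ids -> text embeddings
    language_model: Callable[[Matrix], list[int]]  # encoder-decoder over one embedding sequence

    def generate(self, image, prompt_ids):
        image_tokens = self.projector(self.vision_encoder(image))
        text_tokens = self.embed_text(prompt_ids)
        # Image tokens go first, so the language model sees a single input
        # sequence and needs no separate path for visual features.
        return self.language_model(image_tokens + text_tokens)
```

With stub components (for instance, a `language_model` that returns the length of its input), `generate([1, 2, 3], [7, 8])` hands a five-token sequence to the language model: three image tokens followed by two text tokens.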
Hey, what's the update on this one? How do I run Florence-2 using vLLM?
+1
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
The model to consider.
https://huggingface.co/microsoft/Florence-2-base
The closest model vllm already supports.
Phi-3-vision (phi-3v); it's a VLM.
What's your difficulty of supporting the model you want?
No response