[Important] Added README to the Qwen2VL implementation #11642
base: master
Conversation
* Also allows running qwen2vl-cli without an --image argument
@HimariO @ggerganov @tc-mb sorry for the tagging, just if you have the time to review this short PR :)
Thanks for the invitation, but I'm afraid I can't give a very accurate answer. It may be up to gg to decide whether all multi-modal CLIs should support a text-only mode. It is also possible that gg answered this a long time ago, but I can't find it. As far as I understand, this check comes from the earliest llava support, so perhaps you can find some clues in the earliest PR. I hope what I know helps you.
examples/llava/README-qwen2vl.md (Outdated)

> *Have fun with the models ! :)*
>
> ## Limitations
We should probably mention the fact that the vision model (clip.cpp) currently has its GPU backend support disabled, see #10896.
I have just added this to the Limitations section.
Great work on the Qwen2VL README! I forgot to include one with the original Qwen2VL PR, and I think this covers all the essential information needed to use the CLI tool. I'm not entirely sure about adding a text-only mode to the CLI, as that usage scenario would be better supported by integrating the Qwen2VL gguf model (the LLM component) with llama-cli (or just using the Qwen2 LLM instead).
Undo changes on qwen2vl-cli
You have a fair point. It is also wise to conform to the CLI implementations of the other LLAVA models. I have undone the changes to the CLI, back to how it was. Thanks :D
That makes sense. I originally intended this because, in my use-case, I mixed visual and text-only prompts in the same session. However, the CLI is not intended for chat sessions anyway, so I removed the changes to it. Thank you :D
It took me some time to figure out how to use the Qwen2VL CLI and also how to do the model conversion. After looking into the code, here is my documentation about it. Respective contributors, please feel free to correct me if something is missing.
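For context, here is a rough sketch of the workflow the README documents. All model names, file paths, and output filenames below are illustrative assumptions, not verified commands from the README itself, so check the README for the exact invocations:

```shell
# Split out the vision tower from the HF checkpoint, then convert the LLM
# part to gguf (script location and model path are assumptions).
python examples/llava/qwen2_vl_surgery.py "Qwen/Qwen2-VL-2B-Instruct"
python convert_hf_to_gguf.py path/to/Qwen2-VL-2B-Instruct

# Run inference on an image with the converted model and vision projector
# (binary name and flags assumed from the llama.cpp examples layout).
./llama-qwen2vl-cli -m qwen2-vl-2b-instruct.gguf \
    --mmproj qwen2vl-vision.gguf \
    --image demo.jpg \
    -p "Describe this image."
```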
Additionally, there are use-cases where we want text-only prediction from Qwen2VL without an image (while running a chat, for example). So this PR also allows invoking qwen2vl-cli without an --image argument. Let me know if you think --image has to be mandatory; otherwise this change is not intrusive at all.
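To illustrate the idea, a minimal sketch of how the CLI could treat --image as optional. The struct and function names here are hypothetical, not the actual qwen2vl-cli internals: the point is simply that the CLIP vision pass is skipped when no image path was supplied, and the prompt is evaluated as plain text:

```cpp
#include <string>

// Hypothetical parameter struct (illustrative, not the real cli code).
struct qwen2vl_cli_params {
    std::string image;                          // empty when --image was not given
    std::string prompt = "Describe this image.";
};

// Only run the vision (image-embedding) pass when an image was supplied;
// otherwise the caller falls back to text-only evaluation of the prompt.
inline bool needs_vision_pass(const qwen2vl_cli_params & params) {
    return !params.image.empty();
}
```

The same check is what makes the change non-intrusive: with --image present, the code path is identical to before.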