
[Important] Added README to the Qwen2VL implementation #11642

Open. Wants to merge 3 commits into base: master

Conversation

@samkoesnadi (Contributor)

It took me some time to figure out how to use the Qwen2VL CLI and how to do the model conversion. After looking into the code, here is my documentation on it. Respective contributors, please feel free to correct me if anything is missing.
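For reference, the conversion and inference flow the README documents looks roughly like this. This is a sketch, not the authoritative instructions: the model name, output file names, and script locations are placeholders; check the README in this PR for the exact paths in your llama.cpp checkout.

```shell
# 1. Convert the language-model part of Qwen2-VL to GGUF
#    (model name is an example; any Qwen2-VL checkpoint should work):
python convert_hf_to_gguf.py Qwen/Qwen2-VL-2B-Instruct

# 2. Extract the vision encoder into a separate mmproj GGUF
#    (the surgery script lives under examples/llava/ in llama.cpp):
python examples/llava/qwen2_vl_surgery.py Qwen/Qwen2-VL-2B-Instruct

# 3. Run the CLI with both files plus an image and a prompt
#    (GGUF file names below are placeholders for the conversion outputs):
./llama-qwen2vl-cli \
    -m Qwen2-VL-2B-Instruct-F16.gguf \
    --mmproj qwen2vl-vision.gguf \
    --image demo.jpg \
    -p "Describe this image."
```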

Additionally, there are use cases where we want to do text-only prediction with Qwen2VL, without an image (while running a chat, for example). So, this PR also allows invoking qwen2vl-cli without the --image argument. Let me know if you think --image has to stay required; otherwise this change is not intrusive at all.

* Also allows no --image argument cli to the qwen2vl-cli
@samkoesnadi (Contributor, author)

@HimariO @ggerganov @tc-mb sorry for the tag; if you have the time, could you review this short PR? :)

@tc-mb (Contributor)

tc-mb commented Feb 6, 2025

Thanks for the invitation, but I'm sorry that I can't give a very accurate answer.

This may require gg to decide whether all multi-modal CLIs should support a text-only mode. It is also possible that gg answered this long ago, but I can't find it.

As I understand it, this check comes from the earliest llava support. Perhaps you can find some clues in the earliest PR.

I hope what I know helps you.


> *Have fun with the models! :)*
>
> ## Limitations

A reviewer (Contributor) commented on this part of the README diff:

We should probably mention the fact that the vision model (clip.cpp) currently has its GPU backend support disabled (#10896).

@samkoesnadi (Contributor, author) replied:

I have just added this to the Limitations section.

@HimariO (Contributor)

HimariO commented Feb 9, 2025

Great work on the Qwen2VL README! I forgot to include one with the original Qwen2VL PR, but I think this covers all the essential information needed to use the CLI tool.

I'm not entirely sure about adding a text-only mode to the CLI; that usage scenario would be better served by loading the Qwen2VL GGUF model (the LLM component) with llama-cli (or just using the Qwen2 LLM instead).

* Undo changes on qwen2vl-cli
@samkoesnadi (Contributor, author)

> Thanks for the invitation, but I'm sorry that I can't give a very accurate answer.
>
> This may require gg to decide whether all multi-modal CLIs should support a text-only mode. It is also possible that gg answered this long ago, but I can't find it.
>
> As I understand it, this check comes from the earliest llava support. Perhaps you can find some clues in the earliest PR.
>
> I hope what I know helps you.

You have a fair point. It is also wise to conform to the other LLaVA models' CLI implementations. I have undone the changes to the CLI, back to how it was. Thanks :D

@samkoesnadi (Contributor, author)

> Great work on the Qwen2VL README! I forgot to include one with the original Qwen2VL PR, but I think this covers all the essential information needed to use the CLI tool.
>
> I'm not entirely sure about adding a text-only mode to the CLI; that usage scenario would be better served by loading the Qwen2VL GGUF model (the LLM component) with llama-cli (or just using the Qwen2 LLM instead).

That makes sense. I originally intended this because my use case mixes visual and text-only prompts in the same session. However, the CLI is not intended for chat sessions anyway, so I removed the changes to the CLI.

Thank you :D
