Using Llava or other vision-language multimodal models with the Ollama component? #3095
Replies: 1 comment 1 reply
-
I'm back, still hoping to find answers on how to use vision-language multimodal models with Ollama in Langflow. As of v1.0, Langflow notes that it can accept image input. Maybe that's true for OpenAI? I don't know, because I don't use it. But it doesn't seem to work with Ollama. I will say that when I try now, my error is different than what I reported before. Now it says: "Error building Component Ollama: not enough values to unpack (expected 2, got 1)". I can confirm it works fine when inputting only text. I am using the Chat Input component, as I'm under the impression that's the only way to supply the image and prompt at the moment.

Regardless, I am no programmer. I'd take a crack at building a custom component for this task if I had a better understanding of what needed to be done. But apparently no one is using this kind of vision feature in Langflow, or at least no one is releasing components for it on the store or talking about this functionality anywhere at all, because I have been looking. I know it's only been about two months, so I'm not trying to sound impatient, but the documentation doesn't really mention this newly added functionality, other than the fact that it supposedly exists and that more related features are planned as of 1.0.

Is ANYONE using these vision capabilities, with OpenAI or otherwise? If a single person has gotten this working with Ollama, I'd like to hear about it. Let me know if I'm just doing something wrong, or if it's just not really usable at the moment but in progress, so I know whether I should wait around for this functionality to become more established in Langflow or look for other options. Literally any info at all would be nice.
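For anyone considering the custom-component route: the Ollama server itself does accept images, so a component would essentially just need to forward them along with the prompt. Below is a minimal sketch using the official `ollama` Python client (`pip install ollama`); the model name `llava` and the image path are placeholders, and this illustrates the underlying API call rather than actual Langflow component code.

```python
import ollama

# Send a prompt plus an image to a LLaVA model running on a local
# Ollama server. The `images` field of a message accepts file paths
# or raw bytes; the client base64-encodes them before sending.
response = ollama.chat(
    model="llava",  # placeholder: any vision-capable model tag
    messages=[
        {
            "role": "user",
            "content": "Describe this image.",
            "images": ["./example.png"],  # placeholder path
        }
    ],
)
print(response["message"]["content"])
```

If this call works on your machine but the Langflow component still errors, that points to the component's input handling rather than to Ollama or the model.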
-
Is it possible to send images to Llava models through the Langflow Ollama component?
It tosses an error if I try to send an image to it. It will run without an image, but that defeats the purpose of Llava. Other models also can't receive images through the Ollama component, which is fair enough, since I don't think those models can do anything with images anyway. But it would be nice if the Ollama component could receive images.
Unless I am missing something.
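One way to rule out the model side independently of Langflow is to call the Ollama REST API directly: per Ollama's API docs, `/api/generate` accepts a list of base64-encoded strings in an `images` field. A minimal sketch, assuming a local server on the default port and a hypothetical `example.png`:

```python
import base64
import requests

# Base64-encode the image, as Ollama's /api/generate expects.
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/generate",  # default Ollama address
    json={
        "model": "llava",            # placeholder model tag
        "prompt": "What is in this picture?",
        "images": [image_b64],
        "stream": False,             # return one complete JSON reply
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

If this succeeds, the limitation is in how the Langflow component builds its request, not in Ollama's image support.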