# mutimodal
Here are 5 public repositories matching this topic...
A private, local OCR solution using Meta's Llama 3.2 Vision model with a Streamlit interface. Processes images entirely offline and supports formats such as JPEG, PNG, and BMP.
Updated Nov 21, 2024 - Python
Gemini 2 Pro app for Image, Audio, and Document understanding + Code Execution.
Updated Feb 9, 2025 - Python
A multimodal RAG application using Qwen 2.5 VL, ColPali, and Qdrant for text- and image-based retrieval.
Updated Mar 20, 2025 - Jupyter Notebook