VLM Finetuner is a web-based tool designed for fine-tuning Vision-Language Models (VLMs). It provides an intuitive interface for researchers and developers to manage VLMs, perform Vision Question Answering (VQA), generate image captions, monitor system resources, and interact with an AI-powered chatbot for assistance.
- Search for Vision-Language Models on Hugging Face with pagination support.
- Download models, including access-controlled ones using Hugging Face tokens.
- Fine-tune models using custom datasets.
- List and delete downloaded models.
- Upload images and ask questions using Gemini or OpenAI models.
- View, delete, and manage VQA history (stored in SQLite by default).
- Interactive chatbot powered by Gemini or OpenAI models for guidance.
- Draggable and resizable chatbot window with a debug overlay for layout troubleshooting.
- Full-screen mode on small screens for improved usability.
- Upload image folders to generate captions using a selected VLM (e.g., `google/gemma-3-12b-it:free`).
- Edit, save, and export captions as a ZIP file or JSON dataset.
- Real-time monitoring of CPU, memory, and disk usage on the server.
- Built with React, TypeScript, and Material-UI for a modern, responsive interface.
- Dynamic layout adjustments with collapsible elements (e.g., sidebar, header, footer).
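The resource-monitoring feature above reports CPU, memory, and disk usage. As a rough sketch of what such a snapshot involves (the actual backend may use a richer library such as `psutil`; the function name and returned fields here are illustrative, and portable memory stats are omitted because the standard library does not expose them):

```python
import os
import shutil

def resource_snapshot(path="/"):
    """Return a rough system-resource snapshot using only the standard library.

    Illustrative sketch only -- the real backend likely reports richer
    metrics (e.g. CPU and memory percentages via psutil).
    """
    usage = shutil.disk_usage(path)  # total/used/free bytes for the given mount
    return {
        "cpu_count": os.cpu_count(),
        "disk_total_gb": round(usage.total / 1e9, 2),
        "disk_used_gb": round(usage.used / 1e9, 2),
        "disk_free_gb": round(usage.free / 1e9, 2),
    }

print(resource_snapshot())
```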
Ensure the following dependencies are installed before proceeding:
- Python: Version 3.8 or higher.
- Node.js: Version 16 or higher.
- npm: Bundled with Node.js.
- Hugging Face Account: Required for downloading models (token needed for restricted models).
- API Keys (optional for advanced features):
- Google Generative AI API key (for Gemini models).
- OpenAI API key (for OpenAI models).
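The API keys can be supplied as environment variables before starting the backend. The variable names below are common defaults for these SDKs, not confirmed names from this project, so check its configuration for what it actually reads:

```shell
# Illustrative only -- confirm the exact variable names in the project's config.
export GOOGLE_API_KEY="your-gemini-api-key"   # Google Generative AI (Gemini)
export OPENAI_API_KEY="your-openai-api-key"   # OpenAI
export HF_TOKEN="your-huggingface-token"      # Hugging Face (gated models)
```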
Clone the repository:

```bash
git clone https://github.com/Jaseci-Labs/jac-vision.git
cd jac-vision
```
Create and activate a virtual environment, then install the Python dependencies:

```bash
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # macOS/Linux
pip install -r requirements.txt
```
Start the backend server:

```bash
uvicorn main:app --host 0.0.0.0 --port 4000
```
Install the frontend dependencies and start the development server:

```bash
cd frontend  # if the frontend lives in a subdirectory
npm install --legacy-peer-deps
npm run dev
```
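The caption export described in the features above (a ZIP file or JSON dataset) can be sketched as follows. The archive layout and the `{"image", "caption"}` manifest fields are assumptions for illustration, not the project's actual format:

```python
import json
import zipfile
from pathlib import Path

def export_captions(image_dir, captions, out_zip):
    """Bundle images and their captions into a ZIP with a JSON manifest.

    `captions` maps image filenames (relative to `image_dir`) to caption
    strings. The manifest layout -- a list of {"image", "caption"} records
    in captions.json -- is illustrative only.
    """
    records = [{"image": name, "caption": text} for name, text in captions.items()]
    with zipfile.ZipFile(out_zip, "w") as zf:
        zf.writestr("captions.json", json.dumps(records, indent=2))
        for name in captions:
            zf.write(Path(image_dir) / name, arcname=name)

# Hypothetical usage, assuming the images already exist on disk:
# export_captions("images", {"cat.png": "a cat", "dog.png": "a dog"}, "dataset.zip")
```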