VLM Finetuner is a web-based tool designed for fine-tuning Vision-Language Models (VLMs). It provides an intuitive interface for researchers and developers to manage VLMs, perform Vision Question Answering (VQA), generate image captions, monitor system resources, and interact with an AI-powered chatbot for assistance.
- Search for Vision-Language Models on Hugging Face with pagination support.
- Download models, including access-controlled ones using Hugging Face tokens.
- Fine-tune models using custom datasets.
- List and delete downloaded models.
- Upload images and ask questions using Gemini or OpenAI models.
- View, delete, and manage VQA history (stored in SQLite by default).
- Interactive chatbot powered by Gemini or OpenAI models for guidance.
- Draggable and resizable chatbot window with a debug overlay for layout troubleshooting.
- Full-screen mode on small screens for improved usability.
- Upload image folders to generate captions using a selected VLM (e.g., `google/gemma-3-12b-it:free`).
- Edit, save, and export captions as a ZIP file or JSON dataset.
- Real-time monitoring of CPU, memory, and disk usage on the server.
- Built with React, TypeScript, and Material-UI for a modern, responsive interface.
- Dynamic layout adjustments with collapsible elements (e.g., sidebar, header, footer).
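The resource-monitoring feature above reports CPU, memory, and disk usage. As a rough sketch of what such a snapshot involves (the actual backend may use a richer library such as `psutil`; the function name and returned fields here are illustrative, and portable memory stats are omitted because the standard library does not expose them):

```python
import os
import shutil

def resource_snapshot(path="/"):
    """Return a rough system-resource snapshot using only the standard library.

    Illustrative sketch only -- the real backend likely reports richer
    metrics (e.g. CPU and memory percentages via psutil).
    """
    usage = shutil.disk_usage(path)  # total/used/free bytes for the given mount
    return {
        "cpu_count": os.cpu_count(),
        "disk_total_gb": round(usage.total / 1e9, 2),
        "disk_used_gb": round(usage.used / 1e9, 2),
        "disk_free_gb": round(usage.free / 1e9, 2),
    }

print(resource_snapshot())
```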
Ensure the following dependencies are installed before proceeding:
- Python: Version 3.8 or higher.
- Node.js: Version 16 or higher.
- npm: Bundled with Node.js.
- Hugging Face Account: Required for downloading models (token needed for restricted models).
- API Keys (optional for advanced features):
- Google Generative AI API key (for Gemini models).
- OpenAI API key (for OpenAI models).
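The API keys can be supplied as environment variables before starting the backend. The variable names below are common defaults for these SDKs, not confirmed names from this project, so check its configuration for what it actually reads:

```shell
# Illustrative only -- confirm the exact variable names in the project's config.
export GOOGLE_API_KEY="your-gemini-api-key"   # Google Generative AI (Gemini)
export OPENAI_API_KEY="your-openai-api-key"   # OpenAI
export HF_TOKEN="your-huggingface-token"      # Hugging Face (gated models)
```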
Clone the repository:

```bash
git clone https://github.com/Jaseci-Labs/jac-vision.git
cd jac-vision
```
Create and activate a virtual environment, then install the Python dependencies:

```bash
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # macOS/Linux
pip install -r requirements.txt
```
Start the backend server:

```bash
uvicorn main:app --host 0.0.0.0 --port 4000
```
Install the frontend dependencies and start the development server:

```bash
cd frontend  # if the frontend lives in a subdirectory
npm install --legacy-peer-deps
npm run dev
```
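The caption export described in the features above (a ZIP file or JSON dataset) can be sketched as follows. The archive layout and the `{"image", "caption"}` manifest fields are assumptions for illustration, not the project's actual format:

```python
import json
import zipfile
from pathlib import Path

def export_captions(image_dir, captions, out_zip):
    """Bundle images and their captions into a ZIP with a JSON manifest.

    `captions` maps image filenames (relative to `image_dir`) to caption
    strings. The manifest layout -- a list of {"image", "caption"} records
    in captions.json -- is illustrative only.
    """
    records = [{"image": name, "caption": text} for name, text in captions.items()]
    with zipfile.ZipFile(out_zip, "w") as zf:
        zf.writestr("captions.json", json.dumps(records, indent=2))
        for name in captions:
            zf.write(Path(image_dir) / name, arcname=name)

# Hypothetical usage, assuming the images already exist on disk:
# export_captions("images", {"cat.png": "a cat", "dog.png": "a dog"}, "dataset.zip")
```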