This repository implements a Prismatic VLM that can be deployed on low-powered devices such as the Jetson Nano using optimized models.
The model comprises DINOv2 Base (224px) and SigLIP Base (224px) as image encoders and Llama 3.2:1B as the language model.
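For orientation, here is a minimal sketch of how such a fused dual-encoder VLM is typically wired. The class and attribute names are illustrative assumptions (HF-style encoders with a `config.hidden_size`), not this repository's actual API:

```python
import torch
import torch.nn as nn

class FusedVisionBackbone(nn.Module):
    """Illustrative sketch (not the repo's actual API): fuse DINOv2 and
    SigLIP patch features channel-wise and project them into the language
    model's embedding space, Prismatic-style."""

    def __init__(self, dino_encoder, siglip_encoder, llm_embed_dim):
        super().__init__()
        self.dino = dino_encoder      # assumed HF-style DINOv2 Base, 224px
        self.siglip = siglip_encoder  # assumed HF-style SigLIP Base, 224px
        fused_dim = dino_encoder.config.hidden_size + siglip_encoder.config.hidden_size
        # MLP projector mapping fused patch features to LLM input embeddings
        self.projector = nn.Sequential(
            nn.Linear(fused_dim, llm_embed_dim),
            nn.GELU(),
            nn.Linear(llm_embed_dim, llm_embed_dim),
        )

    def forward(self, pixel_values):
        # Channel-wise fusion assumes both encoders emit the same number of
        # patch tokens; otherwise one stream must be interpolated or pooled.
        dino_feats = self.dino(pixel_values).last_hidden_state      # (B, N, D1)
        siglip_feats = self.siglip(pixel_values).last_hidden_state  # (B, N, D2)
        fused = torch.cat([dino_feats, siglip_feats], dim=-1)       # (B, N, D1+D2)
        return self.projector(fused)  # visual tokens for the Llama 3.2:1B input
```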
```bash
git clone https://github.com/BhavikShangari/Jetson-VLM.git
cd Jetson-VLM
conda env create --name jetson_vlm --file environment.yml
```
To prepare training data for your model, we modified the LLaVA v1.5 595K mixture dataset, applied text formatting over it, and exported the result as a CSV file for easy loading (a loading sketch follows the download steps below).
Either download the CSV manually from this Link, or use:

```bash
pip install gdown
gdown 1yZagkp2xFmPd53Zo0FDPU-CNy8GmAyII
```
Also download Images.zip manually here, or:

```bash
gdown 1MsjR_tfk2YHRwLTX1tLOzGc7r8JQdOfi
unzip Images.zip
```
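As a rough illustration of how the CSV plus unzipped images could be consumed, here is a minimal PyTorch `Dataset` sketch. The column names (`image`, `text`) and the `Images/` directory are assumptions about the file layout, not a guaranteed schema:

```python
import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class LlavaCSVDataset(Dataset):
    """Sketch of a loader for the formatted LLaVA 595K CSV.
    Column names 'image' and 'text' are assumed; adjust to the real schema."""

    def __init__(self, csv_path, image_dir, transform=None):
        self.df = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(os.path.join(self.image_dir, row["image"])).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, row["text"]

# Example: dataset = LlavaCSVDataset("data.csv", "Images")
```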
If starting from a checkpoint:

```bash
python3 train.py --model_path path/to/checkpoint.pt --per_device_batch_size 32 --learning_rate 2e-5 --output_dir ./results --epochs 10 --torch_compile True --save_strategy no --report_to wandb --lr_scheduler cosine --warmup_ratio 0.10 --logging_steps 100 --dataset_path data.csv --save_file_name path/to/model.pt
```

Otherwise:

```bash
python3 train.py --per_device_batch_size 32 --learning_rate 2e-5 --output_dir ./results --epochs 10 --torch_compile True --save_strategy no --report_to wandb --lr_scheduler cosine --warmup_ratio 0.10 --logging_steps 100 --dataset_path data.csv --save_file_name path/to/model.pt
```
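The flag names above mirror Hugging Face `TrainingArguments`, which suggests `train.py` wraps the HF `Trainer`; purely as an assumption about that wiring, here is how the same configuration would look expressed directly, in case you want to tweak defaults:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the CLI flags onto Hugging Face TrainingArguments;
# train.py's actual argument handling may differ.
args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=10,
    torch_compile=True,
    save_strategy="no",
    report_to="wandb",
    lr_scheduler_type="cosine",
    warmup_ratio=0.10,
    logging_steps=100,
)
```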
Pre-trained checkpoints are available:
| Checkpoint Name | Model Checkpoint |
|---|---|
| Pretrained Llama 3.2:1B + DINOv2 Base (224px) + SigLIP Base (224px) (2 epochs) | Link |
| Instruct Llama 3.2:1B + DINOv2 Base (224px) + SigLIP Base (224px) (2 epochs) | Link |
| Instruct Llama 3.2:1B + DINOv2 Base (224px) + SigLIP Base (224px) (6 epochs) | Link |
Coming soon
For generation, download a checkpoint and place it in the Checkpoints folder:

```bash
cd Checkpoints
gdown {Checkpoint}
cd ..
```

Then run:

```bash
python3 generate.py --model_path Checkpoints/{MODEL}.pt --image_path Path/to/image.png --prompt 'Explain what this image depicts' --device cuda:0
```
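For scripted use, the CLI above presumably boils down to loading the checkpoint and calling a generate method. The sketch below illustrates that flow with hypothetical helpers (`load_model`, `build_processor`); consult `generate.py` for the real entry points:

```python
import torch
from PIL import Image

# Hypothetical entry points; the real loading/generation code lives in
# generate.py, so treat these imports and signatures as placeholders.
from generate import load_model, build_processor  # assumed helpers

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = load_model("Checkpoints/model.pt").to(device).eval()
processor = build_processor()  # image transforms + Llama tokenizer

image = Image.open("image.png").convert("RGB")
inputs = processor(image=image, prompt="Explain what this image depicts")

with torch.no_grad():
    print(model.generate(**{k: v.to(device) for k, v in inputs.items()}))
```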
Coming Soon...