Jetson-VLM: An Open-Source VLM for Edge Applications


This repository implements a Prismatic-style VLM that can be deployed on low-powered devices such as the Jetson Nano using optimized models.


Model Architecture

The model comprises DINOv2 Base (224px) and SigLIP Base (224px) as image encoders and Llama 3.2:1B as the language model.

[Model architecture diagram]
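The exact wiring lives in the repository's source; the sketch below is only an illustration, in PyTorch, of the Prismatic-style pattern described above: two image encoders run in parallel, their patch tokens are fused (here by concatenation along the token dimension, since the two base encoders produce different patch counts at 224px), projected into the language model's embedding space, and prepended to the text embeddings. The model IDs, projector shape, and fusion choice are assumptions, not the repository's definitive implementation.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM

class FusedVisionBackbone(nn.Module):
    """Runs DINOv2 and SigLIP in parallel and fuses their patch tokens.

    Token-dimension concatenation is an assumption made for this sketch;
    Prismatic-style models often concatenate channel-wise when the two
    encoders share a patch grid.
    """
    def __init__(self):
        super().__init__()
        self.dino = AutoModel.from_pretrained("facebook/dinov2-base")
        self.siglip = AutoModel.from_pretrained("google/siglip-base-patch16-224").vision_model

    def forward(self, pixels_dino, pixels_siglip):
        dino_tokens = self.dino(pixel_values=pixels_dino).last_hidden_state[:, 1:]   # drop CLS token
        siglip_tokens = self.siglip(pixel_values=pixels_siglip).last_hidden_state
        return torch.cat([dino_tokens, siglip_tokens], dim=1)                        # (B, 256 + 196, 768)

class JetsonVLMSketch(nn.Module):
    """Projects fused patch tokens into the LM embedding space and prepends them to the prompt."""
    def __init__(self, vision_dim=768, lm_id="meta-llama/Llama-3.2-1B"):  # gated model, requires HF access
        super().__init__()
        self.vision = FusedVisionBackbone()
        self.lm = AutoModelForCausalLM.from_pretrained(lm_id)
        lm_dim = self.lm.config.hidden_size                               # 2048 for Llama 3.2 1B
        self.projector = nn.Sequential(nn.Linear(vision_dim, lm_dim), nn.GELU(), nn.Linear(lm_dim, lm_dim))

    def forward(self, pixels_dino, pixels_siglip, input_ids, attention_mask):
        img_embeds = self.projector(self.vision(pixels_dino, pixels_siglip))
        txt_embeds = self.lm.get_input_embeddings()(input_ids)
        img_mask = torch.ones(img_embeds.shape[:2], dtype=attention_mask.dtype, device=attention_mask.device)
        return self.lm(
            inputs_embeds=torch.cat([img_embeds, txt_embeds], dim=1),
            attention_mask=torch.cat([img_mask, attention_mask], dim=1),
        )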


Installation

git clone https://github.com/BhavikShangari/Jetson-VLM.git
cd Jetson-VLM
conda env create --name jetson_vlm --file environment.yml
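After the environment is created, activate it before running any of the scripts below:

conda activate jetson_vlm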

Dataset Downloading

LLaVA v1.5 595K Mixture

To make the training data easy to load, we modified the LLaVA v1.5 595K mixture dataset, applied text formatting to it, and exported the result as a CSV file.

Either download the CSV manually from this link, or use:

pip install gdown
gdown 1yZagkp2xFmPd53Zo0FDPU-CNy8GmAyII

Also download Images.zip manually here, or run:

gdown 1MsjR_tfk2YHRwLTX1tLOzGc7r8JQdOfi
unzip Images.zip
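As a quick sanity check before training, the CSV can be inspected with pandas; this sketch makes no assumption about the exact column names, which are defined by data.csv and consumed by train.py.

import pandas as pd

df = pd.read_csv("data.csv")          # the CSV downloaded above
print(df.columns.tolist())            # inspect the schema train.py expects
print(df.head())                      # each row pairs an image from Images.zip with formatted text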

Training

If starting from a checkpoint:

python3 train.py --model_path path/to/checkpoint.pt --per_device_batch_size 32 --learning_rate 2e-5 --output_dir ./results --epochs 10 --torch_compile True --save_strategy no --report_to wandb --lr_scheduler cosine --warmup_ratio 0.10 --logging_steps 100 --dataset_path data.csv --save_file_name path/to/model.pt

Otherwise:

python3 train.py --per_device_batch_size 32 --learning_rate 2e-5 --output_dir ./results --epochs 10 --torch_compile True --save_strategy no --report_to wandb --lr_scheduler cosine --warmup_ratio 0.10 --logging_steps 100 --dataset_path data.csv --save_file_name path/to/model.pt
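For reference, --lr_scheduler cosine with --warmup_ratio 0.10 means the learning rate warms up over the first 10% of optimizer steps and then follows a cosine decay. A minimal illustration using the Hugging Face scheduler helper is below; the actual wiring is in train.py and the step counts here are placeholders.

import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)                          # stand-in for the VLM
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

total_steps = 10_000                                   # epochs * steps per epoch (placeholder)
warmup_steps = int(0.10 * total_steps)                 # --warmup_ratio 0.10
scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)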

Checkpoints

Pre-trained checkpoints are available:

Vision Language Aligned Models

Model | Checkpoint
Pretrained Llama 3.2:1B + DINOv2 Base (224px) + SigLIP Base (224px) (2 epochs) | Link
Instruct Llama 3.2:1B + DINOv2 Base (224px) + SigLIP Base (224px) (2 epochs) | Link
Instruct Llama 3.2:1B + DINOv2 Base (224px) + SigLIP Base (224px) (6 epochs) | Link

Multimodal Instruction Tuned Models

Coming soon


Generation

For generation, download a checkpoint and place it in the Checkpoints folder:

cd Checkpoints
gdown {Checkpoint}
cd ..
python3 generate.py --model_path Checkpoints/{MODEL}.pt --image_path Path/to/image.png --prompt 'Explain what this image depicts' --device cuda:0
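To caption a whole folder of images, the documented CLI can simply be looped over. The sketch below assumes a checkpoint has already been downloaded into Checkpoints/; the model.pt file name and the images/ directory are placeholders to replace with your own paths.

import subprocess
from pathlib import Path

for image in sorted(Path("images").glob("*.png")):     # hypothetical image folder
    subprocess.run([
        "python3", "generate.py",
        "--model_path", "Checkpoints/model.pt",         # placeholder checkpoint name
        "--image_path", str(image),
        "--prompt", "Explain what this image depicts",
        "--device", "cuda:0",
    ], check=True)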

Deployment on Jetson Nano

Coming Soon...
