We all like Moondream, the 1-billion-parameter vision-language model that kicks ass.
Well, how about something smaller: a 200-million-parameter vision-language model that is not as good as I would like it to be.
```python
from transformers import AutoModelForCausalLM
from PIL import Image

# Load the fine-tuned model (it ships custom code, so trust_remote_code is required)
model = AutoModelForCausalLM.from_pretrained("damerajee/GPTVision-1-ft", trust_remote_code=True)

# Open your image and make sure it is RGB
image_path = "Your_image_path"
image = Image.open(image_path).convert('RGB')

# Ask a question about the image
question = "Describe the scenery of this image"
answer = model.generate(image=image, question=question, max_new_tokens=40)
print("Answer:", answer)
```
This model follows the same architecture as LLaVA.
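For intuition, here is a minimal sketch of what a LLaVA-style forward pass looks like: a Vision Transformer encodes the image into patch features, a small MLP projector maps them into the LLM's embedding space, and the projected image tokens are prepended to the text tokens before running the decoder. The module names, dimensions, and the HF-style `get_input_embeddings`/`inputs_embeds` API below are assumptions for illustration, not the exact internals of GPT-Vision-1.

```python
import torch
import torch.nn as nn

class LlavaStyleVLM(nn.Module):
    """Illustrative LLaVA-style wiring: ViT -> MLP projector -> LLM decoder."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int, text_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder    # plain Vision Transformer
        self.language_model = language_model    # small GPT-style decoder (HF-style API assumed)
        # The projector is just a small MLP that maps vision features
        # into the LLM's token-embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, pixel_values: torch.Tensor, input_ids: torch.Tensor):
        # 1. Encode the image into a sequence of patch features: (B, N, vision_dim)
        image_feats = self.vision_encoder(pixel_values)
        # 2. Project the patch features into the text embedding space: (B, N, text_dim)
        image_tokens = self.projector(image_feats)
        # 3. Embed the text prompt with the LLM's own embedding table
        text_tokens = self.language_model.get_input_embeddings()(input_ids)
        # 4. Prepend the image tokens to the text tokens and run the decoder
        inputs_embeds = torch.cat([image_tokens, text_tokens], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds)
```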
| Model | HF Link |
|---|---|
| GPT-VISION-1 (the pre-trained model) | GPT-Vision-1 |
| GPT-VISION-1-FT (the fine-tuned model) | GPT-Vision-1-ft |
- We first pre-train the model while freezing the LLM and the Vision Transformer, training only the projector, which is a simple MLP, nothing unique.
- Then we save the pre-trained model to Hugging Face.
- We load the pre-trained model for fine-tuning, but this time we freeze only the Vision Transformer (see the sketch after this list).
- Notice that I use a plain Vision Transformer instead of SigLIP or CLIP because I wanted fewer parameters.
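Below is a minimal sketch of that two-stage freezing schedule in PyTorch. The submodule names (`vision_encoder`, `language_model`) and the learning rate are assumptions for illustration and may not match the attribute names inside GPT-Vision-1's remote code.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the pre-trained checkpoint (attribute names below are assumed, not confirmed)
model = AutoModelForCausalLM.from_pretrained("damerajee/GPTVision-1", trust_remote_code=True)

def set_requires_grad(module: torch.nn.Module, flag: bool) -> None:
    """Turn gradient updates on or off for every parameter of a submodule."""
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1 (pre-training): freeze the LLM and the Vision Transformer,
# so only the MLP projector receives gradient updates.
set_requires_grad(model.vision_encoder, False)   # assumed attribute name
set_requires_grad(model.language_model, False)   # assumed attribute name

# ... pre-train the projector, push the checkpoint to the Hub, reload it ...

# Stage 2 (fine-tuning): freeze only the Vision Transformer;
# the projector and the LLM are both trained.
set_requires_grad(model.language_model, True)
set_requires_grad(model.vision_encoder, False)

# Only parameters that still require gradients go to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)  # learning rate is an assumption
```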
Also, the entire training process was done on free GPUs, specifically Kaggle's P100 and 2× T4 GPUs.