Multimodal Material Estimation

Implementation of a material estimation model that uses audio-visual cues: CLIP and Whisper encode the image and audio inputs, and an LLM aligns them into a fixed-size text embedding space that is then used for material class prediction. The full model description is available in the Report.

Model architecture diagram (MLP)
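
As a rough illustration of the pipeline described above, below is a minimal PyTorch sketch of how a CLIP image encoder and a Whisper audio encoder could be fused for material classification. This is not the repository's implementation: the checkpoint names, embedding dimension, class count, and the use of a small projection MLP in place of the LLM alignment step are all assumptions made for the sketch.

```python
# Minimal sketch of an audio-visual material classifier.
# Assumptions: transformers checkpoints "openai/clip-vit-base-patch32" and
# "openai/whisper-base", a 512-d shared embedding, and 10 material classes.
# The LLM alignment described in the README is replaced here by a simple
# projection MLP for brevity.
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, WhisperModel


class AudioVisualMaterialClassifier(nn.Module):
    def __init__(self, num_classes: int = 10, embed_dim: int = 512):
        super().__init__()
        # Frozen pretrained encoders (assumed checkpoints).
        self.image_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
        self.audio_encoder = WhisperModel.from_pretrained("openai/whisper-base").get_encoder()
        for p in self.image_encoder.parameters():
            p.requires_grad = False
        for p in self.audio_encoder.parameters():
            p.requires_grad = False

        img_dim = self.image_encoder.config.hidden_size  # 768 for ViT-B/32
        aud_dim = self.audio_encoder.config.d_model      # 512 for whisper-base

        # Projection MLPs mapping each modality into a shared embedding space.
        self.image_proj = nn.Sequential(
            nn.Linear(img_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, embed_dim)
        )
        self.audio_proj = nn.Sequential(
            nn.Linear(aud_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, embed_dim)
        )

        # Classification head over the concatenated modality embeddings.
        self.classifier = nn.Linear(2 * embed_dim, num_classes)

    def forward(self, pixel_values: torch.Tensor, audio_features: torch.Tensor) -> torch.Tensor:
        # Pooled CLS embedding from CLIP; mean-pooled tokens from the Whisper encoder.
        img_emb = self.image_encoder(pixel_values=pixel_values).pooler_output
        aud_emb = self.audio_encoder(input_features=audio_features).last_hidden_state.mean(dim=1)
        fused = torch.cat([self.image_proj(img_emb), self.audio_proj(aud_emb)], dim=-1)
        return self.classifier(fused)
```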

To run the training code: python train.py
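
For context, here is a hedged sketch of what one training epoch could look like with the classifier sketched above. The actual train.py may be organized differently; the DataLoader format (batches of pixel values, Whisper log-mel features, and integer labels) is assumed.

```python
# Illustrative training loop, assuming the AudioVisualMaterialClassifier sketch
# above and a DataLoader yielding (pixel_values, audio_features, labels).
# This is not the repository's train.py, only a plausible outline.
import torch
import torch.nn.functional as F


def train_one_epoch(model, loader, optimizer, device="cpu"):
    model.train()
    for pixel_values, audio_features, labels in loader:
        pixel_values = pixel_values.to(device)
        audio_features = audio_features.to(device)
        labels = labels.to(device)

        logits = model(pixel_values, audio_features)
        loss = F.cross_entropy(logits, labels)  # standard multi-class objective

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```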

About

Multi Modal LLM for Open-Vocabulary Object Material Estimation
