User-VLM 360°

Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions

[Figure: User-VLM 360° architecture]

Authors

Hamed Rahimi, Adil Bahaj, Mouad Abrini, Mahdi Khoramshahi, Mounir Ghogho, Mohamed Chetouani

Overview

User-VLM 360° is a personalized vision-language model that enhances human-robot interaction through user-aware tuning and bias-aware optimization. It adapts to users in real time from multimodal visual-linguistic signals and mitigates bias through preference optimization. The framework is validated on multiple benchmarks and in a real-world deployment on the Pepper robot.
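For orientation, here is a minimal inference sketch using the Hugging Face transformers API. The checkpoint ID is a placeholder and the AutoModelForVision2Seq interface is an assumption; check the repository for the actual released checkpoints and loading code.

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "hamedR96/User-VLM-360"  # hypothetical Hub ID; see the repo for real checkpoints
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("user_photo.jpg")  # image of the interacting user
prompt = "What activity would you suggest for this user?"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])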

Features

  • User-aware Tuning: Real-time adaptation of interactions using visual-linguistic signals.
  • Bias Mitigation: Ethical and fair optimization of user personalization.
  • 360° Socio-Emotive Interaction Dataset: Annotated with demographic, emotion, and relational metadata.
  • State-of-the-Art Performance: Achieves up to +35.3% F1 in personalized VQA and +47.5% F1 in facial feature understanding, with a 15% bias reduction and 30× speedup over baselines.

Deployment on Pepper

[Figure: User-VLM 360° deployed on the Pepper robot]
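A rough sketch of the on-robot loop, assuming the NAOqi Python SDK (qi). The service names and camera parameters follow the public NAOqi API; everything else is illustrative and not the authors' deployment code.

import qi
from PIL import Image

session = qi.Session()
session.connect("tcp://<PEPPER_IP>:9559")  # replace with the robot's address

video = session.service("ALVideoDevice")
tts = session.service("ALTextToSpeech")

# Subscribe to the top camera: camera 0, VGA resolution (2), RGB color space (11), 5 fps.
handle = video.subscribeCamera("user_vlm", 0, 2, 11, 5)
try:
    frame = video.getImageRemote(handle)  # list: width, height, ..., raw pixel bytes at index 6
    image = Image.frombytes("RGB", (frame[0], frame[1]), bytes(frame[6]))
    # Feed `image` to the model (see the inference sketch in Overview)
    # and pass the generated text to Pepper's speech service:
    tts.say("Hello! Let me take a look at you.")
finally:
    video.unsubscribe(handle)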

Results

Personalized Performance

[Figure: personalized performance comparison]
User-VLM 360° outperforms baseline models in user-aware personalization, facial feature understanding, and multimodal reasoning, achieving up to a 2× improvement in ROUGE-1 F1 over baselines on user-centric VQA tasks.
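The ROUGE-1 F1 numbers can be reproduced with the Hugging Face evaluate library; the prediction/reference pair below is a toy example, not data from the benchmark.

import evaluate  # pip install evaluate rouge_score

predictions = ["The user seems tired; suggest a short break."]
references = ["The user looks tired, so propose taking a break."]

rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=predictions, references=references)
print(f"ROUGE-1 F1: {scores['rouge1']:.3f}")  # evaluate's rouge1 is an F-measure by default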

Fairness Optimization

[Figure: bias-mitigation results]
User-VLM 360° enhances fairness, improving ROUGE-1 and BERTScore while mitigating bias through Direct Preference Optimization (DPO) tuning.
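A minimal sketch of bias-aware preference tuning with TRL's DPOTrainer, assuming a text-only base model for brevity; the checkpoint name and toy preference pairs are placeholders, and the dataset columns follow TRL's expected "prompt"/"chosen"/"rejected" format.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "base-checkpoint"  # placeholder; substitute the actual model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Toy preference pairs: "chosen" is the unbiased answer, "rejected" the biased one.
train_dataset = Dataset.from_dict({
    "prompt":   ["Describe this user's likely profession."],
    "chosen":   ["I can't infer a profession from appearance alone."],
    "rejected": ["She is probably a nurse."],
})

training_args = DPOConfig(output_dir="user-vlm-dpo", beta=0.1,
                          per_device_train_batch_size=1)
trainer = DPOTrainer(model=model, args=training_args,
                     processing_class=tokenizer, train_dataset=train_dataset)
trainer.train()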

Computational Efficiency

[Figure: computational efficiency (FLOPs) comparison]
User-VLM 360° achieves up to a 30× reduction in FLOPs, significantly improving computational efficiency without compromising performance.
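FLOPs comparisons of this kind can be reproduced with a profiler such as fvcore (an assumption; the paper may use a different tool). The toy MLP below stands in for a model forward pass.

import torch
from fvcore.nn import FlopCountAnalysis  # pip install fvcore

model = torch.nn.Sequential(
    torch.nn.Linear(768, 3072), torch.nn.GELU(), torch.nn.Linear(3072, 768)
)
example_input = torch.randn(1, 768)

flops = FlopCountAnalysis(model, example_input)
print(f"Total FLOPs per forward pass: {flops.total():,}")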

Resources


Citation

If you use User-VLM 360° in your research, please cite:

@article{rahimi2025uservlm,
  author    = {Hamed Rahimi and Adil Bahaj and Mouad Abrini and Mahdi Khoramshahi and Mounir Ghogho and Mohamed Chetouani},
  title     = {User-VLM 360°: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions},
  journal   = {arXiv preprint arXiv:<ARXIV_PAPER_ID>},
  year      = {2025}
}
