- Hamed Rahimi (ISIR, Sorbonne University)
- Adil Bahaj (International University of Rabat)
- Mouad Abrini (ISIR, Sorbonne University)
- Mahdi Khoramshahi (ISIR, Sorbonne University)
- Mounir Ghogho (International University of Rabat)
- Mohamed Chetouani (ISIR, Sorbonne University)
User-VLM 360° is a personalized vision-language model that enhances human-robot interaction by integrating user-aware tuning and bias-aware optimization. It adapts in real time using multimodal signals and mitigates bias through preference optimization. The framework is validated on multiple benchmarks and through real-world deployment on the Pepper robot.
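As a quick illustration, the sketch below shows what inference could look like through the Hugging Face `transformers` API. The model ID, file name, and prompt are hypothetical placeholders, not the released interface:

```python
# Minimal inference sketch. MODEL_ID is a hypothetical placeholder;
# the released checkpoint may use a different name and interface.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "user-vlm/user-vlm-360"  # hypothetical Hugging Face model ID

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)

# The user-facing image carries the visual-linguistic signals used for
# user-aware personalization; the question is an ordinary VQA prompt.
image = Image.open("user_face.jpg")
prompt = "What activity would you recommend to this user?"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```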
- User-aware Tuning: Real-time adaptation of interactions using visual-linguistic signals.
- Bias Mitigation: Bias-aware preference optimization that keeps user personalization fair and ethical.
- 360° Socio-Emotive Interaction Dataset: Annotated with demographic, emotional, and relational metadata (an illustrative record follows this list).
- State-of-the-Art Performance: Achieves up to +35.3% F1 in personalized VQA and +47.5% F1 in facial feature understanding, with a 15% bias reduction and 30× speedup over baselines.
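For concreteness, a single annotated record from the dataset might look like the following. All field names and values here are illustrative assumptions; consult the released dataset for the actual schema:

```python
# Hypothetical record from the 360° Socio-Emotive Interaction Dataset.
# Field names are illustrative, not the published schema.
record = {
    "image": "frames/user_0042.jpg",   # user-facing camera frame
    "question": "How is this user feeling right now?",
    "answer": "She seems mildly frustrated; a short break may help.",
    "demographics": {"age_group": "25-34", "gender": "female"},
    "emotion": "frustration",
    "relation": "caregiver-patient",   # relational metadata
}
```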
User-VLM 360° outperforms baseline models in user-aware personalization, facial feature understanding, and multimodal reasoning, achieving up to a 2× improvement in ROUGE-1 F1 on user-centric VQA tasks.
User-VLM 360° enhances fairness, improving ROUGE-1 and BERTScore while mitigating bias through DPO tuning.
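DPO tuning requires no separate reward model: the policy is trained to prefer an unbiased response over a biased one relative to a frozen reference copy of itself. Below is a minimal sketch of the standard DPO loss (Rafailov et al., 2023); the paper's actual preference data and β value are assumptions here:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective. Each argument is the summed log-probability
    of a response under the trainable policy or the frozen reference model.
    For bias mitigation, "chosen" is the unbiased answer and "rejected"
    is the biased one."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid margin between unbiased and biased responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```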
User-VLM 360° achieves up to a 30× reduction in FLOPs, significantly improving computational efficiency without compromising performance.
If you use User-VLM 360° in your research, please cite:
@article{rahimi2025uservlm,
  author  = {Hamed Rahimi and Adil Bahaj and Mouad Abrini and Mahdi Khoramshahi and Mounir Ghogho and Mohamed Chetouani},
  title   = {User-VLM 360°: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions},
  journal = {arXiv preprint arXiv:<ARXIV_PAPER_ID>},
  year    = {2025}
}