[CVPR-2023 Workshop@NFVLR] Official PyTorch implementation of "Learning CLIP Guided Visual-Text Fusion Transformer for Video-based Pedestrian Attribute Recognition"
transformer pedestrian-attribute-recognition multi-modal-fusion video-based-attribute-recognition visual-text-fusion
Updated Jun 11, 2024 - Python