BMVC 2023: Video-adverb retrieval with compositional adverb-action embeddings

ExplainableML/ReGaDa

Video-adverb retrieval with compositional adverb-action embeddings

BMVC 2023

This repository contains the official code for the BMVC 2023 (Oral) paper Video-adverb retrieval with compositional adverb-action embeddings.

[Teaser figure: img/teaser_regada.png]

Requirements

Install all required dependencies into a new virtual environment via conda.

conda env create -f environment.yaml

Datasets

We provide pre-extracted S3D features for all datasets. We thank the authors of Action Changes for sharing them with us.

You can download the features of all datasets here:

By default, the features are expected inside a features folder:

mkdir features
unzip video-adverb-datasets_s3d.zip -d features/
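If `unzip` is not available (for example on Windows), the two commands above can be reproduced with Python's standard `zipfile` module. This is a minimal sketch: `extract_features` is a hypothetical helper, not part of the repository, and it assumes the archive name used above.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def extract_features(archive: str, target_dir: str = "features") -> list[str]:
    """Hypothetical helper: extract the feature archive into target_dir.

    Equivalent to `mkdir features` followed by `unzip ARCHIVE -d features/`.
    Returns the sorted names of the top-level entries now in target_dir.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # like `mkdir features`
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)  # like `unzip ARCHIVE -d features/`
    return sorted(p.name for p in target.iterdir())
```

For example, `extract_features("video-adverb-datasets_s3d.zip")` extracts into `./features` and lists what ended up there.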

If you store the features in a different location, set the path when launching the script:

python train.py feature_dir=PATH_TO_FEATURES [...]

To download the videos and extract the S3D features yourself, follow the instructions at: https://github.com/dmoltisanti/air-cvpr23

Unseen adverb-action composition splits

Pseudo Adverbs proposed a VATEX Adverbs dataset split for evaluating unseen adverb-action compositions. We replicate this split for the S3D features used in this work by omitting unavailable videos and additionally propose new splits for evaluating the performance on unseen compositions for the ActivityNet Adverbs and MSR-VTT Adverbs datasets. The statistics for the dataset splits are given below:

| | VATEX Adv. | ActivityNet Adv. | MSR-VTT Adv. |
|---|---|---|---|
| # train samples | 6603 | 1490 | 987 |
| # unlabelled samples | 3317 | 634 | 306 |
| # test samples | 3293 | 848 | 454 |
| # pairs train | 319 | 635 | 225 |
| # pairs unlabelled | 168 | 537 | 114 |
| # pairs test | 316 | 543 | 225 |
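As a quick sanity check on these splits, the per-dataset totals and the share of each split can be computed from the sample counts in the table. The counts below are copied from the table; `SPLIT_STATS` and `split_totals` are illustrative names only.

```python
# Sample counts per split, copied from the table above.
SPLIT_STATS = {
    "VATEX Adv.": {"train": 6603, "unlabelled": 3317, "test": 3293},
    "ActivityNet Adv.": {"train": 1490, "unlabelled": 634, "test": 848},
    "MSR-VTT Adv.": {"train": 987, "unlabelled": 306, "test": 454},
}


def split_totals(stats):
    """Return {dataset: (total samples, {split: share in percent})}."""
    result = {}
    for name, counts in stats.items():
        total = sum(counts.values())
        shares = {split: round(100 * n / total, 1) for split, n in counts.items()}
        result[name] = (total, shares)
    return result


for name, (total, shares) in split_totals(SPLIT_STATS).items():
    print(f"{name}: {total} samples, {shares}")
```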

The files for the unseen-composition splits are located at splits/unseen_compositions/DATASET_NAME. Each folder contains the following files:

  • antonyms.csv: mapping file for mapping adverbs to their respective antonym
  • train.csv: list of samples / videos used for training
  • test.csv: list of samples / videos used for testing
  • unlabelled.csv: list of samples / videos that can be used as unlabelled data (pseudo-labelling)
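A minimal sketch for reading these files with Python's `csv` module. The column layout is not documented here, so the code assumes (hypothetically) that every file has a header row and that the first two columns of `antonyms.csv` are an adverb and its antonym; check the actual files before relying on it. `load_antonyms` and `load_split` are illustrative helpers, not functions from the repository.

```python
import csv
from pathlib import Path


def load_antonyms(path):
    """Read antonyms.csv into a dict adverb -> antonym.

    ASSUMPTION: one header row, then rows whose first two columns are
    (adverb, antonym). The mapping is made symmetric.
    """
    mapping = {}
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the (assumed) header row
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed lines
            adverb, antonym = row[0].strip(), row[1].strip()
            mapping[adverb] = antonym
            mapping.setdefault(antonym, adverb)  # antonym -> adverb as well
    return mapping


def load_split(split_dir, name):
    """Read train.csv / test.csv / unlabelled.csv as a list of dicts keyed by the header."""
    with open(Path(split_dir) / f"{name}.csv", newline="") as f:
        return list(csv.DictReader(f))
```

For example, `load_split("splits/unseen_compositions/DATASET_NAME", "train")` returns one dict per training sample.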

Model weights

We also provide trained model checkpoints, which can be downloaded here.

Training

Below are the commands for training our method ReGaDa for video-adverb retrieval, both for the main experiments and for the experiment on unseen adverb-action compositions.

Main results

python train.py +run=main_howto100m    # HowTo100M Adverbs
python train.py +run=main_air          # Adverbs in Recipes
python train.py +run=main_activitynet  # ActivityNet Adverbs
python train.py +run=main_msrvtt       # MSR-VTT Adverbs
python train.py +run=main_vatex        # VATEX Adverbs

Unseen compositions

python train.py +run=unseen_vatex        # VATEX Adverbs
python train.py +run=unseen_activitynet  # ActivityNet Adverbs
python train.py +run=unseen_msrvtt       # MSR-VTT Adverbs
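To launch several of these runs in sequence from Python, a small wrapper can build the same command lines. This is a sketch: `train_command` and `train_all` are hypothetical helpers, the run names are copied from the commands above, and the optional `feature_dir` argument mirrors the `feature_dir=PATH_TO_FEATURES` override shown earlier.

```python
import shlex
import subprocess
import sys

# Run names copied from the training commands above.
MAIN_RUNS = ["main_howto100m", "main_air", "main_activitynet", "main_msrvtt", "main_vatex"]
UNSEEN_RUNS = ["unseen_vatex", "unseen_activitynet", "unseen_msrvtt"]


def train_command(run, feature_dir=None):
    """Build the argv for `python train.py +run=RUN [feature_dir=PATH]`."""
    cmd = [sys.executable, "train.py", f"+run={run}"]
    if feature_dir is not None:
        cmd.append(f"feature_dir={feature_dir}")
    return cmd


def train_all(runs, feature_dir=None, dry_run=True):
    """Print each command; unless dry_run, also run it, stopping at the first failure."""
    for run in runs:
        cmd = train_command(run, feature_dir)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `train_all(UNSEEN_RUNS, dry_run=False)` trains the three unseen-composition models one after another; with the default `dry_run=True` the commands are only printed.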

Evaluation

You can evaluate your trained model, or one of the model checkpoints provided, using the following command:

python test.py checkpoint=CHECKPOINT_PATH

where CHECKPOINT_PATH is the checkpoint directory of the model to evaluate.
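To evaluate several checkpoints, for example all of the provided ones, the command can be run once per checkpoint directory. The sketch below assumes, hypothetically, that the checkpoint directories sit side by side under one root folder; `find_checkpoint_dirs`, `eval_command` and `evaluate_all` are illustrative helpers, so adjust the directory filter to your actual layout.

```python
import subprocess
import sys
from pathlib import Path


def find_checkpoint_dirs(root):
    """Return the immediate subdirectories of root, sorted by name.

    ASSUMPTION: each subdirectory is one checkpoint directory as expected
    by `test.py checkpoint=...`.
    """
    return sorted(p for p in Path(root).iterdir() if p.is_dir())


def eval_command(checkpoint_dir):
    """Build the argv for `python test.py checkpoint=CHECKPOINT_PATH`."""
    return [sys.executable, "test.py", f"checkpoint={checkpoint_dir}"]


def evaluate_all(root, dry_run=True):
    """Print each evaluation command; unless dry_run, also run it."""
    for ckpt in find_checkpoint_dirs(root):
        cmd = eval_command(ckpt)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```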

Results

Main Results

HowTo100M Adverbs

| Model | mAP W | mAP M | Acc-A |
|---|---|---|---|
| Priors | 0.446 | 0.354 | 0.786 |
| S3D pre-trained | 0.339 | 0.238 | 0.560 |
| TIRG | 0.441 | 0.476 | 0.721 |
| Action Modifier | 0.406 | 0.372 | 0.796 |
| ACCLS | 0.562 | 0.420 | 0.786 |
| ACREG | 0.555 | 0.423 | 0.799 |
| ReGaDa (ours) | 0.567 | 0.528 | 0.817 |
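The metric columns are not defined in this README. As far as we can tell from prior video-adverb work, mAP W is the mean average precision over adverbs weighted by the number of positive videos per adverb, mAP M is the unweighted (macro) mean, and Acc-A is binary antonym accuracy; treat this reading as an assumption and consult the paper and the repository's evaluation code for the exact protocol. The sketch below computes the two mAP variants in plain Python under that assumption. It is not the repository's implementation, and ties in the scores are broken by input order.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive video.

    scores: retrieval scores of one adverb for all videos.
    labels: 1 if the video is annotated with that adverb, else 0.
    """
    # sorted() is stable, so tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda x: x[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def map_weighted_and_macro(score_matrix, label_matrix):
    """Return (mAP W, mAP M) under the assumed definitions.

    score_matrix[i][j] / label_matrix[i][j]: score / 0-1 label of video i for adverb j.
    Adverbs without any positive video are skipped.
    """
    aps, weights = [], []
    for j in range(len(score_matrix[0])):
        scores = [row[j] for row in score_matrix]
        labels = [row[j] for row in label_matrix]
        n_pos = sum(labels)
        if n_pos == 0:
            continue
        aps.append(average_precision(scores, labels))
        weights.append(n_pos)
    weighted = sum(a * w for a, w in zip(aps, weights)) / sum(weights)
    macro = sum(aps) / len(aps)
    return weighted, macro
```

The two averages differ only when adverbs have different numbers of positive videos: frequent adverbs dominate mAP W, while mAP M gives every adverb the same weight.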

Adverbs in Recipes

| Model | mAP W | mAP M | Acc-A |
|---|---|---|---|
| Priors | 0.491 | 0.263 | 0.854 |
| S3D pre-trained | 0.389 | 0.173 | 0.735 |
| TIRG | 0.485 | 0.228 | 0.835 |
| Action Modifier | 0.509 | 0.251 | 0.857 |
| ACCLS | 0.606 | 0.289 | 0.841 |
| ACREG | 0.613 | 0.244 | 0.847 |
| ReGaDa (ours) | 0.704 | 0.418 | 0.874 |

ActivityNet Adverbs

| Model | mAP W | mAP M | Acc-A |
|---|---|---|---|
| Priors | 0.217 | 0.159 | 0.745 |
| S3D pre-trained | 0.118 | 0.070 | 0.560 |
| TIRG | 0.186 | 0.111 | 0.709 |
| Action Modifier | 0.184 | 0.125 | 0.753 |
| ACCLS | 0.130 | 0.096 | 0.741 |
| ACREG | 0.119 | 0.079 | 0.714 |
| ReGaDa (ours) | 0.239 | 0.175 | 0.771 |

MSR-VTT Adverbs

| Model | mAP W | mAP M | Acc-A |
|---|---|---|---|
| Priors | 0.308 | 0.152 | 0.723 |
| S3D pre-trained | 0.194 | 0.075 | 0.603 |
| TIRG | 0.297 | 0.113 | 0.700 |
| Action Modifier | 0.233 | 0.127 | 0.731 |
| ACCLS | 0.305 | 0.131 | 0.751 |
| ACREG | 0.282 | 0.114 | 0.774 |
| ReGaDa (ours) | 0.378 | 0.228 | 0.786 |

VATEX Adverbs

| Model | mAP W | mAP M | Acc-A |
|---|---|---|---|
| Priors | 0.216 | 0.086 | 0.752 |
| S3D pre-trained | 0.122 | 0.038 | 0.586 |
| TIRG | 0.195 | 0.065 | 0.735 |
| Action Modifier | 0.139 | 0.059 | 0.751 |
| ACCLS | 0.283 | 0.108 | 0.754 |
| ACREG | 0.261 | 0.086 | 0.755 |
| ReGaDa (ours) | 0.290 | 0.113 | 0.817 |

Retrieval of adverbs for unseen adverb-action compositions (binary antonym accuracy)

| Model | VATEX Adv. | ActivityNet Adv. | MSR-VTT Adv. |
|---|---|---|---|
| CLIP | 54.5 | 55.1 | 57.0 |
| Action Modifier | 53.8 | 57.0 | 56.0 |
| ACCLS | 54.3 | 55.1 | 53.7 |
| ACREG | 54.9 | 53.9 | 59.0 |
| ReGaDa (ours) | 61.7 | 58.4 | 61.0 |
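This table reports binary antonym accuracy in percent. A minimal sketch, assuming the metric compares, for each test video, the score of the annotated adverb only against the score of its antonym (as listed in `antonyms.csv`) and counts the video as correct when the annotated adverb scores strictly higher. The function name and input format are illustrative and not taken from the repository's evaluation code.

```python
def antonym_accuracy(adverb_scores, true_adverbs, antonyms):
    """Binary antonym accuracy in percent (assumed definition).

    adverb_scores: one dict per video mapping adverb -> retrieval score.
    true_adverbs: the annotated adverb of each video.
    antonyms: mapping adverb -> antonym, e.g. read from antonyms.csv.
    Ties count as incorrect.
    """
    correct = 0
    for scores, adverb in zip(adverb_scores, true_adverbs):
        if scores[adverb] > scores[antonyms[adverb]]:
            correct += 1
    return 100.0 * correct / len(true_adverbs)
```

Because each decision is a choice between two adverbs, a random scorer lands near 50%, which is useful context when reading the numbers in the table.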

Citation

If you find this code useful, please consider citing:

@inproceedings{hummel2023regada,
  author    = {Hummel, Thomas and Mercea, Otniel-Bogdan and Koepke, A. Sophia and Akata, Zeynep},
  title     = {Video-adverb retrieval with compositional adverb-action embeddings},
  booktitle = {BMVC},
  year      = {2023}
}

Other repositories

Small portions of the code are adapted from the following repositories:
