Source code for our ACM MM 2024 paper
Task Example: The goal of both MR tasks, NLMR (natural language moment retrieval) and SLMR (spoken language moment retrieval), is to predict the temporal boundaries of the moment that matches the given query.
Two important characteristics:
1) Temporal association between video clips: the temporal correlation between two video clips weakens as the distance between them grows;
2) Redundant background interference: the background contains a lot of redundant information that can interfere with recognizing the current event, and this interference is even more severe in long videos.
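Characteristic 1) can be pictured as a distance-based exponential decay over clip pairs, in the spirit of retention. A minimal NumPy sketch for illustration (the decay rate `gamma` and the clip count are assumed values, not the paper's settings):

```python
import numpy as np

def temporal_decay(num_clips: int, gamma: float = 0.9) -> np.ndarray:
    """Entry (i, j) = gamma ** |i - j|: the association between two clips
    weakens exponentially as their temporal distance grows."""
    idx = np.arange(num_clips)
    return gamma ** np.abs(idx[:, None] - idx[None, :])

D = temporal_decay(4, gamma=0.5)
print(D[0])  # clip 0 vs. clips 0..3: [1.    0.5   0.25  0.125]
```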
The architecture of the Maskable Retentive Network (MRNet). We conduct modality-specific attention modes: we set Unlimited Attention for language-related attention regions to maximize cross-modal mutual guidance, and perform a new Maskable Retention for the video branch.
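As an illustration of these modality-specific modes, here is a hypothetical mask-building sketch (the token counts, `gamma`, and masking radius are assumptions, not the paper's actual formulation): language-related regions remain unrestricted, while video-video entries decay with clip distance and are masked out beyond a cutoff:

```python
import numpy as np

def modality_attention_mask(num_text: int, num_video: int,
                            gamma: float = 0.9, max_dist: int = 2) -> np.ndarray:
    """Hypothetical sketch of modality-specific attention weights.

    Language-related regions (text-text, text-video, video-text) are left
    unrestricted (weight 1.0); video-video entries decay with clip distance
    and are zeroed out entirely beyond max_dist (the "maskable" part).
    """
    n = num_text + num_video
    mask = np.ones((n, n))
    idx = np.arange(num_video)
    dist = np.abs(idx[:, None] - idx[None, :])
    mask[num_text:, num_text:] = (gamma ** dist) * (dist <= max_dist)
    return mask

M = modality_attention_mask(num_text=2, num_video=4, gamma=0.5, max_dist=2)
```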
1. Download the datasets (Optional).
- The video features provided by 2D-TAN:
  - ActivityNet Captions C3D feature
  - TACoS C3D feature
- The video I3D feature of the Charades-STA dataset from LGI:

  ```bash
  wget http://cvlab.postech.ac.kr/research/LGI/charades_data.tar.gz
  tar zxvf charades_data.tar.gz
  mv charades data
  rm charades_data.tar.gz
  ```

- The audio captions of the ActivityNet Speech Dataset: download the original audio proposed by VGCL.
2. For convenience, the extracted input data features can be downloaded directly from Baidu Yun (passcode: d4yl).
3. Text and audio feature extraction (Optional).
```bash
cd preprocess
python text_encode.py
python audio_encode.py
```
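The actual encoders live in `text_encode.py` and `audio_encode.py`. As a dependency-light illustration of the pooling step such scripts typically end with (the function name and dimensions are assumptions), mean-pooling per-token embeddings into one fixed-size query feature:

```python
import numpy as np

def pool_query_feature(token_embs: np.ndarray) -> np.ndarray:
    """Mean-pool per-token embeddings of shape (num_tokens, dim)
    into a single (dim,) query vector."""
    return token_embs.mean(axis=0)

# Fake 7-token query with 300-d embeddings; the real scripts would use a
# pretrained encoder (e.g. via the `transformers` dependency) instead.
tokens = np.random.rand(7, 300)
feat = pool_query_feature(tokens)
```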
4. Set your own dataset path in the following file: `ret/config/paths_catalog.py`
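A hypothetical sketch of the kind of entry to adjust (the real `paths_catalog.py` in your checkout may organize its keys differently; this only illustrates pointing the catalog at your data directory):

```python
# ret/config/paths_catalog.py (hypothetical structure)
class DatasetCatalog:
    # Point this at the `data` folder prepared in the steps above.
    DATA_DIR = "data"
```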
5. Or prepare the files in the following structure (Optional).
```
MRNet
├── configs
├── dataset
├── ret
├── data
│   ├── activitynet
│   │   ├── *text features
│   │   ├── *audio features
│   │   └── *video c3d features
│   ├── charades
│   │   ├── *text features
│   │   └── *video i3d features
│   └── tacos
│       ├── *text features
│       └── *video c3d features
├── train_net.py
├── test_net.py
└── ···
```
Install the dependencies:

```bash
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install yacs h5py terminaltables tqdm librosa transformers
```
Training on ActivityNet:

```bash
python train_net.py --config-file checkpoints/best/activity/config.yml
```
For TACoS, first copy the code in `ret_model_tacos.py` to `ret_model.py`, then train:

```bash
cd ret/modeling/ret_model
cp ret_model_tacos.py ret_model.py
cd ../../..
python train_net.py --config-file checkpoints/best/tacos/config.yml
```
Please wait for the update.
- Download the model weight file from Google Drive to the `checkpoints/best/activity` folder, then run:

  ```bash
  python test_net.py --config-file checkpoints/best/activity/config.yml --ckpt checkpoints/best/activity/pool_model_14.pth
  ```
- Download the model weight file from Google Drive to the `checkpoints/best/tacos` folder, copy the code in `ret_model_tacos.py` to `ret_model.py`, then run:

  ```bash
  cd ret/modeling/ret_model
  cp ret_model_tacos.py ret_model.py
  cd ../../..
  python test_net.py --config-file checkpoints/best/tacos/config.yml --ckpt checkpoints/best/tacos/pool_model_110e.pth
  ```
Please wait for the update.
The annotation files and many parts of the implementation are borrowed from MMN. Our code is released under the MIT license.