TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection (AAAI 2024 Paper)
by Hao Sun¹*, Mingyao Zhou¹*, Wenjing Chen²†, Wei Xie¹†
¹ Central China Normal University, ² Hubei University of Technology, * Equal contribution, † Corresponding authors.
Clone the repository and enter its directory:
git clone https://github.com/your-repo/tr_detr.git
cd tr_detr
If any dataset link becomes invalid, you can refer to Hugging Face for alternative resources.
Download the official feature files for the QVHighlights dataset from Moment-DETR.
- Download moment_detr_features.tar.gz (8 GB) and extract it under the ../features directory.
- You can change the data directory by modifying the feat_root parameter in the shell scripts located in the tr_detr/scripts/ directory.
tar -xf path/to/moment_detr_features.tar.gz
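If you prefer to extract from Python, or want the files to land directly under ../features regardless of the current working directory, the following is a minimal sketch using the standard tarfile module. The function name extract_features is hypothetical, and this README does not say whether the archive already contains a top-level folder, so check the resulting layout against feat_root afterwards.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_features(archive: str, target_dir: str = "../features") -> List[str]:
    """Extract a feature archive into target_dir and return the member names.

    Hypothetical helper. Members whose paths would escape target_dir are
    rejected as a basic guard against path traversal.
    """
    target = Path(target_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects compression
        members = tar.getmembers()
        for member in members:
            dest = (target / member.name).resolve()
            if dest != target and target not in dest.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(path=target)
    return [m.name for m in members]
```

For example, extract_features("path/to/moment_detr_features.tar.gz") extracts into ../features relative to where you run it.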
Download the feature files for the TVSum dataset from UMT.
- Download TVSum (69.1 MB) and either extract it under the ../features/tvsum/ directory or modify the feat_root parameter in the TVSum shell scripts located in the tr_detr/scripts/tvsum/ directory.
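If you keep the features elsewhere, editing feat_root by hand in every script is error-prone. The sketch below assumes, without confirmation from this README, that each script assigns the directory on a plain shell line of the form feat_root=...; both function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: scripts set the data directory with a line like `feat_root=../features`.
_FEAT_ROOT = re.compile(r"^([ \t]*feat_root=).*$", re.MULTILINE)


def set_feat_root(script_text: str, new_root: str) -> str:
    """Return script_text with every feat_root= assignment pointing at new_root."""
    # A function replacement avoids backslash escapes in new_root being interpreted.
    updated, count = _FEAT_ROOT.subn(lambda m: m.group(1) + new_root, script_text)
    if count == 0:
        raise ValueError("no feat_root= assignment found")
    return updated


def patch_scripts(script_dir: str, new_root: str) -> int:
    """Rewrite feat_root in each *.sh file directly under script_dir; return files changed."""
    changed = 0
    for path in sorted(Path(script_dir).glob("*.sh")):
        text = path.read_text()
        if _FEAT_ROOT.search(text):
            path.write_text(set_feat_root(text, new_root))
            changed += 1
    return changed
```

For example, patch_scripts("tr_detr/scripts/tvsum", "/data/features/tvsum") would update the TVSum scripts in place; review the diff before training.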
Python version 3.7 is required. Install dependencies using:
pip install -r requirements.txt
Note: requirements.txt includes additional libraries that may not be required; these will be cleaned up in a future update. For an Anaconda setup, refer to the official Moment-DETR GitHub repository.
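Because Python 3.7 is required, a quick interpreter check before installing can save a failed dependency build. This is a minimal sketch; the function name is hypothetical, and it reads "3.7 is required" strictly as an exact major.minor match, since newer versions are not stated to work.

```python
import sys

REQUIRED = (3, 7)  # Python version stated in this README


def python_version_ok(version=None, required=REQUIRED) -> bool:
    """Return True if the (major, minor) of `version` equals `required`.

    Defaults to the running interpreter's sys.version_info.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == tuple(required)


if __name__ == "__main__":
    if not python_version_ok():
        print(f"Warning: Python {sys.version.split()[0]} detected; this README requires 3.7.")
```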
You can train the model using only video features or both video and audio features:
bash tr_detr/scripts/train.sh # Only video
bash tr_detr/scripts/train_audio.sh # Video + audio
The best validation accuracy is achieved at the last epoch.
After training, you can generate hl_val_submission.jsonl and hl_test_submission.jsonl for the validation and test sets by running:
bash tr_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'val'
bash tr_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'test'
Replace {direc} with the name of the results directory that contains your saved checkpoint. For more details on submission, see standalone_eval/README.md.
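Before submitting, it can help to sanity-check the generated files. The sketch below assumes the submission follows the Moment-DETR format this code base builds on (one JSON object per line with qid, query, vid, pred_relevant_windows as [start, end, score] triples, and pred_saliency_scores); confirm the exact schema in standalone_eval/README.md. The function name is hypothetical.

```python
import json
from typing import Dict, List

# Assumed keys (Moment-DETR submission format); verify against standalone_eval/README.md.
REQUIRED_KEYS = {"qid", "query", "vid", "pred_relevant_windows", "pred_saliency_scores"}


def load_submission(path: str) -> List[Dict]:
    """Read a *_submission.jsonl file and check each record's basic shape."""
    records = []
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            rec = json.loads(line)
            missing = REQUIRED_KEYS - set(rec)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            for window in rec["pred_relevant_windows"]:
                # Each window is assumed to be [start_sec, end_sec, score].
                start, end, _score = window
                if end < start:
                    raise ValueError(f"line {line_no}: window end before start")
            records.append(rec)
    return records
```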
Similar to QVHighlights, you can train the model on the TVSum dataset:
bash tr_detr/scripts/tvsum/train_tvsum.sh # Only video
bash tr_detr/scripts/tvsum/train_tvsum_audio.sh # Video + audio
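When launching runs from a Python driver (for example, a sweep over datasets and modalities), a small lookup over the four training scripts listed in this README keeps the choices explicit. The script paths come from this README; the helper names are hypothetical.

```python
import subprocess
from typing import List

# Script paths as listed in this README; the mapping itself is illustrative.
TRAIN_SCRIPTS = {
    ("qvhighlights", False): "tr_detr/scripts/train.sh",
    ("qvhighlights", True): "tr_detr/scripts/train_audio.sh",
    ("tvsum", False): "tr_detr/scripts/tvsum/train_tvsum.sh",
    ("tvsum", True): "tr_detr/scripts/tvsum/train_tvsum_audio.sh",
}


def train_command(dataset: str = "qvhighlights", use_audio: bool = False) -> List[str]:
    """Return the bash command for video-only or video + audio training."""
    key = (dataset.lower(), use_audio)
    if key not in TRAIN_SCRIPTS:
        raise ValueError(f"unknown dataset: {dataset}")
    return ["bash", TRAIN_SCRIPTS[key]]


def run_training(dataset: str = "qvhighlights", use_audio: bool = False) -> None:
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(train_command(dataset, use_audio), check=True)
```

Run it from the repository root so the relative script paths resolve.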
The best results are saved in results_[domain_name]/best_metric.jsonl.
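Since TVSum results are written per domain, gathering them in one place makes comparison easier. This sketch relies only on the results_[domain_name]/best_metric.jsonl naming given above; where those directories live is passed in as root (an assumption), and because the metric names inside each record are not documented here, the parsed records are returned as-is. The function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def collect_best_metrics(root: str = ".") -> Dict[str, List[dict]]:
    """Map each domain name to the parsed records in results_<domain>/best_metric.jsonl."""
    metrics = {}
    for path in sorted(Path(root).glob("results_*/best_metric.jsonl")):
        domain = path.parent.name[len("results_"):]  # strip the "results_" prefix
        with path.open() as f:
            metrics[domain] = [json.loads(line) for line in f if line.strip()]
    return metrics
```

Domains whose run has not produced a best_metric.jsonl yet are simply absent from the returned dictionary.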
If you find this repository useful, please cite our work:
@inproceedings{sun_zhou2024tr,
  title={{TR-DETR}: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection},
author={Sun, Hao and Zhou, Mingyao and Chen, Wenjing and Xie, Wei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={5},
pages={4998--5007},
year={2024}
}
The annotation files and parts of the implementation are borrowed from Moment-DETR and QD-DETR. Consequently, our code is also released under the MIT License.