[Feature] Support CLIP4Clip #2489
Conversation
Codecov Report
Patch coverage:
Additional details and impacted files

@@           Coverage Diff            @@
## dev-1.x #2489 +/- ##
===========================================
+ Coverage 77.23% 77.62% +0.38%
===========================================
Files 161 167 +6
Lines 13172 13440 +268
Branches 2266 2302 +36
===========================================
+ Hits 10174 10433 +259
- Misses 2449 2455 +6
- Partials 549 552 +3
CLIP4Clip
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Abstract
Video-text retrieval plays an essential role in multi-modal research and has been widely used in many real-world web applications. CLIP (Contrastive Language-Image Pre-training), an image-language pre-training model, has demonstrated the power of learning visual concepts from web-collected image-text datasets. In this paper, we propose the CLIP4Clip model to transfer the knowledge of the CLIP model to video-language retrieval in an end-to-end manner. Several questions are investigated via empirical studies: 1) Is the image feature enough for video-text retrieval? 2) How does post-pretraining on a large-scale video-text dataset affect the performance of the CLIP-based model? 3) What is the practical mechanism to model temporal dependency between video frames? 4) How sensitive is the model to hyper-parameters on the video-text retrieval task? Extensive experimental results show that the CLIP4Clip model transferred from CLIP can achieve SOTA results on various video-text retrieval datasets, including MSR-VTT, MSVD, LSMDC, ActivityNet, and DiDeMo.
Results and Models
MSRVTT-9k
For more details on data preparation, you can refer to video_retrieval.
Train
You can use the following command to train a model.
python tools/train.py ${CONFIG_FILE} [optional arguments]
Example: train the CLIP4Clip model on the MSRVTT-9k dataset with deterministic training and periodic validation, as sketched below.
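A minimal sketch of such a command, assuming the MMAction2 1.x `tools/train.py` interface; the config path below is a placeholder and should be replaced with the actual CLIP4Clip MSRVTT-9k config added in this PR.

```shell
# Placeholder config path; substitute the CLIP4Clip MSRVTT-9k config from this PR.
python tools/train.py configs/retrieval/clip4clip/clip4clip_msrvtt-9k.py \
    --seed=0 \
    --deterministic
```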
For more details, you can refer to the Training part in the Training and Test Tutorial.
Test
You can use the following command to test a model.
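The generic form, following the MMAction2 tools layout (the bracketed part stands for whatever optional flags apply):

```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```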
Example: test the CLIP4Clip model on the MSRVTT-9k dataset and dump the result to a pkl file, as sketched below.
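A sketch under the same assumptions: the config and checkpoint paths are placeholders, and `--dump` is the MMAction2 1.x option for saving raw results to a pickle file (verify against the version in this PR).

```shell
# Placeholder config/checkpoint paths; substitute the actual files.
python tools/test.py configs/retrieval/clip4clip/clip4clip_msrvtt-9k.py \
    checkpoints/CHECKPOINT.pth \
    --dump result.pkl
```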
For more details, you can refer to the Test part in the Training and Test Tutorial.
Citation
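A BibTeX entry for the paper; the fields below follow the arXiv preprint (arXiv:2104.08860) and should be checked against the published version.

```BibTeX
@article{luo2021clip4clip,
  title={CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval},
  author={Luo, Huaishao and Ji, Lei and Zhong, Ming and Chen, Yang and Lei, Wen and Duan, Nan and Li, Tianrui},
  journal={arXiv preprint arXiv:2104.08860},
  year={2021}
}
```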