Please follow the installation instructions in INSTALL.
The codebase supports using wandb to monitor training. If you want to use wandb, you will need to set it up following this very short instruction, and also set `wandb.enable` in the config to `True`. `wandb.entity` and `wandb.project` should also be set.
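For example, assuming the configs are plain Python files, the relevant fields might look like the following sketch; the entity and project names are placeholders:

```python
# Sketch of the wandb-related config fields; replace entity/project with your own.
wandb = dict(
    enable=True,             # turn on wandb logging
    entity="your_entity",    # your wandb username or team
    project="your_project",  # the wandb project that runs are logged under
)
```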
You can find the dataset instructions in DATASET. We have provided all the metadata files of our data.
You can find all the models and the scripts in MODEL_ZOO.
Thanks to some recent issues, I finally found the bug in MSVD testing. I had been confused by the 'extremely high result' for a long time. 😄
Different from ANet and DiDeMo, which use paragraphs for retrieval, MSVD has multiple texts for one video. Thus `is_paragraph_retrieval` should be set to `False` for retrieval.
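For instance, the MSVD retrieval config should end up with something like this minimal sketch (only the flag itself comes from the note above):

```python
# MSVD has several independent captions per video, so paragraph-style retrieval
# (concatenating all captions into one query, as for ANet/DiDeMo) must be off.
is_paragraph_retrieval = False
```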
After fixing the bug, the zero-shot result for ViT-L/16_25M is 49.0, not 72.2. The results are quite normal, but still the best zero-shot results. I will conduct the corresponding experiments and update the results in the paper later.
We use CLIP pretrained models as the unmasked teachers by default:
- Follow extract.ipynb to extract the visual encoder from CLIP (a rough sketch of this step is shown after this list).
- Change `MODEL_PATH` in clip.py.
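A rough sketch of what the extraction step amounts to, assuming the OpenAI `clip` package; the key filtering and the output filename are illustrative, so follow extract.ipynb for the exact conversion:

```python
import torch
import clip  # OpenAI CLIP package

# Load the full CLIP model on CPU (non-JIT so we get a plain state_dict).
model, _ = clip.load("ViT-B/16", device="cpu", jit=False)

# Keep only the visual encoder weights and strip the "visual." prefix.
visual_state_dict = {
    k[len("visual."):]: v
    for k, v in model.state_dict().items()
    if k.startswith("visual.")
}

# Save the extracted weights; MODEL_PATH in clip.py should point to this file.
torch.save(visual_state_dict, "clip_vit_b16_visual.pth")
```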
For training, you can simply run the pretraining scripts in `exp/pretraining` as follows:

```bash
bash ./exp/pretraining/b16_ptk710_e200_f8_res224.sh
```
- Set `data_dir` and `your_data_path` like `your_webvid_path` in data.py before running the scripts (see the sketch after this list).
- Set `vision_encoder.pretrained` in the corresponding config files.
- Set `--rdzv_endpoint` to your `MASTER_NODE:MASTER_PORT`. You can also use the following command to automatically set it:
  ```bash
  MASTER_NODE=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
  ALL_NODES=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
  MASTER_PORT=$((10000 + $RANDOM % 100))
  torchrun --rdzv_endpoint=${MASTER_NODE}:${MASTER_PORT} $@
  ```
- `save_latest=True` will automatically save the latest checkpoint while training.
- `auto_resume=True` will automatically load the best or latest checkpoint while training.
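As referenced above, a hedged sketch of the path placeholders in data.py; the values are illustrative and must be replaced with your local paths:

```python
# Illustrative values only; point these at your own metadata and video folders.
data_dir = "/path/to/metadata"                # root directory for the annotation files
your_webvid_path = "/path/to/WebVid/videos"   # raw WebVid videos used for pretraining
```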
For zero-shot evaluation, you can simply run the evaluation scripts in `exp/zero_shot` as follows:

```bash
bash ./exp/zero_shot/ret_msrvtt/b16_25m.sh
```
- Set `your_model_path` in the running scripts before running them.
- Set `zero_shot=True` and `evaluate=True` for zero-shot evaluation (see the sketch after this list).
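Putting the two notes above together, the evaluation-related settings end up looking roughly like this sketch (the checkpoint path is a placeholder, and `your_model_path` stands in for whatever variable the script actually uses):

```python
# Minimal sketch of a zero-shot evaluation setup.
your_model_path = "/path/to/checkpoint.pth"  # model to evaluate, e.g. from MODEL_ZOO
zero_shot = True   # evaluate the pretrained checkpoint directly, no finetuning
evaluate = True    # run evaluation only, skip the training loop
```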
For finetuning, you can simply run the finetuning scripts in `exp/finetuning` as follows:

```bash
bash ./exp/finetuning/ret_msrvtt/b16_25m.sh
```
- Set `your_model_path` in the running scripts before running them.