Please follow the installation instructions in INSTALL.
The codebase supports using wandb to monitor training. If you want to use wandb, you will need to set it up following this very short instruction, and also set `wandb.enable` in the config to `True`. `wandb.entity` and `wandb.project` should also be set.
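For example, assuming the configs are plain Python files, the relevant fields might look like the following sketch; the entity and project names are placeholders:

```python
# Sketch of the wandb-related config fields; replace entity/project with your own.
wandb = dict(
    enable=True,             # turn on wandb logging
    entity="your_entity",    # your wandb username or team
    project="your_project",  # the wandb project that runs are logged under
)
```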
You can find the dataset instructions in DATASET. We have provided all the metadata files of our data.
You can find all the models and the scripts in MODEL_ZOO.
Thanks to some recent issues, I finally found the bug in MSVD testing. I had been confused by the 'extremely high result' for a long time. 😄
Different from ANet and DiDeMo, which use paragraphs for retrieval, MSVD has multiple texts for one video. Thus `is_paragraph_retrieval` should be set to `False` for retrieval.
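For instance, the MSVD retrieval config should end up with something like this minimal sketch (only the flag itself comes from the note above):

```python
# MSVD has several independent captions per video, so paragraph-style retrieval
# (concatenating all captions into one query, as for ANet/DiDeMo) must be off.
is_paragraph_retrieval = False
```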
After fixing the bug, the zero-shot result for ViT-L/16_25M is 49.0, not 72.2. The results are quite normal, but still the best zero-shot results. I will conduct the corresponding experiments and update the results in the paper later.
We use CLIP pretrained models as the unmasked teachers by default:
- Follow extract.ipynb to extract the visual encoder from CLIP (a rough sketch of this step is shown after this list).
- Change `MODEL_PATH` in clip.py.
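A rough sketch of what the extraction step amounts to, assuming the OpenAI `clip` package; the key filtering and the output filename are illustrative, so follow extract.ipynb for the exact conversion:

```python
import torch
import clip  # OpenAI CLIP package

# Load the full CLIP model on CPU (non-JIT so we get a plain state_dict).
model, _ = clip.load("ViT-B/16", device="cpu", jit=False)

# Keep only the visual encoder weights and strip the "visual." prefix.
visual_state_dict = {
    k[len("visual."):]: v
    for k, v in model.state_dict().items()
    if k.startswith("visual.")
}

# Save the extracted weights; MODEL_PATH in clip.py should point to this file.
torch.save(visual_state_dict, "clip_vit_b16_visual.pth")
```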
For training, you can simply run the pretraining scripts in `exp/pretraining` as follows:

```bash
bash ./exp/pretraining/b16_ptk710_e200_f8_res224.sh
```
- Set `data_dir` and `your_data_path` like `your_webvid_path` in data.py before running the scripts (see the sketch after this list).
- Set `vision_encoder.pretrained` in the corresponding config files.
- Set `--rdzv_endpoint` to your `MASTER_NODE:MASTER_PORT`. You can also use the following command to automatically set it:
  ```bash
  MASTER_NODE=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
  ALL_NODES=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
  MASTER_PORT=$((10000 + $RANDOM % 100))
  torchrun --rdzv_endpoint=${MASTER_NODE}:${MASTER_PORT} $@
  ```
- `save_latest=True` will automatically save the latest checkpoint while training.
- `auto_resume=True` will automatically load the best or latest checkpoint while training.
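As referenced above, a hedged sketch of the path placeholders in data.py; the values are illustrative and must be replaced with your local paths:

```python
# Illustrative values only; point these at your own metadata and video folders.
data_dir = "/path/to/metadata"                # root directory for the annotation files
your_webvid_path = "/path/to/WebVid/videos"   # raw WebVid videos used for pretraining
```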
For zero-shot evaluation, you can simply run the evaluation scripts in `exp/zero_shot` as follows:

```bash
bash ./exp/zero_shot/ret_msrvtt/b16_25m.sh
```
- Set `your_model_path` in the running scripts before running them.
- Set `zero_shot=True` and `evaluate=True` for zero-shot evaluation (see the sketch after this list).
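Putting the two notes above together, the evaluation-related settings end up looking roughly like this sketch (the checkpoint path is a placeholder, and `your_model_path` stands in for whatever variable the script actually uses):

```python
# Minimal sketch of a zero-shot evaluation setup.
your_model_path = "/path/to/checkpoint.pth"  # model to evaluate, e.g. from MODEL_ZOO
zero_shot = True   # evaluate the pretrained checkpoint directly, no finetuning
evaluate = True    # run evaluation only, skip the training loop
```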
For finetuning, you can simply run the finetuning scripts in `exp/finetuning` as follows:

```bash
bash ./exp/finetuning/ret_msrvtt/b16_25m.sh
```
- Set `your_model_path` in the running scripts before running them.