We mainly follow VINDLU to prepare the environment.
```shell
# create the conda environment
conda env create -f vl.yml
# activate it
conda activate vl
```
To run UMT pretraining, you first need to prepare the weights of the CLIP visual encoder as described in extract.ipynb, and then set `MODEL_PATH` in clip.py to point to the extracted weights.
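For reference, here is a minimal sketch of what the extraction step looks like, assuming the OpenAI `clip` package and a ViT-B/16 backbone; the model name and output path below are illustrative assumptions, so follow extract.ipynb for the exact names used in this repo:

```python
# Sketch of the extraction step: load an OpenAI CLIP checkpoint and
# save only the visual-encoder weights. The model name ("ViT-B/16")
# and output path are illustrative; use the values from extract.ipynb.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

# Load the full CLIP model on CPU (inference weights only)
model, _ = clip.load("ViT-B/16", device="cpu")

# Keep only the visual encoder's parameters, stripping the "visual." prefix
visual_state = {
    k[len("visual."):]: v
    for k, v in model.state_dict().items()
    if k.startswith("visual.")
}

# Save to the file that MODEL_PATH in clip.py should point to
torch.save(visual_state, "vit_b16.pth")
```

After saving, point `MODEL_PATH` in clip.py at the resulting file (e.g. `MODEL_PATH = "/path/to/vit_b16.pth"`, path hypothetical).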