# Using Vilio for your own project

This outline explains how you can use the repo for your own Vision & Language problem, be it Visual Question Answering, Visual Reasoning, Classification, etc. For applying it to the Hateful Memes dataset, refer to SCORE_REPRO.md.

If anything comes up, feel free to drop an issue, send a PR, or send me an email at n.muennighoff@gmail.com.

## Data

### Image feature extraction

Extracting features from images before training is the current standard in V&L, as it speeds things up significantly. If you don't have extracted features yet, you can use the subrepo vilio/py-bottom-up-attention: place a folder named img with all your images into vilio/py-bottom-up-attention/data, then proceed as follows.

- Clone the repo:

  ```bash
  git clone https://github.com/Muennighoff/vilio.git
  ```

- Set up the extraction code:

  ```bash
  cd vilio/py-bottom-up-attention
  pip install -r requirements.txt
  pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
  python setup.py build develop
  ```

Then run the feature extraction as follows:

```bash
cd vilio/py-bottom-up-attention
python detectron2_mscoco_proposal_maxnms.py --batchsize 4 --split img --weight vgattr --minboxes 36 --maxboxes 36
```

I recommend leaving the parameters as they are. Increasing the number of boxes (and hence the number of features extracted) sometimes helps marginally, but it slows down extraction, training & inference significantly.
If you run into any problems, refer to the README.md at vilio/py-bottom-up-attention/README.md.

### Using the image & text features

The repo provides code for dealing with .tsv features (which are generated by the extraction above) or .lmdb features.
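
For orientation, here is a minimal sketch of reading such a .tsv file back in, assuming the usual LXMERT-style column layout produced by bottom-up-attention extraction (check vilio/fts_tsv/ for the exact field names used in this repo):

```python
import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)  # the feature fields are long base64 strings

# Assumed LXMERT-style column layout; verify against the code in vilio/fts_tsv/.
FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]

def load_tsv(path):
    """Yield one dict per image with decoded boxes & features."""
    with open(path) as f:
        for item in csv.DictReader(f, FIELDNAMES, delimiter="\t"):
            num_boxes = int(item["num_boxes"])
            # boxes: (num_boxes, 4) float32; features: (num_boxes, dim) float32
            boxes = np.frombuffer(base64.b64decode(item["boxes"]),
                                  dtype=np.float32).reshape(num_boxes, 4)
            feats = np.frombuffer(base64.b64decode(item["features"]),
                                  dtype=np.float32).reshape(num_boxes, -1)
            yield {"img_id": item["img_id"], "boxes": boxes, "features": feats}
```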

Depending on your feature format and text format, you will want to go through the code under either vilio/fts_lmdb/ or vilio/fts_tsv/. I recommend copying the hm_data.py in the relevant folder and adjusting the code to your file format & data columns. You can also adjust hm_pretrain_data.py if you plan to perform task-specific pretraining (refer to the table at the end of this document to see which models support task-specific pretraining; note that all models start from pretrained weights, but additional pre-training (masking etc.) on your specific dataset sometimes helps).
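
As a rough illustration of what such an adaptation might end up looking like, here is a stripped-down, hypothetical dataset class for a jsonl text file with id, text & label fields, paired with precomputed image features (all names are placeholders, not the actual hm_data.py code):

```python
import json

import torch
from torch.utils.data import Dataset

class MyTaskDataset(Dataset):
    """Hypothetical adaptation of hm_data.py: pairs each text entry with
    its precomputed image features (e.g. loaded via the load_tsv sketch above)."""

    def __init__(self, jsonl_path, feature_dict):
        # feature_dict maps img_id -> {"boxes": ..., "features": ...}
        self.entries = [json.loads(line) for line in open(jsonl_path)]
        self.features = feature_dict

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, idx):
        entry = self.entries[idx]
        img = self.features[entry["id"]]
        return {
            "text": entry["text"],
            # .copy() because np.frombuffer arrays are read-only
            "boxes": torch.from_numpy(img["boxes"].copy()),
            "feats": torch.from_numpy(img["features"].copy()),
            "label": torch.tensor(entry["label"], dtype=torch.float),
        }
```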

## Modeling

### PyTorch

Once your data is ready, I'd recommend making a copy of vilio/hm.py and, depending on your project, considering the following adjustments (see the sketch after this list):

- The score metric (currently roc-auc & accuracy)
- Remove or adjust the clean_data call, which is specific to the HM dataset
- Adjust the result dumping (currently dump_csv, which writes a csv with id, predicted label & predicted probability)
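
As a minimal sketch of what the metric & dumping parts can boil down to (using sklearn & pandas; the actual functions in hm.py may look different):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score

def score(labels, probas):
    """Compute the two metrics hm.py currently reports."""
    preds = (np.asarray(probas) > 0.5).astype(int)
    return {"roc_auc": roc_auc_score(labels, probas),
            "accuracy": accuracy_score(labels, preds)}

def dump_csv(ids, probas, path="preds.csv"):
    """Write id, predicted label & predicted probability to a csv."""
    probas = np.asarray(probas)
    pd.DataFrame({"id": ids,
                  "label": (probas > 0.5).astype(int),
                  "proba": probas}).to_csv(path, index=False)
```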

### PaddlePaddle

If you choose to run one of the ERNIE models implemented in PaddlePaddle, I'd recommend making a copy of vilio/ernie-vil/reader/hm_finetuning.py and making the necessary adjustments as you go through the file, such as (see the sketch after this list):

- Add a function in vilio/ernie-vil/batching/finetune_batching.py
- Data handling in vilio/ernie-vil/reader/_tsv_reader.py
- Copy the hm conf folder under vilio/ernie-vil/conf/ and adjust it
- Add a data folder for your project at vilio/ernie-vil/data
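
As a rough, plain-Python illustration of the reader-side data handling (not the actual _tsv_reader.py code; the column names are placeholders for your own data):

```python
import csv

def read_tsv(path, fieldnames=("id", "text", "label")):
    """Hypothetical minimal reader: yields one dict per example,
    mirroring the kind of records hm_finetuning.py consumes."""
    with open(path, encoding="utf-8") as f:
        for row in csv.DictReader(f, fieldnames=fieldnames, delimiter="\t"):
            row["label"] = int(row["label"])
            yield row
```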

Finally, it is time to choose the model you want to run. Refer to the table below for a rough performance & implementation guide. When pre-trained models are available, you can download them by clicking on the respective language transformer.
Note that the performance rank may be very different for datasets other than Hateful Memes.

| Model | Language Transformers (--tr in params.py) | Performance Rank for HM | Pre-trained model available | Task-specific pre-training enabled |
|---|---|---|---|---|
| E - ERNIE-VIL LARGE/SMALL | ERNIE | 1, 2 | LARGE / BASE | No (TODO) |
| D - DeVLBERT | bert-base-uncased | 8 | BASE | No |
| O - OSCAR LARGE/SMALL | bert-large-uncased / bert-base-uncased | 5, 6 | LARGE / BASE | Yes |
| U - UNITER LARGE/SMALL | bert-large-cased / bert-base-cased | 3, 4 | LARGE / BASE | Yes |
| U - UNITER LARGE/SMALL | roberta-large / roberta-small | 14 | No | No |
| V - VisualBERT | bert-large-uncased | 7 | LARGE | Yes |
| V - VisualBERT | roberta-large / roberta-small | 11 | No | Yes |
| V - VisualBERT | albert-base-v2 - albert-xxlarge-v2 | 10 (XXL V2) | No | Yes |
| X - LXMERT | bert-large-uncased / bert-base-uncased | 9 | LARGE | Yes |
| X - LXMERT | roberta-large / roberta-small | 13 | No | Yes |
| X - LXMERT | albert-base-v2 - albert-xxlarge-v2 | 12 (XXL V2) | No | Yes |

For most models, other language transformers might work as well, but they haven't been tested yet. Note that for V&L tasks, having a pre-trained model makes a major difference. If you choose to use a pretrained model, make sure to place the weights file in vilio/data, or, for E-models, the params folder in vilio/ernie-vil/data/ernielarge/ or vilio/ernie-vil/data/erniesmall/.
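
As a quick sanity check that a downloaded PyTorch checkpoint is readable before you start training (a hedged sketch; the filename is a placeholder for whatever weights file you placed in vilio/data):

```python
import torch

# "model.pth" is a placeholder for the weights file you downloaded
ckpt = torch.load("data/model.pth", map_location="cpu")
# some checkpoints nest the weights under a key such as "state_dict"
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} entries, e.g.:")
for name, tensor in list(state_dict.items())[:5]:
    print(f"  {name}: {tuple(tensor.shape)}")
```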

Now just place your features & text data in the respective data folders & run the model.
Depending on which model & features you chose, refer to the bash files under either vilio/bash/training or vilio/ernie-vil/bash/training and adjust them to your needs.
The parameters are explained in vilio/params.py.