Note: All of the steps below are bundled in `VEQA/run.sh`. Execute `source ./run.sh` to get the code set up and running.
After cloning, run the following commands from the repository root `VEQA/`:
```sh
python -m venv env
source env/bin/activate
pip install -r requirements.txt
mkdir Trained
```
Run `./download.sh` to download the image features and question-answer annotations.
Note: This script downloads the COCO features, which take ~3 hours to download. To expedite things, we provide an abridged version of the data for quick running/testing of the code here. It contains ~5% of the total number of questions (amounting to ~12,000 questions). Note, however, that we trained our models on the full data.
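For reference, an abridged split like the linked one can be produced by subsampling the question files. The sketch below assumes the official VQA v1 JSON layout (a top-level `questions` list); the output file name is made up for illustration:

```python
import json
import random

# Hedged sketch: keep ~5% of the multiple-choice training questions.
random.seed(0)
with open("data/MultipleChoice_mscoco_train2014_questions.json") as f:
    data = json.load(f)

n_keep = max(1, int(0.05 * len(data["questions"])))
data["questions"] = random.sample(data["questions"], n_keep)

# Hypothetical output path -- not a file the repo expects.
with open("data/abridged_train2014_questions.json", "w") as f:
    json.dump(data, f)
```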
After running this script, your `data/` directory should look as follows:
```
coco/
evalfinalData.json
finalData.json
mscoco_train2014_annotations.json
mscoco_val2014_annotations.json
MultipleChoice_mscoco_train2014_questions.json
MultipleChoice_mscoco_val2014_questions.json
```
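A quick sanity check (our own snippet, not a script shipped with the repo) to confirm the download completed:

```python
import os

# Verify that download.sh produced the files listed above under data/.
expected = [
    "coco",
    "evalfinalData.json",
    "finalData.json",
    "mscoco_train2014_annotations.json",
    "mscoco_val2014_annotations.json",
    "MultipleChoice_mscoco_train2014_questions.json",
    "MultipleChoice_mscoco_val2014_questions.json",
]
missing = [name for name in expected if not os.path.exists(os.path.join("data", name))]
print("all files present" if not missing else f"missing: {missing}")
```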
Note: To reproduce VEQA in the SNLI-VE configuration, download the data using this script.
Note: Change the values in `config.json`, especially `base_dir`, `[train|eval]_hypothesis_path`, and `output_dir`.
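As a hedged illustration, the snippet below loads `config/config.json` and overrides the fields named above. The example values are placeholders, and mapping `[train|eval]_hypothesis_path` to `finalData.json`/`evalfinalData.json` is our assumption:

```python
import json

# Hedged sketch: point config/config.json at your local paths. Only the keys
# named in this README are touched; the example values are placeholders.
with open("config/config.json") as f:
    cfg = json.load(f)

cfg["base_dir"] = "/abs/path/to/VEQA"                    # placeholder path
cfg["train_hypothesis_path"] = "data/finalData.json"     # assumed mapping
cfg["eval_hypothesis_path"] = "data/evalfinalData.json"  # assumed mapping
cfg["output_dir"] = "Trained/"                           # created earlier via mkdir

with open("config/config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```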
Train and evaluate with:

```sh
python run.py --config config/config.json --mode train
python run.py --config config/config.json --mode test --model <model path>
```
`ve_train.py` and `ve_dataset.py` contain the model configuration for SNLI-VE training and the dataloader for SNLI-VE, respectively. `ve_train.py` can be run the same way as `run.py`, e.g. `python ve_train.py --config config/config.json --mode train`.
- `run.py` - VEQA train/test pipeline for the AS_{VQA} configuration
- `model.py` - VEQA model code
- `ve_train.py` - VEQA training pipeline for the AS_{SNLI-VE} configuration
- `data_preprocess/sentence_generator.py` - code for precomputing hypotheses (a simplified sketch follows this list)
- `dataset.py` - dataset class for the VQA data
- `ve_dataset.py` - dataset class for the SNLI-VE data
```
@InProceedings{VQA,
  author    = {Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh},
  title     = {{VQA}: {V}isual {Q}uestion {A}nswering},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2015},
}

@article{xie2019visual,
  title   = {Visual Entailment: A Novel Task for Fine-grained Image Understanding},
  author  = {Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
  journal = {arXiv preprint arXiv:1901.06706},
  year    = {2019}
}

@article{xie2018visual,
  title   = {Visual Entailment Task for Visually-Grounded Language Learning},
  author  = {Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
  journal = {arXiv preprint arXiv:1811.10582},
  year    = {2018}
}
```