This is the code implementation for the paper titled: "Mitigating Dataset Bias in Image Captioning through CLIP Confounder-free Captioning Network" (Accepted to ICIP 2023)
To address the dataset bias problem, we approach it from a causal-inference perspective and design a causal graph. Based on this graph, we propose C2Cap, a CLIP confounder-free captioning network. We use a global visual confounder to control for the confounding factors in the image and train the model to produce debiased captions.
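For intuition only, the sketch below shows one generic way a dictionary of global visual confounders (e.g., clustered CLIP features) could be attended over and fused with the image representation before decoding. The module, names, and shapes are hypothetical and are not the exact C2Cap architecture.

```python
# Illustrative sketch: fuse image features with a global visual confounder
# dictionary via attention. Hypothetical module, not the paper's exact model.
import torch
import torch.nn as nn

class ConfounderFusion(nn.Module):
    def __init__(self, feat_dim, confounder_dict):
        super().__init__()
        # confounder_dict: (num_entries, feat_dim) tensor of global CLIP features
        self.register_buffer("confounder", confounder_dict)
        self.query = nn.Linear(feat_dim, feat_dim)

    def forward(self, image_feats):
        # image_feats: (batch, feat_dim) global visual features
        q = self.query(image_feats)                             # (batch, feat_dim)
        attn = torch.softmax(q @ self.confounder.t(), dim=-1)   # (batch, num_entries)
        context = attn @ self.confounder                        # (batch, feat_dim)
        # combine the original features with the attention-weighted expectation
        # over the confounder dictionary
        return image_feats + context

# toy usage
fusion = ConfounderFusion(512, torch.randn(100, 512))
debiased = fusion(torch.randn(8, 512))  # -> (8, 512)
```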
- Python >= 3.9, CUDA >= 11.3
- PyTorch >= 1.12.0, torchvision >= 0.6.1
- Other packages: pycocotools, tensorboard, tqdm, h5py, nltk, einops
- First, clone the repository locally:
git clone https://github.com/yeonju7kim/C2Cap.git
cd C2Cap
- Then, create an environment and install PyTorch and torchvision:
conda create -n c2cap python=3.9
conda activate c2cap
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
# ^ adjust the CUDA suffix (cu113) if it does not match your system; visit pytorch.org for compatible builds.
- Install other requirements:
pip install -r requirements.txt
python -m spacy download en
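After installation, a quick sanity check (assuming the environment above) confirms that PyTorch and CUDA are usable:

```python
# Verify that the installed PyTorch build can see the GPU.
import torch
print(torch.__version__)
print(torch.cuda.is_available())
```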
The README and source code are currently in their initial version; a cleaned-up and more detailed version may be released soon.
Download and extract the COCO 2014 captioning dataset (train, val, and test images with annotations) from http://cocodataset.org. We expect the following directory structure:
path/to/coco_caption/
├── annotations/ # annotation json files and Karpathy split files
├── train2014/ # train images
├── val2014/ # val images
└── test2014/ # test images
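Optionally, a small check (the coco_root path below is a placeholder; set it to your local path) verifies that the layout above is in place:

```python
# Check that the COCO folders described above exist.
import os

coco_root = "path/to/coco_caption"  # placeholder: set to your local path
for sub in ("annotations", "train2014", "val2014", "test2014"):
    folder = os.path.join(coco_root, sub)
    print(folder, "ok" if os.path.isdir(folder) else "MISSING")
```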
- Copy the files in data/ to the annotations/ folder above. They include vocab.json and files containing the Karpathy split ids.
- Build the CLIP feature dictionary from the training and validation images:
python create_clip_feature_dictionary.py --train_folder=path_to_coco_train_data --valid_folder=path_to_coco_valid_data
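For reference, the sketch below shows one way such a dictionary of global CLIP image features could be built with the openai CLIP package. The backbone, paths, and output file name are assumptions; the repository script may differ in its details.

```python
# Sketch: build a dictionary of global CLIP image features keyed by filename.
# Backbone, paths, and output name are assumptions, not the script's exact behavior.
import os
import torch
import clip                      # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

features = {}
train_folder = "path_to_coco_train_data"  # same placeholder as the command above
with torch.no_grad():
    for name in sorted(os.listdir(train_folder)):
        if not name.endswith(".jpg"):
            continue
        image = preprocess(Image.open(os.path.join(train_folder, name)).convert("RGB"))
        features[name] = model.encode_image(image.unsqueeze(0).to(device)).squeeze(0).cpu()

torch.save(features, "clip_feature_dictionary.pt")  # hypothetical output file
```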
- Evaluation on Karpathy splits:
export DATA_ROOT=path/to/coco_caption
# evaluate on the validation split
python eval_caption.py +split='valid' exp.checkpoint=path_to_caption_checkpoint
# evaluate on the test split
python eval_caption.py +split='test' exp.checkpoint=path_to_caption_checkpoint
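If you want to score generated captions outside of eval_caption.py, a minimal example with pycocoevalcap (installed separately; it is not listed in requirements.txt) looks like this. The image ids and captions below are toy placeholders.

```python
# Toy example of scoring captions with pycocoevalcap; eval_caption.py already
# reports the standard metrics for you.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# ground-truth references and generated captions, keyed by image id (placeholders)
gts = {
    "img1": ["a man riding a motorcycle on a dirt road"],
    "img2": ["two dogs playing with a ball in the grass"],
}
res = {
    "img1": ["a man rides a motorcycle down a dirt road"],
    "img2": ["two dogs play with a ball on the grass"],
}

bleu_scores, _ = Bleu(4).compute_score(gts, res)
cider_score, _ = Cider().compute_score(gts, res)
print("BLEU-1..4:", bleu_scores)
print("CIDEr:", cider_score)
```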
This code builds on Ruotian Luo's image captioning implementation: https://github.com/ruotianluo/self-critical.pytorch.