The following is adapted from Scene-Graph-Benchmark, Danfei Xu and neural-motifs.
- Download the VG images part1 (9 GB) and part2 (5 GB). Extract these images to `datasets/vg/VG_100K`. If you want to use a different directory, set it in `DATASETS['VG_stanford_filtered']['img_dir']` of `maskrcnn_benchmark/config/paths_catalog.py` (a symlink also works; see the sketch after this list).
- Download the scene graph labels and extract them to `datasets/vg/VG-SGG-with-attri.h5`, or edit the path in `DATASETS['VG_stanford_filtered_with_attribute']['roidb_file']` of `maskrcnn_benchmark/config/paths_catalog.py`.
- Download the detection results for the three datasets: Conceptual Captions, COCO Captions, and Visual Genome. After downloading, run `cat cc_detection_results.zip.part* > cc_detection_results.zip` to merge the partitions into a single zip file, then unzip it to the folder `datasets/vg/`, as shown below.
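A minimal shell sketch of the merge-and-extract step (assuming the part files sit in the current directory and `unzip` is installed; if the COCO and VG archives are also split into parts, they can be handled the same way):

```bash
# Merge the split archive back into a single zip file...
cat cc_detection_results.zip.part* > cc_detection_results.zip

# ...then extract it into the expected dataset folder.
unzip cc_detection_results.zip -d datasets/vg/
```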
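If the VG images already exist elsewhere on disk, a symlink is an alternative to editing `paths_catalog.py` (a sketch; `/data/visual_genome/VG_100K` is a placeholder for your actual image directory):

```bash
# Make datasets/vg/VG_100K point at an existing image directory instead of
# changing DATASETS['VG_stanford_filtered']['img_dir'].
mkdir -p datasets/vg
ln -s /data/visual_genome/VG_100K datasets/vg/VG_100K
```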
After downloading the above files, you should have the following hierarchy in the folder `datasets/vg/`:
```
├── VG_100K
├── cc_detection_results_oid
├── COCO_detection_results_oid
├── VG_detection_results_oid
└── VG-SGG-with-attri.h5
```
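To confirm that everything landed in the right place, a quick check run from the repository root (a minimal sketch; it only tests that each expected entry exists):

```bash
# Report any expected file or folder missing from datasets/vg/.
for p in VG_100K cc_detection_results_oid COCO_detection_results_oid \
         VG_detection_results_oid VG-SGG-with-attri.h5; do
  [ -e "datasets/vg/$p" ] || echo "missing: datasets/vg/$p"
done
```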
We provide scripts for data preprocessing, such as extracting detection results from images and creating pseudo labels from the detection results and the concepts parsed from image captions. More details can be found in the folder `preprocess`.