T2 Guiding is a dataset of 1,000 images, each with six machine-generated labels. The images come from the Open Images Dataset (OID), and we provide two sets of labels for each image:
- Object labels: Three random object labels generated by a FRCNN model trained on Visual Genome.
- Image labels: Three random image labels obtained from Google Cloud Vision API.
This dataset is used as the test set in the paper "Understanding Guided Image Captioning Performance across Domains".
More details are available in the paper. Please cite it if you use or discuss this dataset in your work:
@article{ng2020understanding,
  title={Understanding Guided Image Captioning Performance across Domains},
  author={Edwin G. Ng and Bo Pang and Piyush Sharma and Radu Soricut},
  journal={arXiv preprint arXiv:2012.02339},
  year={2020}
}
The released data is provided as a TSV (tab-separated values) text file with the following columns:
Table 1: Columns in TSV files.
Column | Description |
---|---|
1 | Image key. The unique identifier of the image in the Open Images Dataset (a hexadecimal number, e.g., 0000d67245642c5f). |
2 | Visual Genome objects. Comma-separated list of object labels generated by a FRCNN trained on Visual Genome. |
3 | Image labels. Comma-separated list of image labels obtained from Google Cloud Vision API. |
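
The three-column layout above can be read with Python's standard `csv` module. This is a minimal sketch, not an official loader: it assumes the file has no header row and that individual labels contain no commas, and the names `GuidingRow`, `split_labels`, and `read_rows` are hypothetical helpers introduced for illustration. The labels in the sample line are made up; only the image key comes from the table above.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class GuidingRow(NamedTuple):
    """One TSV line: image key plus the two comma-separated label lists."""
    image_key: str
    vg_objects: List[str]    # column 2: Visual Genome object labels
    image_labels: List[str]  # column 3: Google Cloud Vision API labels


def split_labels(field: str) -> List[str]:
    """Split a comma-separated field, dropping whitespace and empty entries."""
    return [label.strip() for label in field.split(",") if label.strip()]


def read_rows(lines: Iterable[str]) -> List[GuidingRow]:
    """Parse tab-separated lines, skipping any that lack exactly 3 columns."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        if len(record) != 3:
            continue  # blank or malformed line (assumption: safe to skip)
        key, objects, labels = record
        rows.append(GuidingRow(key, split_labels(objects), split_labels(labels)))
    return rows


# Illustrative line only: the labels here are invented, not taken from the data.
sample = "0000d67245642c5f\tman,tree,sky\tPlant,Outdoor,Leisure\n"
rows = read_rows(io.StringIO(sample))
print(rows[0].image_key, rows[0].vg_objects, rows[0].image_labels)
```

To read the released file instead of the in-memory sample, pass an open file handle, for example `read_rows(open(path, newline="", encoding="utf-8"))`, where `path` is wherever the downloaded TSV was saved.
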
The dataset is available for download here. The mapping from image key to image URL can be found in the cvpr2019.tsv.meta file available from the original T2 dataset download link.