DejaVu: Disambiguation evaluation dataset for English-JApanese machine translation on VisUal information
English-Japanese caption dataset including word sense ambiguity for evaluation of multimodal machine translation.
- Captions: English captions and their Japanese translations, including 500 target words and 6 different caption templates. All target words have multiple word senses and are designed such that the limited contextual information in the captions is not sufficient to select the correct word sense. In order to translate correctly into Japanese, it is necessary to refer to the corresponding images.
- Images: 500 images referenced during translation. These images are obtained from ImageNet and Flickr. The WordNet id corresponding to the target word in the caption is used as the image file name.
- Format: Images are stored in the
images/
directory, and captions are available in thecaptions/
directory.index.txt
is used to map the id of images in the order of captions.
The dataset is structured as follows:
- dataset/
- captions/
- en/
- template1.en
- template2.en
- ...
- ja/
- template1-1.ja
- template1-2.ja
- ...
- images/
- 10055410.jpg
- 10149867.jpg
- ...
- index.txt
- README.md
Clone the repository:
git clone git@github.com:tmu-nlp/DejaVu.git
This dataset is made available under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.
If you use this dataset in your research, please cite it as follows:
@misc{DejaVu,
author = {Ayako Sato},
title = {DejaVu},
year = {2024},
howpublished = {\url{https://github.com/tmu-nlp/DejaVu}}
}