The IndoWordnet Parallel Corpus

IndoWordnet is a linked structure of wordnets of major Indian languages from the Indo-Aryan, Dravidian and Sino-Tibetan families. Synsets are linked across many languages, and every synset in every language contains a gloss and an example usage sentence or phrase. In a large number of cases, the gloss and example sentences are translations of each other across languages. Hence, IndoWordnet is a source of parallel corpora across multiple Indian languages.
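The idea above can be illustrated with a minimal Python sketch. It is not the extraction code used to build this corpus. The data layout (language code, then synset ID, then a gloss and example pair), the function name `extract_parallel`, and the placeholder strings are all assumptions made for illustration. The sketch also treats every shared synset as a translation pair, whereas in practice only a large number of them are.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not the real IndoWordnet format):
# language code -> synset ID -> (gloss, example sentence)
Wordnets = Dict[str, Dict[int, Tuple[str, str]]]


def extract_parallel(wordnets: Wordnets, src: str, tgt: str) -> List[Tuple[str, str]]:
    """Pair glosses and examples of synsets present in both languages."""
    pairs: List[Tuple[str, str]] = []
    # Only synsets linked in both wordnets can yield a parallel segment.
    shared_ids = sorted(wordnets[src].keys() & wordnets[tgt].keys())
    for synset_id in shared_ids:
        src_gloss, src_example = wordnets[src][synset_id]
        tgt_gloss, tgt_example = wordnets[tgt][synset_id]
        pairs.append((src_gloss, tgt_gloss))
        pairs.append((src_example, tgt_example))
    return pairs


# Toy data with placeholder text; synset IDs 1, 2 and 3 are made up.
TOY_WORDNETS: Wordnets = {
    "hi": {
        1: ("<hi gloss 1>", "<hi example 1>"),
        2: ("<hi gloss 2>", "<hi example 2>"),
    },
    "mr": {
        1: ("<mr gloss 1>", "<mr example 1>"),
        3: ("<mr gloss 3>", "<mr example 3>"),
    },
}

pairs = extract_parallel(TOY_WORDNETS, "hi", "mr")
for source_segment, target_segment in pairs:
    print(source_segment, "|||", target_segment)
```

Only synset 1 exists in both toy wordnets, so the sketch yields two segment pairs: one for the gloss and one for the example.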

The corpus contains about 6.3 million parallel segments across 18 Indian languages from 3 language families.

NEWS! WMT 2020 is using this corpus for its shared task on similar language translation.

Documentation

You can read more about the corpus in this document: pdf

Download the corpus

You can download the corpus HERE

Version History

v0.2 (14 May 2020): Bug fixes to address problems with extraction in v0.1.
v0.1 (25 March 2020): Initial release. This version is buggy: do not use it, use v0.2 instead.

License

This dataset is released under the Creative Commons Attribution Share Alike 4.0 International license.

Citing this dataset

If you use this dataset, please include the following citation:

@misc{kunchukuttan2020iwnparallel,
  author = "Anoop Kunchukuttan",
  title = "IndoWordnet Parallel Corpus",
  year = "2020",
  howpublished = {\url{https://github.com/anoopkunchukuttan/indowordnet_parallel}}
}

We would like to hear from you if:

You are using our resources. Please let us know how you are putting these resources to use.
You have any feedback on these resources.
