README.txt

Bhojpuri Language Technological Resources (BHLTR)
========================================================
Introduction
=======
The Bhojpuri (https://en.wikipedia.org/wiki/Bhojpuri_language) LT Resources (BHLTR) project was initiated by me (Atul (http://ufal.ms.mff.cuni.cz/atul-kr-ojha)) at Jawaharlal Nehru University (JNU), New Delhi (http://sanskrit.jnu.ac.in/index.jsp) during my doctoral (http://sanskrit.jnu.ac.in/rstudents/phd.jsp) research. The BHLTR data contains monolingual, parallel (English-Bhojpuri), and POS-annotated monolingual corpora. The POS annotation follows the Bureau of Indian Standards (BIS) Part-of-Speech (POS) tagset (http://tdil-dc.in/tdildcMain/articles/134692Draft%20POS%20Tag%20standard.pdf).

Structure of the `BHLTR data` folder
=======================

bho-resources/
├─ mono-bho-corpus/
│  ├─ monolingual.bho
│  ├─ README.md
│  └─ pos-annotated/
│     └─ pos-tagged.bho
├─ parallel-corpora/
│  ├─ README.md
│  └─ eng-bho/
│     ├─ eng-bho.en
│     └─ eng-bho.bho
├─ license.md
├─ README.md
└─ README.txt
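
The parallel corpus in the tree above could be read with a short Python sketch like the one below. It assumes that `eng-bho.en` and `eng-bho.bho` are UTF-8 text files aligned line by line (one sentence per line), which this README does not state explicitly. The function name `read_parallel` is hypothetical and is not part of BHLTR.

```python
from itertools import zip_longest


def read_parallel(src_path, tgt_path):
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumption: both files are UTF-8 and have one sentence per line,
    with line i of the source matching line i of the target.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        pairs = zip_longest(src, tgt)
        for line_no, (src_line, tgt_line) in enumerate(pairs, start=1):
            # zip_longest pads the shorter file with None, which signals
            # that the two files do not have the same number of lines.
            if src_line is None or tgt_line is None:
                raise ValueError(
                    f"files differ in length at line {line_no}"
                )
            yield src_line.rstrip("\n"), tgt_line.rstrip("\n")
```

Under the same assumptions, `list(read_parallel("bho-resources/parallel-corpora/eng-bho/eng-bho.en", "bho-resources/parallel-corpora/eng-bho/eng-bho.bho"))` would return the English-Bhojpuri pairs in file order. The length check is there so that a misaligned pair of files fails loudly rather than silently dropping sentences.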


Acknowledgments
=======

I would like to thank my doctoral supervisor, Prof. Girish Nath Jha (https://jnu.ac.in/Faculty/gnjha/), and the Sanskrit Computational Lab, JNU, New Delhi (http://sanskrit.jnu.ac.in/index.jsp).

References
=======
<pre>
@article{ojha2019english,
  title={English-Bhojpuri SMT System: Insights from the Karaka Model},
  author={Ojha, Atul Kr},
  journal={arXiv preprint arXiv:1905.02239},
  year={2019}
}
</pre>
<pre>
@inproceedings{karakanta2019proceedings,
  title={Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages},
  author={Karakanta, Alina and Ojha, Atul Kr and Liu, Chao-Hong and Washington, Jonathan and Oco, Nathaniel and Lakew, Surafel Melaku and Malykh, Valentin and Zhao, Xiaobing},
  booktitle={Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages},
  year={2019}
}
</pre>
<pre>
@article{kumar2018automatic,
  title={Automatic identification of closely-related Indian languages: Resources and experiments},
  author={Kumar, Ritesh and Lahiri, Bornini and Alok, Deepak and Ojha, Atul Kr and Jain, Mayank and Basit, Abdul and Dawer, Yogesh},
  journal={arXiv preprint arXiv:1803.09405},
  year={2018}
}
</pre>
<pre>
@inproceedings{ojha2015training,
  title={Training \& evaluation of POS taggers in Indo-Aryan languages: a case of Hindi, Odia and Bhojpuri},
  author={Ojha, Atul Kr. and Behera, Pitambar and Singh, Srishti and Jha, Girish N},
  booktitle={the proceedings of 7th Language \& Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics},
  pages={524--529},
  year={2015}
}
</pre>

<pre>
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: BHLTR v1.0
License: CC BY-NC-SA 4.0
Includes text: yes
Contributors: Ojha, Atul Kr.
Contact: shashwatup9k@gmail.com
===============================================================================
</pre>