SpeeD-IA

This repository hosts speech datasets and models for Indo-Aryan languages, prepared by Dr. Bhimrao Ambedkar University and the Council for Strategic and Defense Research under various projects, in collaboration with Karya Inc. and UnReaL-TecE LLP.

The repository currently contains the transcriptions of the speech data collected through the Karya App for the SpeeD-IA pilot project in four languages: Awadhi, Bhojpuri, Braj and Magahi.

The audio can be downloaded here. The SpeeD-IA audio and transcriptions are licensed under CC BY-NC-SA 4.0. For commercial licensing of the dataset, contact UnReaL-TecE LLP.

If you use the data, please cite the following paper:

@inproceedings{interspeech2022,
    author = {Kumar, Ritesh and Singh, Siddharth and Ratan, Shyam and Raj, Mohit and Sinha, Sonal and Lahiri, Bornini and Seshadri, Vivek and Bali, Kalika and Ojha, Atul Kr.},
    title = {Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi},
    booktitle = {Proceedings of Speech for Social Good Workshop, Interspeech 2022},        
    year = {2022}
}

For any queries, please feel free to write to riteshkr[dot]kmi; the address is at the most popular email domain starting with 'g'.
