Repository for different Speech Datasets and Models for Indo-Aryan languages prepared by the Department under different projects

kmi-linguistics/SpeeD-IA

SpeeD-IA

Repository for different speech datasets and models for Indo-Aryan languages prepared by Dr. Bhimrao Ambedkar University and the Council for Strategic and Defense Research under different projects, in collaboration with Karya Inc. and UnReaL-TecE LLP.

This repository currently contains the transcriptions of the speech data collected through the Karya App for the pilot phase of the SpeeD-IA project in four languages: Awadhi, Bhojpuri, Braj and Magahi.

The audio can be downloaded here. SpeeD-IA Audio and Transcription is licensed under CC BY-NC-SA 4.0. For commercial licensing of the dataset, contact UnReaL-TecE LLP.

If you use the data, please cite the following paper:

@inproceedings{interspeech2022,
    author = {Kumar, Ritesh and Singh, Siddharth and Ratan, Shyam and Raj, Mohit and Sinha, Sonal and Lahiri, Bornini and Seshadri, Vivek and Bali, Kalika and Ojha, Atul Kr.},
    title = {Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi},
    booktitle = {Proceedings of Speech for Social Good Workshop, Interspeech 2022},
    year = {2022}
}

For any queries, please feel free to contact riteshkr[dot]kmi - the email is at the most popular email domain starting with 'g'.
