A collaborative catalog of NLP resources for Indic languages
Updated Mar 14, 2024
Resources and tools for Indian language Natural Language Processing
Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.com/AI4Bharat/IndicBERT
indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2
Resources to go with the Indic NLP Library
Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with Transformer-based models, check: https://github.com/AI4Bharat/IndicXlit
Towards Building Text-To-Speech Systems for the Next Billion Users - Microsoft Research Intern Work - Accepted at ICASSP 2023
Software and Resources for Mitigating Online Gender Based Violence in India
Xlit-Crowd: Hindi-English Transliteration Corpus
Python library for converting numbers to words for all Indian Languages.
Curated list of publicly available parallel corpora for Indian languages
Tooling to play around with multilingual machine translation for Indian Languages.
A Python NLP toolkit for Gujarati (in progress)
An LSTM-CRF classifier for NER in Telugu, an Indian language.
A configurable engine for analysing multi-lingual and multi-modal content.
This repository hosts my experiments for the project I did with OffNote Labs.
Small demo showing how MuRIL (Multilingual Representations for Indian Languages: a BERT model pre-trained on 17 Indian languages) understands Indian languages better
Repository for pre-trained wav2vec 2.0 models on 7 Indian languages
Translations for Aaptaha.