Awesome datasets for Bangla language computing.
Updated Mar 7, 2022 - Python
Nirmol is an open-source dataset and API for detecting Bangla slang words. Detect offensive/bad/slang words in Bangla/Bengali/Banglish sentences. A helpful API and dataset for developers and researchers.
Bangla news classification and generation
A collection of Bangla newspaper and blog crawlers. Can be used to mine Bangla text data for Natural Language Processing tasks.
Zilla-64: A Bangla Handwritten Word Dataset Of 64 Districts Name of Bangladesh and Recognition Using Holistic Approach
Various Bangla datasets for sentiment analysis of Bangla text
A synthetic Bangla license plate dataset, generated with a mixture of deep learning and image processing. The labels are in Darknet YOLO format. [.txt, .data, .names]
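For readers unfamiliar with the Darknet YOLO label format mentioned above: each line of a `.txt` label file holds `class_id x_center y_center width height`, with the four box values normalized to the image size. The following is a minimal sketch of reading one such line and converting it to pixel corners. The names `YoloBox`, `parse_yolo_line`, and `to_pixel_corners` are hypothetical helpers written for this illustration and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class YoloBox:
    """One Darknet YOLO label: class index plus a normalized box."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse a single 'class x_center y_center width height' label line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return YoloBox(int(parts[0]), *(float(p) for p in parts[1:]))


def to_pixel_corners(box: YoloBox, img_w: int, img_h: int):
    """Convert a normalized center/size box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * img_w
    y_min = (box.y_center - box.height / 2) * img_h
    x_max = (box.x_center + box.width / 2) * img_w
    y_max = (box.y_center + box.height / 2) * img_h
    return x_min, y_min, x_max, y_max


if __name__ == "__main__":
    # Illustrative label line, not taken from the dataset.
    box = parse_yolo_line("3 0.5 0.5 0.25 0.5")
    print(to_pixel_corners(box, img_w=400, img_h=200))
```

The class index refers to a line in the accompanying `.names` file, so mapping `class_id` to a character or digit only requires reading that file into a list.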
The default autocorrect dictionary in the Avro Bangla keyboard doesn't contain enough words, so this is my approach to enriching it. This file contains the correct spelling of commonly used Bangla words.
Bangla dataset for Opinion Mining
Scrape 4000+ Bangla Song Lyrics
This is the official repository of the paper titled "BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation", accepted at the 17th Workshop on Building and Using Comparable Corpora (BUCC 2024), co-located with LREC-COLING 2024. It contains the code and the dataset.
Implementation of the paper 'Towards Full page Offline Bangla Handwritten Text Recognition using Image-to-Sequence Architecture'. For details, please read the README section.
"WBSUBNdb_text: Bangla handwritten text document dataset" is a Bangla text dataset containing 1383 offline handwritten text documents contributed by 190 writers. The dataset is composed of both simple and compound characters.
Bengali Natural Language Processing (BengaliNLP)
Bangla Q&A dataset that contains questions, answers, and paragraphs to train your model
Handwritten Bangla character classification using ResNet-34 trained on the BanglaLekha dataset. The system is implemented in PyTorch. For details, see the README file.
Noise Identification, Noise Reduction, and Sentiment Analysis on Bangla Noisy Texts
The official GitHub repository of the Bangla Visual Question Answering (VQA) system ChitroJera
In this project, we have built a database of Bangla Handwritten Letters which contains handwritten images of 84 Bangla letters (10 numerals, 11 vowels, 39 consonants, 24 compound letters). We also investigated some of the existing Bangla character recognition models and found that these models have lower accuracy when the database contains some …