Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Updated May 30, 2025 - Python
Text preprocessing, representation and visualization from zero to hero.
🧹 Python package for text cleaning
Preprocessing Library for Natural Language Processing
A Python package for text preprocessing tasks in natural language processing.
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.
Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped with automatic model determination (also known as the estimation of latent factors - rank) for accurate data modeling. Our software suite encompasses cutting-edge data pre-processing and post-processing modules.
Easy NLP in Python
A powerful text cleaner for Japanese web texts
Extreme Extractive Text Summarization and Topic Modeling (using LSA and LDA techniques) over Reddit Posts from TLDRHQ dataset.
Performs tokenization, stemming, lemmatization, index creation, index compression and ranked retrieval of Cranfield documents
Vector Space based Search Engine for Arxiv Research Publications
ToxiScan is a text analysis tool that uses the Natural Language Toolkit (NLTK) and a Naive Bayes classifier to detect toxicity in textual data.
A tool for extracting chapters from Italian raw-text e-books from Project Gutenberg. Regular expressions are used to match chapter headings and extract the text between them.
My 2020 project focusing on NLP - Information Extraction
🐨 text preprocess.
A simple case study for learning how to do text mining on TikTok posts.
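One entry above describes extracting chapters by matching chapter headings with regular expressions and taking the text between them. The following is a minimal sketch of that idea, not that project's actual code. The heading pattern (the word "Capitolo" followed by a Roman or Arabic numeral at the start of a line) and the function name `split_chapters` are assumptions made for illustration; real e-books use many heading styles, and the pattern would need tuning to avoid false matches.

```python
import re

# Hypothetical heading pattern: a line starting with "Capitolo" followed by
# a Roman or Arabic numeral. This is an assumption, not a universal format.
HEADING = re.compile(
    r"^[ \t]*capitolo[ \t]+([ivxlcdm]+|\d+)\b.*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_chapters(text):
    """Return a list of (heading, body) pairs found in text.

    Each body runs from the end of its heading line to the start of the
    next heading, or to the end of the text for the last chapter.
    """
    matches = list(HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # The body ends where the next heading begins, if there is one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        chapters.append((match.group(0).strip(), body))
    return chapters


sample = "Prefazione\n\nCapitolo I\nPrimo testo.\n\nCapitolo II\nSecondo testo.\n"
for heading, body in split_chapters(sample):
    print(heading, "->", body)
```

Text that appears before the first heading (the preface in the sample) is skipped, which is usually what is wanted when only chapter bodies matter.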