text-data

Here are 20 public repositories matching this topic...

microsoft / DialoGPT

Large-scale pretraining for dialogue

machine-learning dialogue text-generation pytorch transformer data-processing text-data gpt-2 dialogpt

Updated Oct 17, 2022
Python

asyml / texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

python machine-learning natural-language-processing deep-learning tensorflow machine-translation text-generation data-processing bert text-data dialog-systems gpt-2 texar xlnet casl-project

Updated Aug 26, 2021
Python

microsoft / GODEL

Large-scale pretrained models for goal-directed dialog

machine-learning dialogue transformers text-generation pytorch transformer data-processing language-model dialogue-systems text-data conversational-ai language-grounding pretrained-model dialogpt grounded-generation

Updated Dec 10, 2023
Python

asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

python machine-learning natural-language-processing deep-learning machine-translation text-generation pytorch data-processing bert text-data dialog-systems roberta gpt-2 texar xlnet casl-project texar-pytorch

Updated Apr 14, 2022
Python

asyml / forte

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

python machine-learning natural-language-processing information-retrieval deep-learning pipeline natural-language data-processing text-data

Updated Feb 5, 2024
Python

thu-coai / cotk

Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

python machine-learning natural-language-processing deep-learning metrics data-processing natural-language-generation text-data cotk

Updated Aug 31, 2020
Python

LoLei / redditcleaner

Cleans Reddit Text Data 📜 🧹

python nlp reddit praw data-cleaning hacktoberfest text-data pushshift psaw

Updated Apr 14, 2020
Python

BALaka-18 / rake_new2

A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

nlp text python-library keywords keyword-extraction text-data keyword-search

Updated May 3, 2024
Python

PratikBarhate / question-classification

Question classification for the CogComp QC Dataset (http://cogcomp.org/Data/QA/QC/).

nlp machine-learning experimental neural-network python3 pytorch spacy text-data question-classification

Updated Nov 10, 2020
Python

carted / processing-text-data

Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).

tensorflow dataflow apache-beam bert text-data tfhub use-bert

Updated Mar 7, 2022
Python

tayebiarasteh / retweet

How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies

Updated Aug 29, 2021
Python

XMU-Kuangnan-Fang-Team / SpecificLDA

A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data

python lda text-data specific-lda

Updated Jan 12, 2025
Python

SignalN / parallelio

For reading from and writing to parallel data files in Python

machine-learning natural-language-processing text preprocessing text-data pre-processing

Updated Sep 7, 2017
Python

KlaraGtknst / text_topic

This repository implements a pipeline that stores various fields of the files in a large unstructured dataset. These fields are used for topic modeling (word clouds, based on low-dimensional versions of embedding vectors, named entity clustering, and document-topic incidences). The information is aggregated and visualised using FCA.

elasticsearch visualisation embeddings documents ner text-data fca topics-modeling sentence-transformers top2vec topic-aggregation ner-clustering

Updated Jul 28, 2025
Python

Infinitode / DupliPy

DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.

nlp data-science text-formatting ai images data-analysis data-preprocessing preprocessing language-models text-data augmentation text-datasets

Updated Aug 17, 2025
Python

mounaiban / bakdoh

Just a bunch of experiments with embedded graph databases

python3 sqlite3 graph-databases text-data text-database graph-db

Updated Dec 11, 2021
Python

sevvalckc / Turkish-SAD

Python script to perform sentiment analysis on Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.

sentiment-analysis text-data turkish-dataset

Updated May 23, 2025
Python

kanad-rep / Zipf-s-Law

This is a simple graphical representation of Zipf's Law using term frequencies, calculated for three different text datasets.

nlp text-data zipfs-law matplotlib-pyplot

Updated Sep 25, 2020
Python

liaaaxu / IMDB-Scraping

IMDb-Scraping is for retrieving user-generated movie text reviews as well as relevant movie characteristics from imdb.com.

movies scraping imdb user-generated-content text-data

Updated Sep 6, 2020
Python

Mohampouraz / Persian-poetry

A comprehensive repository of classical Persian poetry, curated from Ganjoor.net, designed for Natural Language Processing (NLP), machine learning applications, and literary research.

nlp machine-learning text-classification persian literature farsi nlp-machine-learning text-data persian-poetry farsi-datasets

Updated Aug 2, 2025
Python
