Large-scale pretraining for dialogue
Updated Oct 17, 2022 - Python
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Large-scale pretrained models for goal-directed dialog
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Cleans Reddit Text Data 📜 🧹
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
Question classification for the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data
For reading from and writing to parallel data files in Python
This repository implements a pipeline that stores various data fields for the files of a large unstructured dataset. These fields are used for topic modeling (word clouds based on low-dimensional versions of embedding vectors, named entity clustering, and document-topic incidences). The information is aggregated and visualised using FCA.
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
Just a bunch of experiments with embedded graph databases
A Python script that performs sentiment analysis on Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.
A simple graphical representation of Zipf's Law using term frequencies, calculated for three different text datasets.
IMDb-Scraping is for retrieving user-generated movie text reviews as well as relevant movie characteristics from imdb.com.
A comprehensive repository of classical Persian poetry, curated from Ganjoor.net, designed for Natural Language Processing (NLP), machine learning applications, and literary research.