AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Updated Jan 27, 2022 - C++
Fast topic modeling platform
Analytic platform for real-time large-scale streams containing structured and unstructured data.
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
R package to Embed All the Things! using StarSpace
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality
R package for Byte Pair Encoding based on YouTokenToMe
A simple component to extract just the text from any file that has an IFilter installed. Available as a C++ COM component and as a C# .NET library.
Short string compression
A data processing pipeline for text-mining on contents extracted from PDFs using Apriori and Simplicial Complex algorithms
A C++ program that produces a Google-like summary of a document, given a list of positions of words and phrases to highlight.
Tools for collecting and analyzing the 20minutos news corpus.
A text analysis tool for PDF files.
Topic modeling with AutoPhrase and CatE
Markov chain N-gram text generator designed to run fast with large values of N. The goal is fast operation with 6-grams or more.