corpus-generator

Here are 20 public repositories matching this topic...

bitextor / bitextor

Bitextor generates translation memories from multilingual websites

Updated Nov 11, 2024
Python

user1342 / AutoCorpus

AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.

fuzzing dynamic-analysis corpus-generator vulnerability-research large-language-models llm

Updated Apr 23, 2024
Python

johentsch / ms3

A parser for annotated MuseScore 3 files.

Updated Sep 26, 2024
Python

biomedicalinformaticsgroup / cadmus

A full-text article retrieval pipeline for biomedical literature.

python information-retrieval text-mining corpus-generator bionlp biomedical-text-mining

Updated Nov 4, 2024
Python

kateryna-bobrovnyk / ukr-twi-corpus

A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.

python nlp scraper corpus python-script python3 ukrainian corpus-linguistics corpus-generator ukrainian-language

Updated Jul 4, 2019
Jupyter Notebook

uma-pi1 / OPIEC-pipeline

ishalyminov / babi_tools

Augmentation scripts for the bAbI Dialog Tasks dataset

python nlp machine-learning dialogue dialog dataset dataset-generation babi-tasks corpus-generator dialog-systems babi babi-dataset

Updated Oct 16, 2018
Python

felipetovarhenao / exquisitecorpus

A set of corpus-based sampling & analysis M4L devices

sampling corpus-generator maxforlive corpus-processing

Updated Feb 8, 2022
Max

mohabmes / Sinai-corpus

A clean Fusha Arabic tagged corpus.

corpus corpus-generator arabic-nlp arabic-corpus

Updated Aug 3, 2020
Python

thecsw / katya-dev

Katya, or The Liberated Corpus, is a text corpus that allows you to request and scrape any web resource!

corpus russian tagger corpus-linguistics corpus-generator corpus-builder text-corpus russian-literature corpus-processing corpus-analysis

Updated Mar 14, 2024
Go

ibnmalik / golden-corpus-arabic

Golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers

corpus stemmer corpus-data arabic corpus-generator corpurate

Updated Aug 24, 2018
Python

FerreroJeremy / Plagiarized-Corpus-Generator

A corpus builder for evaluation of plagiarism detection tools

plagiarism corpus-generator corpus-builder

Updated Dec 12, 2016
PHP

apple-fritter / scrimshaw

Scrimshaw parses IRC logs stored in the driftwood format for quotes attributable to a given user. Written in Rust.

nlp chat machine-learning irc log regex internet-relay-chat linguistics rust-lang unicode-characters mit-license corpora regular-expressions nlp-parsing plaintext log-parser nlp-machine-learning corpus-generator log-parsing

Updated Jun 17, 2023
Rust

phueb / MissingAdjunct

Generate pseudo-English sentences for research in semantic composition

research-project corpus-generator semantic-composition

Updated May 3, 2023
Python

apple-fritter / weechat.driftwood

Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.

nlp chat machine-learning irc logging corpus internet-relay-chat weechat weechat-scripts mit-license corpus-linguistics nlp-machine-learning corpus-generator chat-log driftwood-format

Updated Jun 17, 2023
Python

writecrow / crow_backend

The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing

api natural-language-processing backend corpus corpus-linguistics corpus-generator corpus-builder

Updated Feb 16, 2025
PHP

jamal474 / Information-Retrieval-Lab

Information Retrieval Lab

information-retrieval edit-distance ranking inverted-index tf-idf corpus-generator tokenization document-similarity cosine-similarity-scores positional-indexing

Updated Nov 25, 2023
Jupyter Notebook

Pendulun / WebCrawler

python3 webcrawler corpus-generator

Updated Jul 13, 2023
Python

phueb / GroundedLang

A prototype for generating language in a grounded simulation of a simple hunter-gatherer world

simulation corpus-generator artificial-language

Updated Oct 25, 2021
Python

patasmith / corpusmaker

Create a corpus for fine-tuning an OpenAI model

python ai tdd generative-text openai sqlite3 corpus-generator fine-tuning sqlmodel text-data-processing llm-tools

Updated Apr 17, 2024
Python
