Bitextor generates translation memories from multilingual websites
Updated Nov 11, 2024 - Python
AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.
A parser for annotated MuseScore 3 files.
A full-text article retrieval pipeline for biomedical literature.
A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.
Augmentation scripts for the bAbI Dialog Tasks dataset
A set of corpus-based sampling & analysis M4L devices
A clean Fusha Arabic tagged corpus.
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
Golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
A corpus builder for evaluation of plagiarism detection tools
Scrimshaw parses IRC logs stored in the driftwood format for quotes attributable to a given user. Written in Rust.
Generate pseudo-English sentences for research in semantic composition
Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
Information Retrieval Lab
A prototype for generating language in a grounded simulation of a simple hunter-gatherer world
Create a corpus for fine-tuning an OpenAI model