Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Updated Nov 7, 2024 - Python
An advanced, extensible web front-end for the Manatee-open corpus search engine
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
An open-source collaborative web-based application for multi-task lexical normalisation
Web-based database for sign language lexicons and corpora. Fork of NGT-signbank (https://github.com/Signbank/Global-signbank).
Bitextor generates translation memories from multilingual websites
General Missives in Text-Fabric
A parser for annotated MuseScore 3 files.
Open source Python package to produce word sketches inspired by Sketch Engine (to make reproducible analyses)
Rezonator: Dynamics of human engagement
OpusFilter - Parallel corpus processing toolkit
MFTE (Multi Feature Tagger of English) Python is a Python version of Le Foll's MFTE, which was written in Perl. It is extended with semantic tags from Biber (2006) and Biber et al. (1999), as well as other specific tags.
Yet another search platform for linguistic corpora.
Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.
🛠 Tools to create, edit and export texts and annotations
Analyzes binary executables and can generate a test corpus for defined instruction paths or for each discovered function, or a test corpus that reaches every basic block detected in the non-library/shared-object parts of the binary's text section.