thammegowda / mtdata (156 stars, Python, updated May 27, 2025)
A tool that locates, downloads, and extracts machine translation corpora.
Topics: multilingual, natural-language-processing, machine-translation, dataset, natural-language-generation, parallel-data

Elbria / xling-SemDiv (7 stars, Python, updated Feb 10, 2023)
Code and data for the EMNLP 2020 paper "Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank".
Topics: learning-to-rank, parallel-data, bertology, multilingual-bert, semantic-divergences, corpus-filtering, synthetic-supervision, cross-lingual-similarity