Mitigating Gender Bias in Machine Translation: Target Language Grammatical Gender Projections Onto Source Language

This repository contains the code and data for the experiments described in the paper "Mitigating Gender Bias in Machine Translation: Target Language Grammatical Gender Projections Onto Source Language".

This research has been supported by the European Regional Development Fund within the joint project of SIA TILDE and University of Latvia “Multilingual Artificial Intelligence Based Human Computer Interaction” No. 1.1.1.1/18/A/148.

Requirements

Conda is the recommended way to run the experiments. Create an environment with: conda create -n gender-bias python=3.7
Also make sure the system-wide dependencies are installed: sudo apt install build-essential swig python-dev libgoogle-perftools-dev libsparsehash-dev
Then activate the conda environment (conda activate gender-bias) and install the necessary tools by running scripts/install_tools.sh.

Running experiments

Experiments are organized per language pair (that is, per training corpus).
Running the bash scripts in scripts/{language}/*.sh in order prepares the data, trains the models, and evaluates BLEU and WinoMT scores. The latvian_imba experiments use large proprietary Tilde corpora and are therefore not reproducible.
For each language pair, two NMT systems are trained: a baseline (base) without target gender annotations (TGA), and a gendered system (genders2) with TGA in the training data.
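Running the scripts in order can also be driven from Python. The following is a minimal sketch, not part of the repository: the helper names ordered_scripts and run_experiment are hypothetical, and it assumes that sorting the script file names alphabetically gives the intended execution order, which should be checked against the actual file names in scripts/{language}/. Whether the scripts expect a particular working directory is not stated here, so the sketch defaults to a dry run that only prints the order.

```python
import subprocess
from pathlib import Path


def ordered_scripts(scripts_root, language):
    """Return the *.sh scripts of one language pair, sorted by file name.

    Assumption: alphabetical file-name order matches the intended run order.
    """
    return sorted(Path(scripts_root, language).glob("*.sh"), key=lambda p: p.name)


def run_experiment(scripts_root, language, dry_run=True):
    """Print each script in order and, unless dry_run is set, execute it."""
    for script in ordered_scripts(scripts_root, language):
        print("running", script)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed step stops the remaining steps from running.
            subprocess.run(["bash", str(script)], check=True)
```

Calling run_experiment("scripts", language) lists the steps for the chosen language directory; passing dry_run=False executes them and stops at the first failing script.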

Evaluation results

Evaluation metrics are aggregated in evaluation_logs/{language}/{experiment}/.
WinoMT test set translations are stored in data/wino_mt/{language}/{experiment}.
Newstest translations can be found in data/dev_translations/{language}/{experiment}.
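The three output locations follow the same {language}/{experiment} pattern, so they can be resolved with one small helper. This is only an illustrative sketch: the function name result_paths and the repo_root parameter are hypothetical, and nothing is assumed about the file names inside each directory.

```python
from pathlib import Path


def result_paths(language, experiment, repo_root="."):
    """Build the output directories for one language pair and experiment."""
    root = Path(repo_root)
    return {
        "metrics": root / "evaluation_logs" / language / experiment,
        "wino_mt": root / "data" / "wino_mt" / language / experiment,
        "newstest": root / "data" / "dev_translations" / language / experiment,
    }
```

For example, result_paths("latvian_imba", "genders2")["wino_mt"] points at the WinoMT translations of the gendered system for that language pair.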

Scripts

Paper-specific data preparation scripts are in scripts/python. For example usage, see scripts/common/, where these scripts are invoked.

generate_genders.py: extracts gender annotations (M/F/N/U) using the Stanza tagger
align_genders.py: projects target gender annotations onto source-side tokens
genders_bpe.py: copies word-level gender annotations to their respective sub-word parts
randomly_include_genders.py: applies dropout to TGA
wino_mt_genders.py: extracts gold gender annotations from the WinoMT dataset
wino_mt_genders_allen.py: generates gender annotations using the AllenNLP coreference resolution tool
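To make the projection, sub-word copying, and dropout steps concrete, here is a simplified sketch of the three operations on a single sentence. It is not the code of the scripts above. The function names, the alignment format (a list of (source index, target index) pairs), the "@@" continuation marker for sub-words, the per-token dropout, and the reading of U as "no gender information" are all assumptions made for illustration.

```python
import random

UNKNOWN = "U"  # assumed label for "no gender information"


def project_genders(target_genders, alignment, source_len):
    """Project target-side gender tags onto source tokens via word alignment.

    alignment holds (source_index, target_index) pairs. The first
    informative tag aligned to a source token wins (an assumption).
    """
    source_genders = [UNKNOWN] * source_len
    for src_idx, tgt_idx in alignment:
        tag = target_genders[tgt_idx]
        if tag != UNKNOWN and source_genders[src_idx] == UNKNOWN:
            source_genders[src_idx] = tag
    return source_genders


def copy_to_subwords(bpe_tokens, word_genders, marker="@@"):
    """Give every sub-word piece the tag of the word it belongs to.

    Assumes pieces that continue a word end with the marker.
    """
    subword_genders = []
    word_idx = 0
    for token in bpe_tokens:
        subword_genders.append(word_genders[word_idx])
        if not token.endswith(marker):
            word_idx += 1  # this piece closes the current word
    return subword_genders


def apply_dropout(genders, rate, rng):
    """Replace each tag with UNKNOWN with probability rate."""
    return [UNKNOWN if rng.random() < rate else tag for tag in genders]


# Source "the doctor arrived"; two target words tagged F and U,
# aligned to "doctor" and "arrived" respectively.
word_tags = project_genders(["F", "U"], [(1, 0), (2, 1)], source_len=3)
# word_tags == ["U", "F", "U"]
piece_tags = copy_to_subwords(["the", "doc@@", "tor", "arrived"], word_tags)
# piece_tags == ["U", "F", "F", "U"]
noisy_tags = apply_dropout(piece_tags, rate=0.5, rng=random.Random(0))
```

Passing an explicit random.Random instance keeps the dropout reproducible across runs, which is convenient when regenerating training data.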

Repository: tilde-nlp/mitigating-gender-bias-wmt-2020
