This is the repository for the paper "Multilingual Translation via Grafting Pre-trained Language Models".
Graformer (named BridgeTransformer in the code) is a sequence-to-sequence model designed mainly for neural machine translation. It improves multilingual translation by grafting together separately pre-trained language models: a pre-trained masked language model (BERT) as the encoder and a pre-trained language model (GPT) as the decoder. The code is based on Fairseq.
You can start from run/run.sh, which needs only minor modifications. The scripts serve the following purposes:
pre-train the multilingual BERT (masked language model):
run_arnold_multilingual_masked_lm_6e6d.sh
pre-train the multilingual GPT (language model):
run_arnold_multilingual_lm_6e6d.sh
train a Graformer:
run_arnold_multilingual_graft_transformer_12e12d_ted.sh
run inference with a trained Graformer:
run_arnold_multilingual_graft_inference_ted.sh
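As a rough illustration of how the four scripts fit together, the Python sketch below only prints one command per stage, in the order pre-training, grafting, inference. It does not execute anything. The `run/` directory, the `bash` invocation, the stage labels, and the `build_commands` helper are assumptions made for illustration and are not part of the repository; each script may still need minor modification for your environment.

```python
from pathlib import Path

# Stage label (hypothetical) -> script name, copied from the list above.
STAGES = [
    ("pretrain_bert", "run_arnold_multilingual_masked_lm_6e6d.sh"),
    ("pretrain_gpt", "run_arnold_multilingual_lm_6e6d.sh"),
    ("train_graformer", "run_arnold_multilingual_graft_transformer_12e12d_ted.sh"),
    ("inference", "run_arnold_multilingual_graft_inference_ted.sh"),
]


def build_commands(script_dir="run"):
    """Return (stage, command) pairs in pipeline order.

    `script_dir` is an assumption: point it at wherever the scripts live.
    """
    return [
        (stage, ["bash", (Path(script_dir) / script).as_posix()])
        for stage, script in STAGES
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for stage, cmd in build_commands():
        print(f"{stage}: {' '.join(cmd)}")
```

Printing the commands first makes it easy to check paths before launching a long training job; to actually run a stage, the command list could be passed to `subprocess.run`.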
We release our pre-trained mBERT and mGPT, along with the trained Graformer model, here.
We will also provide a TensorFlow version in NeurST, a popular toolkit for sequence processing.
Please cite as:
@inproceedings{sun2021mulilingual,
title = "Multilingual Translation via Grafting Pre-trained Language Models",
author = "Sun, Zewei and Wang, Mingxuan and Li, Lei",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Findings",
year = "2021"
}
If you have any questions, please feel free to contact me: sunzewei.v@bytedance.com