📃 Paper | 🤗 Huggingface | 📭 Contact
- This repository shares the code and models of our latest work on multilingual reasoning. In this work, we present a novel X-English question alignment fine-tuning step, which performs targeted language alignment to make the best use of the LLM's English reasoning abilities.
- Using this library, you can fine-tune open-source LLMs into strong multilingual reasoning systems. For example, our fine-tuned LLaMA2-7B/13B achieves superior multilingual performance, significantly outperforming baseline models of equivalent size.
- Overall, our method effectively narrows the performance gap of LLMs between English and non-English languages, pointing to a new paradigm for unlocking LLMs' capabilities to accomplish multilingual tasks.
Below we present LLMs' average answer accuracy (zero-shot) on multilingual reasoning benchmarks. With question alignment, our fine-tuned LLM surpasses the unaligned counterpart and the translate-training baseline (MathOctopus) by a large margin.
Our model has been open-sourced on Huggingface.
System (13B) | Monolingual Supervision | Multilingual Supervision | mGSM | mSVAMP |
---|---|---|---|---|
QAlign (ours) | MetaMathQA | - | 57.1 | 62.6 |
MetaMath | MetaMathQA | - | 43.9 | 51.8 |
MathOctopus | - | GSM8KInstruct | 45.8 | 46.5 |
WizardMath | GSM8K & MATH | - | 28.3 | 35.7 |
MAmmoTH | MathInstruct | - | 28.9 | 38.6 |
RFT | GSM8k-ScRel | - | 29.5 | 37.1 |
SFT | GSM8K | - | 29.7 | 38.1 |
System (7B) | Monolingual Supervision | Multilingual Supervision | mGSM | mSVAMP |
---|---|---|---|---|
QAlign (ours) | MetaMathQA | - | 49.6 | 57.2 |
MetaMath | MetaMathQA | - | 38.4 | 46.2 |
MathOctopus | - | GSM8KInstruct | 40.0 | 44.1 |
WizardMath | GSM8K & MATH | - | 23.0 | 32.5 |
MAmmoTH | MathInstruct | - | 21.3 | 26.3 |
RFT | GSM8k-ScRel | - | 20.6 | 31.3 |
SFT | GSM8K | - | 22.6 | 30.9 |
In the table below, we list the datasets used in this project. All datasets are included in this repository except MetaMathQA. To use MetaMathQA, please download the MetaMathQA-395K.json file from the provided link and place it in the ./data/metamath directory (a quick sanity check follows the table).
Dataset | Usage | Size (examples) | Languages | Path |
---|---|---|---|---|
MetaMathQA | Training | 395,000 | En | ./data/metamath |
GSM8KInstruct | Training | 73,559 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./data/gsm8kinstruct |
mGSM | Evaluation | 2,500 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./evaluate/scripts/data/mgsm |
mSVAMP | Evaluation | 10,000 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./evaluate/scripts/data/msvamp |
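Once MetaMathQA-395K.json is in place, you can quickly confirm the layout. A minimal sketch; it assumes the file is a single JSON array and that jq is installed, neither of which this repository guarantees:

```bash
# Confirm the file sits where the training scripts expect it
ls -lh ./data/metamath/MetaMathQA-395K.json
# Count the examples; this should print roughly 395000 if the file is one JSON array
jq 'length' ./data/metamath/MetaMathQA-395K.json
```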
To install this repository, follow these steps:
```bash
git clone git@github.com:NJUNLP/QAlign.git
cd QAlign
pip install --editable ./
```
For detailed information about the conda environment, refer to the environment.yaml file.
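If you would rather recreate the full conda environment first, here is a minimal sketch (the environment name is whatever environment.yaml declares; the placeholder below is not a real name from this repository):

```bash
# Build the conda environment from the provided spec, then activate it
conda env create -f environment.yaml
conda activate <name-declared-in-environment.yaml>
```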
We developed our training pipeline based on the stanford_alpaca repository.
To perform question alignment and response alignment on pre-trained LLMs, use the following commands. Please note that you must replace $PROJECT_PATH with the appropriate path in finetune.sh or finetune_dp.sh before running them (see the sketch after the commands below). When fine-tuning the 13B model, we use DeepSpeed to save memory; you can find our DeepSpeed configuration in the ./config/ds.json file.
- Fine-tuning LLaMA2-7B:

  ```bash
  bash ./training_scripts/finetune_llama2_7B.sh
  ```
- Fine-tuning LLaMA2-13B:

  ```bash
  bash ./training_scripts/finetune_llama2_13B.sh
  ```
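The $PROJECT_PATH replacement itself is just a text edit. A hypothetical sketch follows; the exact placeholder spelling and the scripts' locations should be confirmed against your checkout before rewriting anything in place:

```bash
# Locate every occurrence of the placeholder that must be replaced before training
grep -rn 'PROJECT_PATH' ./finetune.sh ./finetune_dp.sh ./training_scripts/
# Substitute your local checkout path, e.g. with sed (verify the matches above first)
sed -i 's|\$PROJECT_PATH|/absolute/path/to/QAlign|g' ./finetune.sh ./finetune_dp.sh
```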
We use the evaluation code provided by Chen et al., which measures answer accuracy by comparing the last numerical value that appears in the LLM-generated response with the gold answer.
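For intuition, here is a minimal sketch of that extraction rule (illustrative only; the actual logic lives in the evaluation scripts under ./evaluate/scripts):

```bash
# Illustrative only: treat the last number in the generated response as the prediction
response="Each box holds 12 pencils and there are 4 boxes, so the answer is 48."
gold="48"
pred=$(grep -oE -- '-?[0-9]+(\.[0-9]+)?' <<< "$response" | tail -n 1)
if [ "$pred" = "$gold" ]; then echo "correct"; else echo "wrong"; fi
```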
To evaluate the model on the mGSM and mSVAMP datasets, use the following commands. Please note that you must replace $PROJECT_PATH and $MODEL_PATH with the appropriate paths in the scripts before running them.
- Evaluating on mGSM:

  ```bash
  cd evaluate/scripts
  bash evaluate_mgsm.sh
  ```
- Evaluating on mSVAMP:

  ```bash
  cd evaluate/scripts
  bash evaluate_msvamp.sh
  ```
If you find this repository helpful, feel free to cite our paper:
```bibtex
@misc{zhu2024question,
      title={Question Translation Training for Better Multilingual Reasoning},
      author={Wenhao Zhu and Shujian Huang and Fei Yuan and Shuaijie She and Jiajun Chen and Alexandra Birch},
      year={2024},
      eprint={2401.07817},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```