🔥DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu
University of Science and Technology of China
Qwen, Alibaba Inc.
📃 ArXiv Paper • 🤗 Dataset (Huggingface) • 📚 Dataset (Google drive)
If you find this work helpful for your research, please cite:
```bibtex
@article{li2024dotamath,
  author     = {Chengpeng Li and
                Guanting Dong and
                Mingfeng Xue and
                Ru Peng and
                Xiang Wang and
                Dayiheng Liu},
  title      = {DotaMath: Decomposition of Thought with Code Assistance and Self-correction
                for Mathematical Reasoning},
  journal    = {CoRR},
  volume     = {abs/2407.04078},
  year       = {2024},
  url        = {https://doi.org/10.48550/arXiv.2407.04078},
  doi        = {10.48550/ARXIV.2407.04078},
  eprinttype = {arXiv},
  eprint     = {2407.04078},
  timestamp  = {Wed, 07 Aug 2024 21:29:45 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2407-04078.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
- [12/2024] 🔥 We released our DotaMathQA dataset! Download it from 🤗 DotaMathQA (Huggingface) or 📚 DotaMathQA (Google Drive).
- [07/2024] 🔥 We introduced DotaMath, a series of LLMs that employ decomposition of thought with code assistance and self-correction for mathematical reasoning. Check out the paper.
Large language models (LLMs) have made significant strides in solving simple math problems but still struggle with complex tasks. This paper presents DotaMath, a series of LLMs that utilize thought decomposition, code assistance, and self-correction for mathematical reasoning. DotaMath tackles complex problems by breaking them down into simpler subtasks, using code to solve these subtasks, receiving detailed feedback from the code interpreter, and engaging in self-reflection. By annotating diverse interactive tool-use trajectories and applying query evolution on the GSM8K and MATH datasets, we create an instruction fine-tuning dataset called DotaMathQA, consisting of 574K query-response pairs. We train several base LLMs using imitation learning on DotaMathQA, resulting in models that outperform open-source LLMs on various benchmarks. Notably, DotaMath-deepseek-7B achieves 64.8% on the MATH dataset and 86.7% on GSM8K, maintaining strong competitiveness across multiple benchmarks (Avg. 80.1%). We believe the DotaMath paradigm will pave the way for tackling intricate mathematical problems.
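The interaction pattern described above (decompose a problem into subtasks, solve each with generated code, read the interpreter's feedback, and self-correct on failure) can be sketched as a minimal loop. This is an illustrative sketch only, not the paper's actual implementation: the `generate_code` callback standing in for the LLM, and the single-retry self-correction policy, are assumptions made for the example.

```python
import traceback


def run_subtask(code: str) -> tuple[bool, str]:
    """Execute model-generated code for one subtask.

    Returns (ok, feedback): the stringified `result` variable on success,
    or the interpreter traceback on failure (the "detailed feedback"
    that drives self-correction).
    """
    env: dict = {}
    try:
        exec(code, env)
        return True, str(env.get("result"))
    except Exception:
        return False, traceback.format_exc()


def solve(subtasks, generate_code):
    """Decompose-then-solve loop with one self-correction round per subtask.

    `generate_code(subtask, feedback)` is a hypothetical stand-in for the
    LLM: called first with feedback=None, and again with the traceback if
    the first attempt raises an error.
    """
    answers = []
    for task in subtasks:
        ok, feedback = run_subtask(generate_code(task, feedback=None))
        if not ok:
            # Feed the traceback back to the model and retry once.
            ok, feedback = run_subtask(generate_code(task, feedback=feedback))
        answers.append(feedback if ok else None)
    return answers
```

A toy `generate_code` that emits buggy code on its first attempt and a fixed version after seeing the traceback demonstrates the self-correction path; in DotaMath the retry budget and feedback format are learned from the annotated tool-use trajectories rather than hard-coded.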