# GraphPRM: Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners

<div align="left">
  <p>
    <a href='https://arxiv.org/abs/2503.00845'><img src='https://img.shields.io/badge/arXiv-2503.00845-b31b1b'></a>
    <a href='https://huggingface.co/datasets/GraphPRM/GraphSilo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-GraphSilo-blue'></a>
    <a href='https://huggingface.co/GraphPRM/GraphPRM-7B'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-GraphPRM-purple'></a>
    <a href='https://github.com/GKNL/GraphPRM'><img src='https://img.shields.io/badge/GitHub-GraphPRM-green'></a>
  </p>
</div>

**GraphPRM** is the first Process Reward Model tailored for graph reasoning tasks; rewarding the graph reasoning process further generalizes LLMs' capabilities to other reasoning domains, including mathematical problem solving. We also developed **GraphSilo**, the largest graph reasoning dataset with fine-grained CoT solutions, comprising 118,189 samples and 394,165 step-wise labels.

This repository contains the code and data for training and evaluating GraphPRM models, along with the full GraphSilo dataset. Please check our [paper](https://arxiv.org/abs/2503.00845) for more details.

<p align="center">
  <img src="image/overview.jpg" width="800px"/>
</p>

## 💫 News

- **[2025.05.15]** GraphPRM is accepted to the **KDD 2025 Research Track**. 🔥🔥🔥
- **[2025.02.15]** Initial release of the 🤗[GraphSilo](https://huggingface.co/datasets/GraphPRM/GraphSilo) dataset and 🤗[GraphPRM](https://huggingface.co/GraphPRM/GraphPRM-7B) models. 🚀🚀🚀

## 📊 Dataset and Models

The full GraphSilo dataset and GraphPRM models can be accessed at:

- **GraphSilo Dataset**: [GraphSilo](https://huggingface.co/datasets/GraphPRM/GraphSilo), [GraphSilo-Test](https://huggingface.co/datasets/GraphPRM/GraphSilo-Test)
- **GraphPRM Models**: [GraphPRM-1.5B](https://huggingface.co/GraphPRM/GraphPRM-1.5B), [GraphPRM-7B](https://huggingface.co/GraphPRM/GraphPRM-7B)
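
If it helps, both dataset repos can be pulled with `huggingface-cli` from the `huggingface_hub` package (a minimal sketch; the `--local-dir` targets are illustrative, not paths the code requires):

```bash
# Fetch the GraphSilo training data and the 13-task test set
# (note --repo-type dataset for dataset repositories)
huggingface-cli download GraphPRM/GraphSilo --repo-type dataset --local-dir data/GraphSilo
huggingface-cli download GraphPRM/GraphSilo-Test --repo-type dataset --local-dir data/GraphSilo_test
```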

## 📦 Installation

```bash
conda create -n GraphPRM python=3.10
conda activate GraphPRM
pip install -r requirements.txt
pip3 install "fschat[model_worker,webui]"
pip install -U pydantic
cd envs/MATH/latex2sympy
pip install -e .
cd -
```

## 🛠️ Usage

### Download Models

Before running the project, please ensure that all required base models are downloaded to the `hugging_cache` directory:

1. Base LLMs: `Qwen2.5-1.5B-Instruct`, `Qwen2.5-7B-Instruct`, `Qwen2.5-Math-7B-Instruct`, `LLaMA3.1-8B-Instruct`, `Gemma2-9B-Instruct`
2. GraphPRM models: `GraphPRM-7B`

To download these models, please refer to the [Hugging Face model downloading tutorial](https://huggingface.co/docs/hub/models-downloading) for step-by-step guidance.
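
For example, a minimal sketch with `huggingface-cli` (the Hub repo IDs and the `hugging_cache` layout are assumptions; substitute whatever mirrors you actually use):

```bash
# Policy model (one of the base LLMs listed above)
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir hugging_cache/Qwen2.5-7B-Instruct

# Graph process reward model
huggingface-cli download GraphPRM/GraphPRM-7B --local-dir hugging_cache/GraphPRM-7B
```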

### Start LM & RM Services

1. Modify the following variables in `reason/llm_service/create_service.sh` (see the sketch after this list):
   - `$MODEL_BASE`: Directory where the models are stored
   - `$POLICY_MODEL_NAME`: Name of the policy model
   - `$VALUE_MODEL_NAME`: Name of the graph reward model
   - `$NUM_LM_WORKER`: Number of language model (LM) workers to start
   - `$NUM_RM_WORKER`: Number of reward model (RM) workers to start
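
For illustration, one plausible configuration (these values are assumptions based on the models used in this README; the exact variable syntax in `create_service.sh` may differ):

```bash
# Example settings in reason/llm_service/create_service.sh (illustrative)
MODEL_BASE=hugging_cache                # directory holding the downloaded models
POLICY_MODEL_NAME=Qwen2.5-7B-Instruct   # policy (generation) model
VALUE_MODEL_NAME=GraphPRM-7B            # graph process reward model
NUM_LM_WORKER=1                         # number of LM workers to start
NUM_RM_WORKER=1                         # number of RM workers to start
```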

2. Start the services:
```bash
sh reason/llm_service/create_service.sh
```

3. To stop the services:
```bash
tmux kill-session -t {Your Session Name} # default is `GraphPRM`
```
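
Since the services run inside a tmux session, standard tmux commands can be used to inspect them first (nothing GraphPRM-specific here):

```bash
tmux ls                   # list running sessions to find the session name
tmux attach -t GraphPRM   # attach to check worker logs before killing
```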

### Training GraphPRM

```bash
cd prm/code

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 finetune_qwen_SFT.py \
  --model_path $YOUR_MODEL_PATH \
  --data_path $YOUR_DATA_FOLDER_PATH
```
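
For instance, fine-tuning from `Qwen2.5-7B-Instruct` on GraphSilo might look like this (a sketch; the relative paths are assumptions that follow the layout above):

```bash
# Illustrative invocation from inside prm/code
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 finetune_qwen_SFT.py \
  --model_path ../../hugging_cache/Qwen2.5-7B-Instruct \
  --data_path ../../data/GraphSilo
```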

### Inference Methods

#### Best-of-N Strategy

Best-of-N samples `num_sequence` complete solutions from the policy model and reranks them with GraphPRM, keeping the highest-scoring answer.

```bash
export PYTHONPATH=$(pwd)
sh scripts/eval/cot_rerank.sh

# Key parameters:
# --LM Qwen2.5-7B-Instruct                      # Policy model name
# --RM GraphPRM-7B                              # Reward model name
# --temperature 0.7                             # Generation temperature
# --num_sequence 8                              # Number of sampled solutions
# --max_new_tokens 2048                         # Max new tokens per generation
# --test_set_path dataset/GraphSilo_test.jsonl  # Test data path
```
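
To evaluate a single split rather than all 13 tasks, point `--test_set_path` at the corresponding file (split file names follow the GraphSilo-Test release; the `dataset/` prefix mirrors the default above):

```bash
# In-domain tasks only
# --test_set_path dataset/GraphSilo_test_in_domain.jsonl

# Out-of-domain tasks only
# --test_set_path dataset/GraphSilo_test_out_domain.jsonl
```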

#### Beam Search Strategy

Beam search generates `tree_max_width` candidate steps at each reasoning step, scores them with GraphPRM, and keeps the top `num_sequence` partial solutions, up to `tree_max_depth` steps.

```bash
export PYTHONPATH=$(pwd)
sh scripts/eval/beam_search.sh

# Key parameters:
# --LM Qwen2.5-7B-Instruct                      # Policy model name
# --RM GraphPRM-7B                              # Reward model name
# --temperature 0.7                             # Generation temperature
# --num_sequence 2                              # Beams kept per step
# --tree_max_width 4                            # Candidate steps generated per step
# --tree_max_depth 50                           # Max number of steps
# --max_new_tokens 2048                         # Max new tokens per generation
# --test_set_path dataset/GraphSilo_test.jsonl  # Test data path
```

## 📁 Project Structure

```
GraphPRM/
├── data/
│   ├── GraphSilo/
│   │   ├── train.jsonl
│   │   └── step_wise_labels.jsonl
│   └── GraphSilo_test/
│       ├── in_domain/
│       │   ├── degree.jsonl
│       │   ├── clustering_coefficient.jsonl
│       │   ├── jaccard.jsonl
│       │   └── ...
│       └── out_domain/
│           ├── bfs.jsonl
│           ├── neighbor.jsonl
│           └── cycle.jsonl
├── prm/
│   ├── code/
│   │   └── finetune_qwen_SFT.py
│   └── config/
│       └── deepspeed_config_stage3.json
├── reason/
│   └── llm_service/
│       └── create_service_graph.sh
└── scripts/
    └── eval/
        ├── best_of_N.sh
        └── beam_search.sh
```

### Key Components

- **data/**: Contains the GraphSilo dataset
  - `GraphSilo/`: Training set with step-wise labels from task-oriented trajectories and Monte Carlo estimation
  - `GraphSilo_test/`: Test set covering 13 graph tasks
    - In-domain tasks (10): Degree, Clustering Coefficient, Jaccard, etc.
    - Out-of-domain tasks (3): BFS, Neighbor, Cycle

- **prm/**: Process reward model code
  - `code/`: SFT training code (`finetune_qwen_SFT.py`)
  - `config/`: DeepSpeed configuration files for training

- **reason/**: Reasoning service implementation
  - `llm_service/`: Service startup and management scripts

- **scripts/**: Evaluation and utility scripts
  - `eval/`: Inference scripts for the different strategies

## 🙏 Acknowledgements

Some code implementations are built upon the [OpenR](https://github.com/openreasoner/openr) repository. We sincerely appreciate their efforts and contributions.

## 📜 Citation

If you find GraphPRM useful for your research and applications, please kindly cite our paper using this BibTeX:

```bibtex
@misc{graphprm,
  title={Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners},
  author={Miao Peng and Nuo Chen and Zongrui Suo and Jia Li},
  year={2025},
  eprint={2503.00845},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.00845},
}
```