# GraphPRM: Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners

<div align="left">
<p>
<a href='https://arxiv.org/abs/2503.00845'><img src='https://img.shields.io/badge/arXiv-2503.00845-b31b1b'></a>
<a href='https://huggingface.co/datasets/GraphPRM/GraphSilo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-GraphSilo-blue'></a>
<a href='https://huggingface.co/GraphPRM/GraphPRM-7B'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-GraphPRM-purple'></a>
<a href='https://github.com/GKNL/GraphPRM'><img src='https://img.shields.io/badge/GitHub-GraphPRM-green'></a>
</p>
</div>

**GraphPRM** is the first Process Reward Model tailored for graph reasoning tasks; rewarding the graph reasoning process further enhances LLMs' reasoning capabilities in other domains, including mathematical problem-solving tasks. We also developed **GraphSilo**, the largest dataset for graph reasoning with fine-grained CoT solutions, containing 118,189 samples and 394,165 step-wise labels.

This repository contains the code and data for training and evaluating GraphPRM models, along with the full GraphSilo dataset. Please check our [paper](https://arxiv.org/abs/2503.00845) for more details.

<p align="center">
  <img src="image/overview.jpg" width="800px"/>
</p>

## 💫 News

- **[2025.05.15]** GraphPRM was accepted to the **KDD 2025 Research Track**. 🔥🔥🔥
- **[2025.02.15]** Initial release of the 🤗[GraphSilo](https://huggingface.co/datasets/GraphPRM/GraphSilo) dataset and 🤗[GraphPRM](https://huggingface.co/GraphPRM/GraphPRM-7B) models. 🚀🚀🚀

## 📊 Dataset and Models

The full GraphSilo dataset and GraphPRM models can be accessed at:

- **GraphSilo Dataset**: [GraphSilo](https://huggingface.co/datasets/GraphPRM/GraphSilo), [GraphSilo-Test](https://huggingface.co/datasets/GraphPRM/GraphSilo-Test)
- **GraphPRM Models**: [GraphPRM-1.5B](https://huggingface.co/GraphPRM/GraphPRM-1.5B), [GraphPRM-7B](https://huggingface.co/GraphPRM/GraphPRM-7B)
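
If you use the `datasets` library, the corpus can presumably be pulled straight from the Hub; a minimal sketch, assuming the repo ID from the link above:

```python
from datasets import load_dataset

# Pull GraphSilo from the Hugging Face Hub; repo ID taken from the link above.
graphsilo = load_dataset("GraphPRM/GraphSilo")
print(graphsilo)  # inspect the available splits and columns
```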

## 📦 Installation

```bash
conda create -n GraphPRM python=3.10
conda activate GraphPRM
pip install -r requirements.txt
pip3 install "fschat[model_worker,webui]"
pip install -U pydantic
cd envs/MATH/latex2sympy
pip install -e .
cd -
```

## 🛠️ Usage

### Download Models

Before running the project, please ensure that all required base models are downloaded to the `hugging_cache` directory:

1. Base LLMs: `Qwen2.5-1.5B-Instruct`, `Qwen2.5-7B-Instruct`, `Qwen2.5-Math-7B-Instruct`, `LLaMA3.1-8B-Instruct`, `Gemma2-9B-Instruct`
2. GraphPRM models: `GraphPRM-7B`

To download these models, please refer to the [Hugging Face model downloading tutorial](https://huggingface.co/docs/hub/models-downloading) for step-by-step guidance on downloading models from the Hugging Face Hub.
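
For example, a minimal download sketch using `huggingface_hub` (the repo IDs follow the links above; the `local_dir` layout is an assumption, adjust it to your setup):

```python
# download_models.py -- fetch a base LLM and GraphPRM weights into `hugging_cache`.
from huggingface_hub import snapshot_download

MODELS = [
    "Qwen/Qwen2.5-7B-Instruct",  # policy model
    "GraphPRM/GraphPRM-7B",      # process reward model
]

for repo_id in MODELS:
    local_dir = f"hugging_cache/{repo_id.split('/')[-1]}"
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    print(f"downloaded {repo_id} -> {local_dir}")
```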

### Start LM & RM Services

1. Modify the following variables in `reason/llm_service/create_service.sh`:

   - `$MODEL_BASE`: Directory where the models are stored
   - `$POLICY_MODEL_NAME`: Name of the policy model
   - `$VALUE_MODEL_NAME`: Name of the graph reward model
   - `$NUM_LM_WORKER`: Number of language model (LM) workers to start
   - `$NUM_RM_WORKER`: Number of reward model (RM) workers to start

2. Start the services:

   ```bash
   sh reason/llm_service/create_service.sh
   ```

3. To stop the services, kill the tmux session:

   ```bash
   tmux kill-session -t {Your Session Name}  # default session name is `GraphPRM`
   ```

### Training GraphPRM

```bash
cd prm/code

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 finetune_qwen_SFT.py \
    --model_path $YOUR_MODEL_PATH \
    --data_path $YOUR_DATA_FOLDER_PATH
```
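
Purely as an illustration of what "step-wise labels" means (the field names and label encoding below are hypothetical; the actual schema is defined by GraphSilo and `finetune_qwen_SFT.py`), a process-labeled sample pairs each reasoning step with a correctness tag:

```python
# Hypothetical sketch of a step-wise labeled training sample. Field names and
# label encoding are illustrative assumptions, not GraphSilo's actual schema.
sample = {
    "question": "In an undirected graph with edges (1,3), (3,4), (3,7), "
                "what is the degree of node 3?",
    "steps": [
        "Step 1: Collect the edges incident to node 3.",
        "Step 2: Node 3 appears in edges (1,3), (3,4), and (3,7).",
        "Step 3: Therefore the degree of node 3 is 2.",  # wrong: it is 3
    ],
    "step_labels": ["+", "+", "-"],  # one process label per reasoning step
}
```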

### Inference Methods

#### Best-of-N Strategy

```bash
export PYTHONPATH=$(pwd)
sh scripts/eval/cot_rerank.sh

# Key parameters:
# --LM Qwen2.5-7B-Instruct                      # Policy model name
# --RM GraphPRM-7B                              # Reward model name
# --temperature 0.7                             # Generation temperature
# --num_sequence 8                              # Number of generated samples
# --max_new_tokens 2048                         # Max new tokens
# --test_set_path dataset/GraphSilo_test.jsonl  # Test data path
```
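
Conceptually, Best-of-N samples several complete solutions from the policy model and keeps the one GraphPRM scores highest. A minimal sketch of the strategy using `transformers` (the model path and the stubbed `prm_score` aggregation are assumptions; the repo drives this through the LM/RM services instead):

```python
# Best-of-N sketch: sample N full solutions, keep the PRM-preferred one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

policy_name = "hugging_cache/Qwen2.5-7B-Instruct"  # assumed local path
tok = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(
    policy_name, torch_dtype=torch.bfloat16, device_map="auto"
)

def sample_candidates(question: str, n: int = 8) -> list[str]:
    """Draw n candidate solutions from the policy model."""
    inputs = tok(question, return_tensors="pt").to(policy.device)
    outputs = policy.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        max_new_tokens=2048,
        num_return_sequences=n,
    )
    prompt_len = inputs["input_ids"].shape[1]
    return [tok.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs]

def prm_score(question: str, solution: str) -> float:
    """Placeholder: aggregate GraphPRM's per-step scores for `solution`
    (e.g. the product or minimum of step probabilities). The real scoring
    is served by the RM service started above."""
    raise NotImplementedError

def best_of_n(question: str, n: int = 8) -> str:
    """Sample n candidates and return the one the PRM prefers."""
    candidates = sample_candidates(question, n)
    return max(candidates, key=lambda s: prm_score(question, s))
```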

#### Beam Search Strategy

```bash
export PYTHONPATH=$(pwd)
sh scripts/eval/beam_search.sh

# Key parameters:
# --LM Qwen2.5-7B-Instruct                      # Policy model name
# --RM GraphPRM-7B                              # Reward model name
# --temperature 0.7                             # Generation temperature
# --num_sequence 2                              # Beams kept per step
# --tree_max_width 4                            # Candidate steps generated per beam
# --tree_max_depth 50                           # Max number of reasoning steps
# --max_new_tokens 2048                         # Max new tokens
# --test_set_path dataset/GraphSilo_test.jsonl  # Test data path
```
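
Here beam search operates at the reasoning-step level: at each depth the policy proposes up to `tree_max_width` next steps per surviving prefix, GraphPRM scores each extended prefix, and only the top `num_sequence` prefixes are kept. A minimal control-flow sketch (`propose_steps`, `prm_step_score`, and the `Answer:` stop marker are hypothetical stand-ins for the repo's service calls):

```python
# Step-level beam search sketch guided by a process reward model.

def propose_steps(question: str, prefix: list[str], width: int) -> list[str]:
    """Ask the policy model for `width` candidate next reasoning steps."""
    raise NotImplementedError  # stand-in for the LM service

def prm_step_score(question: str, steps: list[str]) -> float:
    """Ask the PRM how promising this partial solution looks."""
    raise NotImplementedError  # stand-in for the RM service

def beam_search(question: str, beam_size: int = 2,
                width: int = 4, max_depth: int = 50) -> list[str]:
    beams: list[list[str]] = [[]]  # each beam is a list of reasoning steps
    for _ in range(max_depth):
        expansions = []
        for prefix in beams:
            if prefix and prefix[-1].startswith("Answer:"):
                expansions.append(prefix)  # finished beams carry over unchanged
                continue
            for step in propose_steps(question, prefix, width):
                expansions.append(prefix + [step])
        # keep only the `beam_size` highest-scoring partial solutions
        expansions.sort(key=lambda p: prm_step_score(question, p), reverse=True)
        beams = expansions[:beam_size]
        if all(p and p[-1].startswith("Answer:") for p in beams):
            break  # every surviving beam has reached a final answer
    return beams[0]
```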

## 📁 Project Structure

```
GraphPRM/
├── data/
│   ├── GraphSilo/
│   │   ├── train.jsonl
│   │   └── step_wise_labels.jsonl
│   └── GraphSilo_test/
│       ├── in_domain/
│       │   ├── degree.jsonl
│       │   ├── clustering_coefficient.jsonl
│       │   ├── jaccard.jsonl
│       │   └── ...
│       └── out_domain/
│           ├── bfs.jsonl
│           ├── neighbor.jsonl
│           └── cycle.jsonl
├── prm/
│   ├── code/
│   │   └── finetune_qwen_SFT.py
│   └── config/
│       └── deepspeed_config_stage3.json
├── reason/
│   └── llm_service/
│       └── create_service_graph.sh
└── scripts/
    └── eval/
        ├── best_of_N.sh
        └── beam_search.sh
```

### Key Components

- **data/**: Contains the GraphSilo dataset
  - `GraphSilo/`: Training set with step-wise labels from task-oriented trajectories and Monte Carlo estimation
  - `GraphSilo_test/`: Test set covering 13 graph tasks
    - In-domain tasks (10): Degree, Clustering Coefficient, Jaccard, Common Connectivity, Diameter, Page Rank, MST, Maximum Flow, Predecessor, etc.
    - Out-of-domain tasks (3): BFS, Neighbor, Cycle
- **prm/**: Process reward modeling code
  - `code/`: SFT training code
  - `config/`: DeepSpeed configuration files for training
- **reason/**: Reasoning service implementation
  - `llm_service/`: Service startup and management scripts
- **scripts/**: Evaluation and utility scripts
  - `eval/`: Inference scripts for different strategies

## 🙏 Acknowledgements

Some code implementations are built upon the [OpenR](https://github.com/openreasoner/openr) repository. We sincerely appreciate their contributions.

## 📜 Citation

If you find GraphPRM useful for your research and applications, please kindly cite it using this BibTeX:

```bibtex
@misc{graphprm,
      title={Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners},
      author={Miao Peng and Nuo Chen and Zongrui Suo and Jia Li},
      year={2025},
      eprint={2503.00845},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.00845},
}
```
