Coevolving with the OtheR You

Official implementation of the NeurIPS 2024 paper Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning.

Overview

Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks. However, prevailing RL fine-tuning methods predominantly rely on PPO and its variants. Though these algorithms are effective in general RL settings, they often exhibit suboptimal performance and vulnerability to distribution collapse when applied to the fine-tuning of LLMs. In this paper, we propose CORY, extending the RL fine-tuning of LLMs to a sequential cooperative multi-agent reinforcement learning framework, to leverage the inherent coevolution and emergent capabilities of multi-agent systems. In CORY, the LLM to be fine-tuned is initially duplicated into two autonomous agents: a pioneer and an observer. The pioneer generates responses based on queries, while the observer generates responses using both the queries and the pioneer’s responses. The two agents are trained together. During training, the agents exchange roles periodically, fostering cooperation and coevolution between them. Experiments evaluate CORY’s performance by fine-tuning GPT-2 and Llama-2 under subjective and objective reward functions on the IMDB Review and GSM8K datasets, respectively. Results show that CORY outperforms PPO in terms of policy optimality, resistance to distribution collapse, and training robustness, thereby underscoring its potential as a superior methodology for refining LLMs in real-world applications.

Figure: the basic idea of CORY.
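
The loop below is a minimal, self-contained sketch of this idea, not the code in this repository: the fine-tuned LLM is duplicated into a pioneer and an observer, the observer conditions on the query plus the pioneer's response, and the two agents periodically exchange roles. `generate`, `task_reward`, `ppo_update`, and `swap_interval` are hypothetical placeholders; the actual reward shaping and PPO details follow the paper and imdb_train/cory.py.

```python
import random

def generate(agent, prompt):
    # Placeholder for sampling a response from the agent's LLM given `prompt`.
    return f"<{agent['name']} response to: {prompt!r}>"

def task_reward(query, response):
    # Placeholder for the task reward (e.g. a sentiment score on IMDB).
    return random.random()

def ppo_update(agent, prompt, response, reward):
    # Placeholder for a PPO-style policy update on (prompt, response, reward).
    pass

def cory_step(pioneer, observer, query):
    # The pioneer answers the query alone.
    pioneer_response = generate(pioneer, query)
    # The observer conditions on both the query and the pioneer's response.
    observer_prompt = f"{query}\n[Pioneer's answer] {pioneer_response}"
    observer_response = generate(observer, observer_prompt)
    # Both agents are trained together; the paper's exact reward shaping is
    # omitted here, each agent simply receives the task reward on its own response.
    ppo_update(pioneer, query, pioneer_response, task_reward(query, pioneer_response))
    ppo_update(observer, observer_prompt, observer_response, task_reward(query, observer_response))

# The LLM to be fine-tuned is duplicated into two autonomous agents.
agent_a, agent_b = {"name": "agent_a"}, {"name": "agent_b"}
pioneer, observer = agent_a, agent_b
swap_interval = 4  # hypothetical: how often the two agents exchange roles

for step, query in enumerate(["This movie was", "The plot of the film"] * 4):
    cory_step(pioneer, observer, query)
    if (step + 1) % swap_interval == 0:
        pioneer, observer = observer, pioneer  # periodic role exchange
```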

How to Setup Environment

To ensure that you can run the code in the same environment as it was developed, we recommend using the Conda environment management tool to replicate our development environment. Follow the steps below to quickly set up your environment.

1. Install Conda

If you haven't already installed Conda, please visit the Anaconda website to download and install Anaconda. Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. After installation, you can verify that it succeeded by opening a terminal or command prompt and typing the following command:

conda --version

2. Replicate Conda Environment

We have provided a trl_environment.yml file that contains all the dependencies required to run the code. Please follow the steps below to create and activate the environment:

create conda environment

conda env create -n new_env_name -f trl_environment.yml

activate the environment

conda activate new_env_name

How to run

  1. Download the GPT-2 model and the distilbert-imdb reward model.
  2. Run one of the following commands:

Fine-tuning with PPO

python imdb_train/ppo.py

Fine-tuning with CORY

python imdb_train/cory.py
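
For orientation, a single PPO step on the IMDB task with the trl library (installed via trl_environment.yml) looks roughly like the sketch below. The checkpoint names, generation settings, and batch sizes are illustrative assumptions, and the exact trl API depends on the pinned version; imdb_train/ppo.py and imdb_train/cory.py are the authoritative scripts.

```python
import torch
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Policy, frozen reference model, and tokenizer (GPT-2, as in the IMDB experiments).
config = PPOConfig(model_name="gpt2", batch_size=4, mini_batch_size=2)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)
# A sentiment classifier (assumed checkpoint) serves as the subjective reward function.
reward_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

query_texts = ["This movie was", "The acting in this film", "I went in expecting", "The plot twist"]
query_tensors = [tokenizer.encode(q, return_tensors="pt").squeeze(0) for q in query_texts]

# Sample a fixed-length continuation for each query from the current policy.
gen_len = 16
response_tensors = []
for query in query_tensors:
    output = ppo_trainer.generate(query, max_new_tokens=gen_len, do_sample=True,
                                  pad_token_id=tokenizer.eos_token_id)
    response_tensors.append(output.squeeze(0)[-gen_len:])  # keep only the generated tokens
response_texts = [tokenizer.decode(r) for r in response_tensors]

# Use the positive-sentiment logit of the classifier as the scalar reward.
pipe_outputs = reward_pipe([q + r for q, r in zip(query_texts, response_texts)],
                           top_k=None, function_to_apply="none")
rewards = [torch.tensor(next(s["score"] for s in scores if s["label"] == "POSITIVE"))
           for scores in pipe_outputs]

# One PPO optimization step on this batch.
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```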

Citation

If you find this repository useful, please cite our paper:

@misc{ma2024coevolvingyoufinetuningllm,
      title={Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning}, 
      author={Hao Ma and Tianyi Hu and Zhiqiang Pu and Boyin Liu and Xiaolin Ai and Yanyan Liang and Min Chen},
      year={2024},
      eprint={2410.06101},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.06101}, 
}

Statement

We will continue to maintain this code repository in the coming months.
