Official implementation of the NeurIPS 2024 paper *Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning*.
Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks. However, prevailing RL fine-tuning methods predominantly rely on PPO and its variants. Though these algorithms are effective in general RL settings, they often exhibit suboptimal performance and vulnerability to distribution collapse when applied to the fine-tuning of LLMs. In this paper, we propose CORY, extending the RL fine-tuning of LLMs to a sequential cooperative multi-agent reinforcement learning framework, to leverage the inherent coevolution and emergent capabilities of multi-agent systems. In CORY, the LLM to be fine-tuned is initially duplicated into two autonomous agents: a pioneer and an observer. The pioneer generates responses based on queries, while the observer generates responses using both the queries and the pioneer's responses. The two agents are trained together. During training, the agents exchange roles periodically, fostering cooperation and coevolution between them. Experiments evaluate CORY's performance by fine-tuning GPT-2 and Llama-2 under subjective and objective reward functions on the IMDB Review and GSM8K datasets, respectively. Results show that CORY outperforms PPO in terms of policy optimality, resistance to distribution collapse, and training robustness, thereby underscoring its potential as a superior methodology for refining LLMs in real-world applications.
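As a rough illustration of the training loop described above, here is a minimal Python sketch. It is not the repository's API: `ToyAgent`, `generate`, `rl_update`, and `reward_fn` are hypothetical placeholders, and the actual implementation trains both agents with PPO.

```python
# Toy sketch of the CORY control flow (not this repository's actual API).
# ToyAgent, generate, rl_update, and reward_fn are illustrative placeholders;
# the real implementation performs PPO updates on both agents.
import copy

def reward_fn(query, response):
    """Placeholder task reward; the paper uses subjective/objective rewards."""
    return float(len(response) > len(query))

class ToyAgent:
    """Stand-in for an LLM policy."""
    def generate(self, prompt):
        return prompt + " <response>"

    def rl_update(self, prompt, response, reward):
        pass  # a PPO step on (prompt, response, reward) in the real training

base = ToyAgent()
pioneer, observer = copy.deepcopy(base), copy.deepcopy(base)  # duplicate the LLM

for step, query in enumerate(["query 1", "query 2", "query 3", "query 4"]):
    pioneer_resp = pioneer.generate(query)                # pioneer sees only the query
    observer_resp = observer.generate(query + "\n" + pioneer_resp)  # observer also sees pioneer's response
    pioneer.rl_update(query, pioneer_resp, reward_fn(query, pioneer_resp))
    observer.rl_update(query, observer_resp, reward_fn(query, observer_resp))
    if (step + 1) % 2 == 0:                               # periodic role exchange
        pioneer, observer = observer, pioneer
```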
To ensure you can run the code in the same environment it was developed in, we recommend using the Conda environment management tool to replicate our development environment. Follow the steps below to set it up quickly.
If you haven't already installed Conda, please visit the Anaconda website to download and install Anaconda. Anaconda is a free, open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. After installation, you can verify that it succeeded by opening a terminal or command prompt and typing the following command:
conda --version
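If Conda is installed correctly, this prints a version string, for example `conda 24.5.0` (the exact version will differ on your machine).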
We have provided a trl_environment.yml file that contains all the dependencies required to run the code. Please follow the steps below to create and activate the environment:
Create the conda environment:

conda env create -n new_env_name -f trl_environment.yml

Activate the environment:

conda activate new_env_name
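As a quick sanity check that the environment is usable, you can try importing the main dependency from inside the activated environment (assuming, as the environment file name suggests, that trl is among the listed packages):

python -c "import trl; print(trl.__version__)"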
- Download the GPT-2 model and the distilbert-imdb reward model (a download sketch is provided after the commands below)
- Run one of the following commands:

Fine-tuning with PPO:

python imdb_train/ppo.py

Fine-tuning with CORY:

python imdb_train/cory.py
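Both scripts expect the models to be available locally. One way to fetch them is via `huggingface_hub`; note that the Hub repo IDs (`gpt2`, `lvwerra/distilbert-imdb`) and target directories below are assumptions based on common trl IMDB examples, so adjust them to match the paths the scripts expect.

```python
# Illustrative download sketch; repo IDs and local paths are assumptions,
# not guaranteed to match what imdb_train/*.py expects.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="gpt2", local_dir="gpt2")  # policy model
snapshot_download(repo_id="lvwerra/distilbert-imdb", local_dir="distilbert-imdb")  # reward model
```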
If you find this repository useful, please cite our paper:
@misc{ma2024coevolvingyoufinetuningllm,
  title={Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning},
  author={Hao Ma and Tianyi Hu and Zhiqiang Pu and Boyin Liu and Xiaolin Ai and Yanyan Liang and Min Chen},
  year={2024},
  eprint={2410.06101},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2410.06101},
}
We will continue to maintain this code repository in the coming months.