- Authors: Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz
There is a growing interest in using Large Language Models (LLMs) in multi-agent systems to tackle interactive real-world tasks that require effective collaboration and assessing complex situations. Yet, we still have a limited understanding of LLMs' communication and decision-making abilities in multi-agent setups. The fundamental task of negotiation spans many key features of communication, such as cooperation, competition, and manipulation potentials. Thus, we propose using scorable negotiation to evaluate LLMs. We create a testbed of complex multi-agent, multi-issue, and semantically rich negotiation games. To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities while integrating them in a dynamic and multi-turn setup. We propose multiple metrics to rigorously quantify agents' performance and alignment with the assigned role. We provide procedures to create new games and increase games' difficulty to have an evolving benchmark. Importantly, we evaluate critical safety aspects such as the interaction dynamics between agents influenced by greedy and adversarial players. Our benchmark is highly challenging; GPT-3.5 and small models mostly fail, and GPT-4 and SoTA large models (e.g., Llama-3 70b) still underperform in adversarial, noisy, and more competitive games.
- You can find here an example of one of our runs with GPT-4
- All games and game variants developed in the paper
- All logs from experiments in the paper
- Code to run simulations
- Evaluation code
- Guide on how to adjust the experiments and extend the setup for other scenarios.
- Setup
- Games
- Setting the game and simulation configuration
- Guide on how the prompts are organized
- Running the simulation
- Evaluation
- Logs
- Citation
- Create a new environment and install the following:
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install conda-forge::transformers
pip install google-cloud-aiplatform
pip install openai
pip install accelerate
- All games can be found under `games_descriptions`. Current games are:
  - Base game
  - Base rewritten: base game rewritten by GPT-4
  - 7-player and 6-issue variant extended from the base game
  - New games created by prompting GPT-4 and manual curation (`game1`, `game2`, `game3`)
- Games are organized as follows:
  - `global_instructions.txt`:
    - These are global instructions about the project and issues given to all agents.
    - Names of agents should be put between quotation marks (`""`) and be consistent throughout the global instructions (they are parsed when creating the initial prompts; more details about that later).
  - `<GAME>/individual_instructions`:
    - There should be sub-folders that correspond to the `incentive` of the player (e.g., cooperative), i.e., `<GAME>/individual_instructions/<INCENTIVE>`.
    - Under the sub-folders, there should be files that correspond to players.
    - These contain the confidential information given to agents about their preferences, in addition to any agent-specific instructions about their incentives (e.g., `you need to maximize this option as much as possible`).
    - The files contain the scores as placeholders that will be populated when forming the initial prompt (more details about that later).
  - `<GAME>/scores_files`:
    - These are files that contain the scores of players for each issue.
    - Each line is a comma-separated list of scores per issue.
    - Line 1 corresponds to issue A, etc.
    - The last line is the minimum threshold for the agent.
    - If you just want to change scores or thresholds without affecting the overall game and priorities, change the values in the scores files and leave the other files untouched.
  - `<GAME>/initial_deal.txt`: This is the most preferred option for `p1` that will be used to start the negotiation.
- We include `greedy`, `targeted_adv`, `untargeted_adv`, and `cooperative` incentives for the `base` game according to the results in the paper. Other games currently have only the `cooperative` variant.
- If you would like to support another incentive, create a new sub-directory and write the individual instructions for the agents you would like to combine that incentive with.
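The scores-file layout described above (one comma-separated line per issue, with the minimum threshold on the last line) can be read with a few lines of Python. This is a minimal sketch, not the repository's `load_scores` implementation: the function names `parse_scores_file` and `deal_score`, the integer score type, and the representation of a deal as one option index per issue are assumptions made for illustration.

```python
def parse_scores_file(text):
    """Parse scores-file text into (per-issue scores, minimum threshold).

    Assumed layout (from the description above): one comma-separated line
    per issue, followed by a final line holding the minimum threshold.
    Integer scores are an assumption of this sketch.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least one issue line and a threshold line")
    *issue_lines, threshold_line = lines
    issue_scores = [[int(value) for value in line.split(",")] for line in issue_lines]
    return issue_scores, int(threshold_line)


def deal_score(issue_scores, deal):
    """Sum the agent's score for a deal given as one option index per issue."""
    if len(deal) != len(issue_scores):
        raise ValueError("deal must pick exactly one option per issue")
    return sum(options[choice] for options, choice in zip(issue_scores, deal))


# Made-up example values, not taken from any game in the repository.
example = "10, 25, 0\n0, 15\n30\n"
scores, threshold = parse_scores_file(example)
print(scores, threshold)
print(deal_score(scores, (1, 1)))
```

Keeping the threshold separate from the per-issue scores mirrors the note above that scores and thresholds can be edited without touching the instruction files.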
- Change `<GAME>/config.txt` to run customized combinations of agents' models, incentives, etc., and a varying number of agents.
- Each line in `config.txt` corresponds to one agent.
- Each line should be organized as `<AGENT NAME>, <FILE NAME>, <ROLE>, <INCENTIVE>, <MODEL>`:
  - `<AGENT NAME>`: the agent's game name as written in the `global_instructions.txt` file.
  - `<FILE NAME>`: the agent's file name under `<GAME>/individual_instructions` and `<GAME>/scores_files`.
  - `<ROLE>`: a specific role for the agent. At the moment this can be `p1`, `p2`, `target` (for the target agent in the `targeted_adv` incentive), or `player` (default for all others).
  - `<INCENTIVE>`: the incentive for the agent. This can be `greedy`, `targeted_adv`, `untargeted_adv`, or `cooperative`. A sub-directory of the same name must be included under `<GAME>/individual_instructions`.
  - `<MODEL>`: the model that will be used for this agent. You can specify different models for different agents. The code now supports GPT models via Azure APIs or OpenAI APIs, Gemini, or Hugging Face models. For Hugging Face models, write `hf_<MODEL>`.
- If you would like to run the same game with fewer agents, remove the corresponding agents' lines from the config file.
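To make the five-field line format concrete, here is a minimal sketch of how a `config.txt` could be parsed and validated. It is not the parser used in `main.py`: the `AgentConfig` class, the `parse_config` function, the blank-line handling, and the agent and model names in the example are hypothetical. The allowed roles and incentives are the ones listed above; extend `INCENTIVES` if you add your own incentive.

```python
from dataclasses import dataclass

# Values listed in the field descriptions above.
ROLES = {"p1", "p2", "target", "player"}
INCENTIVES = {"greedy", "targeted_adv", "untargeted_adv", "cooperative"}


@dataclass
class AgentConfig:
    name: str
    file_name: str
    role: str
    incentive: str
    model: str

    @property
    def is_hugging_face(self):
        # Hugging Face models are written as hf_<MODEL>.
        return self.model.startswith("hf_")


def parse_config(text):
    """Parse config text (one agent per line) into AgentConfig objects."""
    agents = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # Skipping blank lines is a choice made in this sketch.
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 5:
            raise ValueError(f"line {line_number}: expected 5 fields, got {len(fields)}")
        agent = AgentConfig(*fields)
        if agent.role not in ROLES:
            raise ValueError(f"line {line_number}: unknown role {agent.role!r}")
        if agent.incentive not in INCENTIVES:
            raise ValueError(f"line {line_number}: unknown incentive {agent.incentive!r}")
        agents.append(agent)
    return agents


# Hypothetical agents and model names, for illustration only.
example = "Agent A, agent_a, p1, cooperative, gpt-4\nAgent B, agent_b, player, greedy, hf_some-model\n"
for agent in parse_config(example):
    print(agent.name, agent.role, agent.incentive, agent.is_hugging_face)
```

Splitting on commas assumes that agent names themselves contain no commas.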
- The agents have 1) initial prompts and 2) round prompts. They are formed as follows:
1- initial prompts
- `initial_prompts.txt` first reads the global instructions and replaces the agent's name with `<AGENT_NAME> (represented by you)`.
self.global_instructions = self.load_global_instructions(os.path.join(game_description_dir,'global_instructions.txt'))
- Next, scores and individual instructions are read and combined:
individual_scores_file = os.path.join(game_description_dir,'scores_files', agent_file_name+'.txt')
self.scores = self.load_scores(individual_scores_file)
individual_instructions_file = os.path.join(game_description_dir,'individual_instructions',incentive, agent_file_name+'.txt')
self.individual_instructions = self.load_individual_instructions(individual_instructions_file)
- The initial prompt contains scoring and voting instructions that are the same for all agents (e.g., who is `p1`, the round schema, etc.).
- There are also incentive-specific instructions (e.g., for cooperative, it includes something like `any deal with a score higher than your minimum threshold is preferable to you than no deal. You are very open to any compromise to achieve that`).
- The final initial prompt is:
final_initial_prompt = self.global_instructions + '\n' + self.individual_instructions + scoring_rules + voting_rules + incentive_rules
- The `InitialPrompt` class supports changing the number of agents and the number of classes, and it takes `p1` and `p2` from `main.py` (more details later). It also calls the incentive-specific functions based on the agents' incentives defined in the `config.txt` files.
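The assembly of the initial prompt described above can be sketched as two small functions. This is an illustration under assumptions, not the `InitialPrompt` class itself: the function names are hypothetical, and the sketch assumes that agent names appear in the global instructions wrapped in double quotes and that the quotes are kept after the replacement.

```python
def mark_own_name(global_instructions, agent_name):
    """Tag the agent's own quoted name so the model knows which party it plays."""
    quoted = f'"{agent_name}"'
    return global_instructions.replace(quoted, f'"{agent_name} (represented by you)"')


def build_initial_prompt(global_instructions, individual_instructions,
                         scoring_rules, voting_rules, incentive_rules, agent_name):
    """Concatenate the parts in the order given for the final initial prompt."""
    personalised = mark_own_name(global_instructions, agent_name)
    return (personalised + "\n" + individual_instructions
            + scoring_rules + voting_rules + incentive_rules)


# Made-up instruction text, for illustration only.
text = 'The parties are "Agent A" and "Agent B".'
print(mark_own_name(text, "Agent A"))
```

Matching on the quoted name is what makes the quoting convention in `global_instructions.txt` matter: an unquoted or inconsistently spelled name would not be replaced.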
2- round prompts
- The rounds' prompts get appended to the initial prompts at each interaction.
- Round prompts are constructed as:
slot_prompt = history_prompt + scratch_pad + unified_instructions + plan_prompt
- `history_prompt` is the n-window history of the public answers in the negotiation. They are formatted such that the agent's name is replaced by `You:`.
- `scratch_pad` is instructions on the individual CoT steps along with incentive-related instructions about the goals. Currently, each `incentive` has a scratch pad function that gets called based on the agent's incentive.
- `unified_instructions` are instructions on how to format answers.
- `plan_prompt` are instructions on how to form plans (not included the last time the agent is prompted).
- If you would like to support another incentive, create new functions for that incentive in initial and round prompts if needed.
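The round-prompt construction above can be illustrated with a short sketch. It is not the repository's implementation: the function names, the `(speaker, answer)` tuple representation of the public history, and the `is_last_turn` flag are assumptions chosen to show the windowing, the `You:` substitution, and the omission of the plan prompt on the agent's last turn.

```python
def build_history_prompt(public_answers, agent_name, window_size):
    """Format the last `window_size` public answers, showing the agent as 'You'.

    `public_answers` is assumed to be a chronological list of
    (speaker, answer) tuples.
    """
    recent = public_answers[-window_size:] if window_size > 0 else []
    lines = []
    for speaker, answer in recent:
        label = "You" if speaker == agent_name else speaker
        lines.append(f"{label}: {answer}")
    return "\n".join(lines)


def build_round_prompt(history_prompt, scratch_pad, unified_instructions,
                       plan_prompt, is_last_turn):
    """Concatenate the round-prompt parts, dropping the plan on the last turn."""
    plan = "" if is_last_turn else plan_prompt
    return history_prompt + scratch_pad + unified_instructions + plan


# Made-up history, for illustration only.
history = [("Agent A", "I propose deal 1."),
           ("Agent B", "I prefer deal 2."),
           ("Agent A", "I can accept deal 2.")]
print(build_history_prompt(history, "Agent A", window_size=2))
```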
- After changing `config.txt`, run the simulation as:
python main.py --exp_name <OUTPUT_DIR> --agents_num <NUM> --issues_num <NUM> --window_size <NUM> --game_dir ./games_descriptions/<GAME> --rounds_num <NUM>
- If you need to run Azure APIs, run with the flag `--azure`.
- Specify API keys
- Change the number of agents and issues according to the game.
- We used `rounds_num` as (4 * `agents_num`).
- The script will create an output dir with `exp_name` under `./games_descriptions/<GAME>`. It will also copy `config.txt` and `<GAME>/scores_files`.
- The history file will have the following format:
history['content']["rounds"].append({'agent':agent_name, 'prompt': prompt, 'full_answer': full_answer, 'public_answer': public_answer})
- `rounds` is a list of length (`args.rounds_num` + 2). The first entry is the initial prompts and the last one is the deal suggestion by `p1`.
- `prompt` is the prompt given at this round.
- `full_answer` is the full answer including the CoT.
- `public_answer` is the extracted public answer given to agents in the history.
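Given the layout above, a loaded history can be split into its three parts and summarised per agent. The sketch below assumes the history is already available as a Python dictionary (how the file is serialised is not covered here) and treats the first entry as opaque, since its exact structure is not described; the function names are hypothetical.

```python
def split_history(history, rounds_num):
    """Split rounds into (initial prompts, negotiation rounds, final deal by p1)."""
    rounds = history["content"]["rounds"]
    if len(rounds) != rounds_num + 2:
        raise ValueError(f"expected {rounds_num + 2} entries, found {len(rounds)}")
    initial, *negotiation, final_deal = rounds
    return initial, negotiation, final_deal


def public_answers_by_agent(negotiation):
    """Group the public answers of the negotiation rounds by agent name."""
    answers = {}
    for entry in negotiation:
        answers.setdefault(entry["agent"], []).append(entry["public_answer"])
    return answers


# Synthetic history with rounds_num = 2, for illustration only.
history = {"content": {"rounds": [
    "initial prompts placeholder",
    {"agent": "Agent A", "prompt": "...", "full_answer": "...", "public_answer": "deal 1"},
    {"agent": "Agent B", "prompt": "...", "full_answer": "...", "public_answer": "deal 2"},
    {"agent": "Agent A", "prompt": "...", "full_answer": "...", "public_answer": "final deal"},
]}}
_, negotiation, final_deal = split_history(history, rounds_num=2)
print(public_answers_by_agent(negotiation))
print(final_deal["public_answer"])
```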
1- `evaluation/evaluate_deals.ipynb`:
- Measures metrics: any success rate, final success rate, and ratio of wrong scores. Change the following according to the game:
HOME = '<HOME>'
OUTPUT_DIR = os.path.join(HOME,'LLM-Deliberation/games_descriptions/base/output/all_coop')
AGENTS_NUM = 6
ISSUES_NUM = 5
NUM_ROUNDS = 24
- Use the same notebook to create figures of agents' deals such as the ones in the paper.
2- `evaluation/score_leakage.py`:
- Use GPT-4 as a judge to evaluate whether scores were leaked in the public answers.
- Specify the following arguments:
import argparse
import os
MAX_THREADS = 60
parser = argparse.ArgumentParser(prog='Verifier')
parser.add_argument('--azure_openai_api', default='', help='azure api')
parser.add_argument('--azure_openai_endpoint', default='', help='azure endpoint')
parser.add_argument('--model_name', default='', help='azure model')
parser.add_argument('--exp_dir')
args, _ = parser.parse_known_args()
os.environ["AZURE_OPENAI_API_KEY"] = args.azure_openai_api
os.environ["AZURE_OPENAI_ENDPOINT"] = args.azure_openai_endpoint
- Note that this script creates parallel calls to GPT-4. Be mindful of cost as it may accumulate quickly.
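One common way to bound the number of simultaneous API calls is a thread pool capped at `MAX_THREADS`. The sketch below shows that pattern with a stand-in judge; whether `score_leakage.py` uses `ThreadPoolExecutor` is an assumption, and `judge_all` and `fake_judge` are hypothetical names. In a real run, the `judge` callable would wrap the GPT-4 request.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 60


def judge_all(public_answers, judge, max_threads=MAX_THREADS):
    """Apply `judge` to every answer with at most `max_threads` calls in flight.

    Results are returned in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(judge, public_answers))


def fake_judge(answer):
    # Stand-in for the model call: flags any answer that contains a digit.
    return any(character.isdigit() for character in answer)


print(judge_all(["This deal gives me 40 points.", "I support this deal."], fake_judge))
```

Because every answer triggers one call, the total cost grows with the number of public answers regardless of the thread limit; lowering `max_threads` only slows the spending down.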
3- `evaluation/adjust_games.ipynb`:
- This notebook can be used to visualize the number of possible deals (and also possible deals per agent) after changing the scores or minimum thresholds of agents.
- Change the following parameters:
HOME = '/HOME/'
GAME_DIR = os.path.join(HOME,'LLM-Deliberation/games_descriptions/base/')
AGENTS_NUM = 6
ISSUES_NUM = 5
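To give a sense of what counting possible deals involves, the sketch below enumerates every combination of options and counts, per agent, the deals that reach that agent's minimum threshold. It is not the notebook's code: the function name, the data layout, and the use of `>=` as the acceptance test are assumptions made for illustration.

```python
from itertools import product


def count_feasible_deals(agents_scores, thresholds):
    """Return (total number of deals, {agent: deals meeting that agent's threshold}).

    `agents_scores` maps each agent to a list with one score list per issue;
    `thresholds` maps each agent to its minimum threshold.
    """
    first_agent_scores = next(iter(agents_scores.values()))
    option_counts = [len(options) for options in first_agent_scores]
    per_agent = {agent: 0 for agent in agents_scores}
    total = 0
    # A deal picks one option index per issue.
    for deal in product(*(range(count) for count in option_counts)):
        total += 1
        for agent, issue_scores in agents_scores.items():
            score = sum(options[choice] for options, choice in zip(issue_scores, deal))
            if score >= thresholds[agent]:
                per_agent[agent] += 1
    return total, per_agent


# Made-up two-agent, two-issue game, for illustration only.
scores = {"Agent A": [[10, 0], [0, 20]], "Agent B": [[0, 15], [5, 0]]}
thresholds = {"Agent A": 20, "Agent B": 15}
print(count_feasible_deals(scores, thresholds))
```

The number of deals is the product of the option counts per issue, so exhaustive enumeration stays cheap for games of the size described here but grows quickly as issues or options are added.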
- We share logs of most of our experiments under `logs`.
- Please note that some logs were generated with our previous code base and were saved in a slightly different scheme. Please refer to the `old_code` branch (we will add more details, TBD).
If you find our paper, dataset, or this repo helpful, please cite our paper:
@inproceedings{
abdelnabi2024negotiation,
title={Cooperation, Competition, and Maliciousness: {LLM}-Stakeholders Interactive Negotiation},
author={Sahar Abdelnabi and Amr Gomaa and Sarath Sivaprasad and Lea Schönherr and Mario Fritz},
booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
}