This repo contains the code for reproducing the experiments in the paper *A Game-Theoretic Analysis of Networked System Control for Common-Pool Resource Management Using Multi-Agent Reinforcement Learning*.
The code uses networked multi-agent RL (NMARL) algorithms adapted from https://github.com/cts198859/deeprl_network. The copyright of that repository belongs to its original authors, and we do not claim ownership of any intellectual property other than our original contributions (see the paper).
Our analysis consists of several steps:
- Training agents.
- Evaluating individual agents' restraint.
- Tagging agents as cooperating/defecting.
- Evaluating the performance of algorithms while varying the ratio of cooperators to defectors.
- Plotting Schelling diagrams.
Contents:

- Introduction
- Environment
- Algorithms
- Building the Docker Images
- Analysis Pipeline
- Original Networked MARL Code
## Environment

Our environment is a form of iterated "tragedy of the commons" general-sum Markov game. The environment has a shared resource pool of water from which agents acting as valve controllers can extract resources to gain rewards. The initial resource value is 0.5, and the resource regenerates over time until it is depleted.

Each agent receives as its observation a tuple `(current-resource-value, total-taken-by-agent-so-far)`.
At each time step, agents may choose to `take` or to `wait`. If they decide to `wait`, they receive no reward and the resource amount does not diminish. If instead they decide to `take`, they diminish the resource by an amount `0.1/N`, where `N` is the total number of agents (4 in our experiments), and they receive a reward of the same amount. In the context of the paper, the actions `take` and `wait` correspond to opening and closing shut-off valves.
At each time step, if the resource is not completely depleted, it regenerates according to the following formula:

$$s_{t+1} = s_t + r$$

where $s_t$ is the resource value at time step $t$ and $r$ is the regeneration rate. A regeneration rate of 0.1 would allow agents to take selfishly at every time step without any chance of the resource depleting (the four agents remove at most $4 \times 0.1/4 = 0.1$ per step, which the regeneration exactly replaces).
The game ends when the resource is depleted or after some fixed number of steps.
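To make the dynamics above concrete, here is a minimal sketch of the resource dynamics in Python. The class name, the handling of a near-empty pool, and the episode-length default are illustrative assumptions, not the repo's actual environment implementation.

```python
# Illustrative sketch of the resource dynamics described above (not the repo's env class).
N_AGENTS = 4                 # N in the text
TAKE_AMOUNT = 0.1 / N_AGENTS # amount removed (and rewarded) per take action

class CommonsEnvSketch:
    def __init__(self, regen_rate=0.1, max_steps=1000):
        self.regen_rate = regen_rate
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.resource = 0.5                   # initial resource value
        self.taken = [0.0] * N_AGENTS         # total taken by each agent so far
        self.t = 0
        return self._observations()

    def _observations(self):
        # Each agent observes (current resource value, total it has taken so far).
        return [(self.resource, self.taken[i]) for i in range(N_AGENTS)]

    def step(self, actions):                  # actions[i] in {"take", "wait"}
        rewards = [0.0] * N_AGENTS
        for i, action in enumerate(actions):
            if action == "take" and self.resource > 0.0:
                amount = min(TAKE_AMOUNT, self.resource)
                self.resource -= amount
                self.taken[i] += amount
                rewards[i] = amount           # reward equals the amount taken
        if self.resource > 0.0:
            self.resource += self.regen_rate  # regeneration while not depleted
        self.t += 1
        done = self.resource <= 0.0 or self.t >= self.max_steps
        return self._observations(), rewards, done
```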
## Algorithms

The following algorithms are trained:

- IA2C (`ia2c_ind` in the code)
- NA2C (`ia2c` in the code)
- FPrint (`ia2c_fp` in the code)
- ConseNet (`ma2c_cu` in the code)
- DIAL (`ma2c_dial` in the code)
- CommNet (`ma2c_ic3` in the code)
- NeurComm (`ma2c_nc` in the code)
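For scripting over results, the mapping between the paper's algorithm names and the identifiers used in the code can be kept in a small dictionary; this is just a restatement of the list above, not something defined in the repo.

```python
# Paper algorithm name -> identifier used in the code and configs.
ALGORITHMS = {
    "IA2C": "ia2c_ind",
    "NA2C": "ia2c",
    "FPrint": "ia2c_fp",
    "ConseNet": "ma2c_cu",
    "DIAL": "ma2c_dial",
    "CommNet": "ma2c_ic3",
    "NeurComm": "ma2c_nc",
}
```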
## Building the Docker Images

There are two Dockerfiles in this repo. The first installs TensorFlow and most of the dependencies used by the code. The second installs Hydra and joblib for configuration and parallel processing. The images can be built simply by running:

`make build`

The program can then mostly be run and managed via `make`.

We use Docker to run and manage the code and experiments; however, some additional dependencies are required:

- `docker` should be installed on the base system
- `make` is used to run basic commands
- `tmux` is used to launch some processes (such as tensorboard) in the background
## Analysis Pipeline

Agents are trained for various levels of resource regeneration, with different initial seeds. Agents can be trained by executing the following command:

`make regen_exp`

This command executes the script `regen_exp.sh`, which starts up a Docker container and runs the main training program `regen_exp.py`, which uses the Hydra joblib sweeper to launch many training runs in parallel. Specifically, this trains agents for every combination of the following parameters (the sketch below the list illustrates the resulting grid):
seeds="0,1,2"
algorithms="ia2c,ia2c_fp,ia2c_ind,ma2c_cu,ma2c_ic3,ma2c_dial,ma2c_nc"
regen_rates="0.03,0.042,0.053,0.065,0.077,0.088,0.1"
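As a rough illustration, the grid above expands to 3 × 7 × 7 = 147 training runs. A minimal sketch of enumerating the same grid outside of Hydra (with a placeholder body instead of the real training call) looks like this:

```python
from itertools import product

seeds = [0, 1, 2]
algorithms = ["ia2c", "ia2c_fp", "ia2c_ind", "ma2c_cu", "ma2c_ic3", "ma2c_dial", "ma2c_nc"]
regen_rates = [0.03, 0.042, 0.053, 0.065, 0.077, 0.088, 0.1]

# 3 seeds x 7 algorithms x 7 regeneration rates = 147 runs in total.
for seed, alg, regen in product(seeds, algorithms, regen_rates):
    # In the repo this is handled by Hydra's joblib sweeper; the print is a stand-in
    # for launching one training run.
    print(f"would train: algorithm={alg} regen_rate={regen} seed={seed}")
```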
The number of parallel processes used for training can be set via the `num_threads` field in `config.yaml`.

The saved models and many individual evaluation episodes are stored in the `results/` directory.
Executing the following command will process the evaluation episode data:

`make process_evaluation_data`

The processed evaluation data is saved as a pickle file (`results/regen_exp/tragedy/evaluation_results_data.p`).
Using the processed evaluation data, each agent's average restraint percentage can be computed (averaged across the seeds and the evaluation episodes). The restraint heat map can then be plotted with the following command:

`make plot_restraint_heat_map`
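For reference, computing an average restraint percentage from the processed episodes might look roughly like the sketch below. The data layout shown here (per-episode action lists keyed by agent) and the definition of restraint as the fraction of `wait` actions are illustrative assumptions; the repo's actual schema lives in its processing scripts.

```python
import pickle

# Hypothetical layout (an assumption, not the repo's actual schema):
#   {(algorithm, regen_rate, seed): [episode, ...]} where each episode maps
#   agent id -> list of actions taken ("take" or "wait").
with open("results/regen_exp/tragedy/evaluation_results_data.p", "rb") as f:
    data = pickle.load(f)

def restraint_percentage(episodes, agent_id):
    """Percentage of steps on which the agent chose to wait, averaged over episodes."""
    per_episode = [
        100.0 * sum(action == "wait" for action in ep[agent_id]) / len(ep[agent_id])
        for ep in episodes
    ]
    return sum(per_episode) / len(per_episode)

# Example: average restraint of agent 0 for one (algorithm, regen rate, seed) combination.
print(restraint_percentage(data[("ia2c", 0.1, 0)], agent_id=0))
```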
The agents need to be categorized as defectors or cooperators based on their average restraint percentage and the classification thresholds. The thresholds can be adjusted in the `experiments/classify_agent_behaviour.py` file.

The following command will categorize the agents and save the classification to the `agent_tags.p` file:

`make classify_agent_behaviour`
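A minimal sketch of this thresholding step, assuming a single restraint threshold and a hypothetical dictionary of restraint percentages; the real threshold values and data layout are those in `experiments/classify_agent_behaviour.py`.

```python
import pickle

RESTRAINT_THRESHOLD = 50.0  # illustrative value; the real thresholds are set in classify_agent_behaviour.py

def classify(restraint_by_agent, threshold=RESTRAINT_THRESHOLD):
    """Tag each agent as a cooperator (high restraint) or a defector (low restraint)."""
    return {
        agent: ("cooperator" if restraint >= threshold else "defector")
        for agent, restraint in restraint_by_agent.items()
    }

# Example usage with made-up restraint percentages for 4 agents.
tags = classify({0: 92.0, 1: 88.5, 2: 12.0, 3: 45.0})

# The repo stores its classification as agent_tags.p under results/regen_exp/tragedy/.
with open("agent_tags.p", "wb") as f:
    pickle.dump(tags, f)
```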
Once agents are categorized, they can be played against each other. With the file `agent_tags.p` present in the directory `results/regen_exp/tragedy/`, running the following command will generate a sequence of evaluation runs with a varying number of defector agents:

`make play_defect`

This command runs the shell script `play_defectors.sh`, which launches many parallel evaluation runs for each algorithm. The results of these runs are stored under `results/regen_exp/<alg>/defectors=<defectors>/`.
This data can be processed with the following command:

`make process_interaction_data`

Using the processed data, a Schelling diagram can be plotted for each algorithm with the following command:

`make plot_schelling_diagrams`
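As a reminder of what these plots show, a Schelling diagram plots the average return for cooperating and for defecting against the number of other agents that defect. The matplotlib sketch below uses made-up return values purely for illustration and is not the repo's plotting code.

```python
import matplotlib.pyplot as plt

# Hypothetical average returns indexed by the number of *other* agents defecting (0..3 for 4 agents).
other_defectors = [0, 1, 2, 3]
cooperator_return = [10.0, 7.5, 4.0, 1.5]  # made-up numbers for illustration
defector_return = [12.0, 8.0, 3.0, 0.5]    # made-up numbers for illustration

plt.plot(other_defectors, cooperator_return, marker="o", label="cooperate")
plt.plot(other_defectors, defector_return, marker="s", label="defect")
plt.xlabel("number of other defectors")
plt.ylabel("average return")
plt.title("Schelling diagram (illustrative)")
plt.legend()
plt.savefig("schelling_diagram_example.png")
```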
In our paper, we discuss a connected neighbourhood reward weighting factor. In the code this is referred to as `coop_gamma`, as it was in the original NMARL code. This parameter affects the level of cooperation between agents by sharing rewards. The reward seen by agent $i$ at each time step is the weighted combination of its own reward with those of its neighbours, calculated as follows:

$$\tilde{r}_i = r_i + \texttt{coop\_gamma} \sum_{j \in \mathcal{N}_i} r_j$$

where $\mathcal{N}_i$ is the adjacency list of agent $i$ in the communication graph. Negative values of `coop_gamma` are equivalent to `coop_gamma = 1`. Since `coop_gamma` affects the magnitude of the rewards seen by each agent, the reward normalization needs to be set independently for each value of `coop_gamma`. The normalization should be set so that each agent typically sees a reward of order 1. This translates to the following:
| `coop_gamma` | `reward_norm` |
|---|---|
| 0 | 0.025 |
| 0.1 | 0.0325 |
| 1 | 0.1 |
The value of the weighting factor can be set in `config.yaml` under the name `coop_gamma`, and `reward_norm` can be set in each of the config files found in `egta/config/`.
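To see how the weighting and normalization interact, here is a small sketch assuming the fully connected 4-agent communication graph implied by the table above; the `shaped_reward` helper and the example values are illustrative, not code from the repo.

```python
def shaped_reward(agent, rewards, neighbours, coop_gamma, reward_norm):
    """Own reward plus coop_gamma times the summed neighbour rewards, then normalized."""
    shaped = rewards[agent] + coop_gamma * sum(rewards[j] for j in neighbours[agent])
    return shaped / reward_norm

# 4 agents, fully connected graph, each taking its maximum per-step reward of 0.1/4 = 0.025.
rewards = [0.025, 0.025, 0.025, 0.025]
neighbours = {i: [j for j in range(4) if j != i] for i in range(4)}

# Each setting from the table above yields a normalized reward of order 1.
print(shaped_reward(0, rewards, neighbours, coop_gamma=0.0, reward_norm=0.025))   # -> 1.0
print(shaped_reward(0, rewards, neighbours, coop_gamma=0.1, reward_norm=0.0325))  # -> 1.0
print(shaped_reward(0, rewards, neighbours, coop_gamma=1.0, reward_norm=0.1))     # -> 1.0
```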
## Original Networked MARL Code

See the paper *Multi-agent Reinforcement Learning for Networked System Control* and the GitHub repo https://github.com/cts198859/deeprl_network for the original NMARL code and more information about the algorithms used.