Experimenting with Policy Gradient Reinforcement Learning Algorithms and Evolutionary Strategies in a warehouse congestion-management domain.

StavrosOrf/MultiAgentLearning

Analyzing the Performance Impact of Combining Agent Factorization with Different Learning Algorithms for Multiagent Coordination.

This project implements several learning algorithms on a warehouse domain, pairing each algorithm with multiple agent definitions, in order to compare their performance against CCEA.

It contains implementations of the following algorithms on the warehouse framework:

  • ES
  • AdamES
  • Canonical ES
  • (MA)DDPG
  • I-DQN

And it contains the following agent definitions:

  • Centralized
  • Intersection
  • Link
  • Centralized with last time
  • Intersection with last time
  • Link with last time
  • Centralized with avg time
  • Intersection with avg time
  • Link with avg time
  • Centralized with both time
  • Intersection with both time
  • Link with both time

The analysis of the results can be found in our paper.

A more detailed analysis can be found in our extended paper.

Installation

Linux

Install system dependencies (on Ubuntu):

sudo apt install libboost-dev libyaml-cpp-dev

In the working folder of your choice, clone the project code:

git clone git@github.com:stavrosgreece/multiagentlearning.git

Install LibTorch (the C++ distribution of PyTorch):

cd multiagentlearning
wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.12.1%2Bcpu.zip
unzip libtorch-cxx11-abi-shared-with-deps-1.12.1+cpu.zip
rm libtorch-cxx11-abi-shared-with-deps-1.12.1+cpu.zip

Note: you can download LibTorch builds for other compute platforms (e.g. CUDA or ROCm) from https://pytorch.org/; be sure to install the cxx11 ABI version.

Build the code:

cd multiagentlearning
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && make
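If the build fails because CMake cannot locate Torch, pointing CMAKE_PREFIX_PATH at the unpacked LibTorch directory usually resolves it. This is a sketch, not part of the project's documented build; the relative path assumes you unzipped LibTorch into the repository root, as in the steps above:

```shell
# Assumption: libtorch/ sits in the repository root, one level above build/
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=$(pwd)/../libtorch .. && make
```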

Running preconfigured projects

Run the project configured by config.yaml using six threads:

./testWarehouse -c ../config.yaml -t 6
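The schema of config.yaml is defined by the project itself. Purely as an illustrative sketch, a configuration in this setting would pair one of the learning algorithms above with one of the agent definitions; every key below is hypothetical, so consult the repository's actual config.yaml for the real schema:

```yaml
# Hypothetical sketch only -- not the repository's actual schema
algorithm: ES            # e.g. ES, AdamES, Canonical ES, (MA)DDPG, I-DQN
agent_definition: Link   # e.g. Centralized, Intersection, Link (+ time variants)
epochs: 500
```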

Debug

cmake -DCMAKE_BUILD_TYPE=Debug .. && make && gdb --args ./testWarehouse -c ../config.yaml
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo .. && make && valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all ./testWarehouse -c ../config.yaml

Profiling

With Perf (recommended if you are not made of RAM)

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo .. && make && perf record -g ./testWarehouse -c ../config.yaml

With Callgrind

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo .. && make && valgrind --tool=callgrind ./testWarehouse -c ../config.yaml
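Either profiler writes its data to the current directory, where it can be inspected with the standard companion tools (perf report ships with perf, and callgrind_annotate with Valgrind; KCachegrind is a GUI alternative for the latter):

```shell
# Interactive call-graph view of the perf.data recorded above
perf report -g
# Annotated cost listing of a Callgrind run; the output file name includes the PID
callgrind_annotate callgrind.out.<pid>
```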

Authors

This repo is a fork of https://github.com/JenJenChung/multiagent_learning.

Cite

To cite the original paper:

    @inproceedings{10.1145/3549737.3549773,
    author = {Kallinteris, Andreas and Orfanoudakis, Stavros and Chalkiadakis, Georgios},
    title = {The Performance Impact of Combining Agent Factorization with Different Learning Algorithms for Multiagent Coordination},
    year = {2022},
    isbn = {9781450395977},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3549737.3549773},
    doi = {10.1145/3549737.3549773},
    abstract = {Factorizing a multiagent system refers to partitioning the state-action space to individual agents and defining the interactions between those agents. This so-called agent factorization is of much importance in real-world industrial settings, and is a process that can have significant performance implications. In this work, we explore if the performance impact of agent factorization is different when using different learning algorithms in multiagent coordination settings. We evaluated six different agent factorization instances—or agent definitions—in the warehouse traffic management domain, comparing the performance of (mainly) two learning algorithms suitable for learning coordinated multiagent policies: the Evolutionary Strategies (ES), and a genetic algorithm (CCEA) previously used in this setting. Our results demonstrate that different learning algorithms are affected in different ways by alternative agent definitions. Given this, we can deduce that many important multiagent coordination problems can potentially be solved by an appropriate agent factorization in conjunction with an appropriate choice of a learning algorithm. Moreover, our work shows that ES is an effective learning algorithm for the warehouse traffic management domain; while, interestingly, celebrated policy gradient methods do not fare well in this complex real-world problem setting.},
    booktitle = {Proceedings of the 12th Hellenic Conference on Artificial Intelligence},
    articleno = {32},
    numpages = {10},
    keywords = {Agent Factorization, Multiagent Coordination, Evolutionary Strategies, Warehouse Traffic Management},
    location = {Corfu, Greece},
    series = {SETN '22}
    }

To cite the extended paper:

@article{Kallinteris2024ACA,
	title = {A comprehensive analysis of agent factorization and learning algorithms in multiagent systems},
	volume = {38},
	issn = {1573-7454},
	url = {https://doi.org/10.1007/s10458-024-09662-9},
	doi = {10.1007/s10458-024-09662-9},
	number = {2},
	journal = {Autonomous Agents and Multi-Agent Systems},
	author = {Kallinteris, Andreas and Orfanoudakis, Stavros and Chalkiadakis, Georgios},
	month = jun,
	year = {2024},
	pages = {27},
}

Funding

The research described in this paper was carried out within the framework of the National Recovery and Resilience Plan Greece 2.0, funded by the European Union - NextGenerationEU (Implementation Body: ΕΛΙΔΕΚ - HFRI Project name: DEEP- REBAYES HFRI Project Number 15430).
