Status: Stable release
This repo has two main parts: a disaster resource allocation game and a collection of algorithms for solving it.
on supply chain management: base-stock policy
References (some quite old literature):
Optimal policies for a multi-echelon inventory problem
Lower bounds for multi-echelon stochastic inventory systems
Stock positioning and performance estimation in serial production-transportation systems
Newsvendor bounds and heuristic for optimal policies in serial supply chains
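As a rough illustration of the base-stock policy named above, the sketch below orders just enough each period to bring the inventory position back up to a fixed target level. It is a single-stage simplification with zero lead time and backlogged demand, not the multi-echelon setting of the references, and the names `base_stock_order`, `simulate`, and `base_stock_level` are hypothetical rather than taken from this repo.

```python
def base_stock_order(inventory_position, base_stock_level):
    """Order quantity that raises the inventory position to the base-stock level."""
    # Never order a negative amount: if we are above the target, order nothing.
    return max(0.0, base_stock_level - inventory_position)


def simulate(demands, base_stock_level, initial_on_hand=0.0):
    """Single-stage simulation (assumed: zero lead time, unmet demand is backlogged)."""
    on_hand = initial_on_hand  # negative values represent backorders
    orders = []
    for demand in demands:
        order = base_stock_order(on_hand, base_stock_level)
        on_hand += order   # the order arrives immediately (zero lead time)
        on_hand -= demand  # demand is served; any shortfall is carried as backlog
        orders.append(order)
    return orders, on_hand


if __name__ == "__main__":
    orders, final_on_hand = simulate([3, 5, 2], base_stock_level=10, initial_on_hand=4)
    print(orders, final_on_hand)
```

With zero lead time, each order after the first simply replaces the previous period's demand, which is the defining behaviour of a base-stock (order-up-to) rule.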
We experiment with the following variants:
- A2C
- DQN
- Double DQN
- Dueling DQN
- DQN with Prioritized Experience Replay
- PPO
References:
ToM2C: Target-Oriented Multi-Agent Communication and Cooperation with Theory of Mind
References:
Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts
Trade-off:
- Rollouts too short → accurate opponent models not fully utilized → low sample efficiency.
- Rollouts too long → inaccurate opponent models push the rollouts far from the real trajectory distribution → degraded performance in the environment and low sample efficiency.
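One way to act on this trade-off is to give each opponent model its own rollout length, using accurate models for more steps and inaccurate ones for fewer. The sketch below is an illustrative heuristic under that assumption, not necessarily the exact rule from the referenced paper. The function `opponent_rollout_lengths`, the `model_errors` dictionary, and the inverse-proportional scaling are all hypothetical.

```python
import math


def opponent_rollout_lengths(model_errors, max_length):
    """Assign each opponent model a rollout length inversely proportional to its error.

    model_errors: dict mapping opponent name -> estimated model error (non-negative).
    max_length:   rollout length given to the most accurate opponent model.
    """
    if not model_errors:
        return {}
    min_error = min(model_errors.values())
    lengths = {}
    for name, error in model_errors.items():
        if error <= 0:
            # A model with no measured error is trusted for the full length.
            lengths[name] = max_length
        else:
            scaled = math.floor(max_length * min_error / error)
            lengths[name] = max(0, min(max_length, scaled))
    return lengths


if __name__ == "__main__":
    print(opponent_rollout_lengths({"a": 1.0, "b": 2.0, "c": 4.0}, max_length=8))
```

The most accurate opponent model is used for the full `max_length` steps, while a model with four times the error is used for a quarter as many.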
References:
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Camel Multi-Agent Role-Playing Framework
UI based on Crafter: Open world survival game for evaluating a wide range of agent abilities within a single environment.
- Research challenges:
- Meaningful evaluation:
python3 -m pip install crafter # Install Crafter
python3 -m pip install pygame # Needed for human interface
python3 -m crafter.run_gui # Start the game
To install Crafter, refer to the description in their repo.
Agents are allowed a budget of 1M environment steps and are evaluated by their success rates on the 22 achievements and by their geometric mean score. Example scripts for computing these are included in the analysis directory of the repository.
- Reward: The sparse reward is +1 for unlocking an achievement during the episode and -0.1 or +0.1 for lost or regenerated health points. Results should be reported not as reward but as success rates and score.
- Success rates: The success rates of the 22 achievements are computed as the percentage of all training episodes in which the achievement was unlocked, giving insight into the ability spectrum of an agent.
- Crafter score: The score is the geometric mean of success rates, so that improvements on difficult achievements contribute more than improvements on achievements with already high success rates.
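The two metrics above can be sketched in a few lines. This is a minimal illustration, not the repository's analysis script. It assumes each episode is represented as a set of unlocked achievement names (a hypothetical format), and it shifts each rate by 1 before taking the logarithm so that a 0% success rate stays well defined. That shift is an assumption about the exact formula, so treat the analysis scripts as authoritative.

```python
import math


def success_rates(episodes, achievement_names):
    """Percentage of episodes in which each achievement was unlocked at least once."""
    rates = {}
    for name in achievement_names:
        unlocked = sum(1 for episode in episodes if name in episode)
        rates[name] = 100.0 * unlocked / len(episodes)
    return rates


def crafter_score(rates):
    """Geometric mean of success rates in percent, shifted by 1 (assumed) to handle zeros."""
    logs = [math.log(1.0 + rate) for rate in rates.values()]
    return math.exp(sum(logs) / len(logs)) - 1.0


if __name__ == "__main__":
    # Hypothetical data: each set lists the achievements unlocked in one episode.
    episodes = [
        {"collect_wood"},
        {"collect_wood", "place_table"},
        set(),
        {"collect_wood"},
    ]
    rates = success_rates(episodes, ["collect_wood", "place_table"])
    print(rates, crafter_score(rates))
```

Because the mean is taken in log space, raising a rare achievement from 1% to 2% moves the score more than raising a common one from 90% to 91%.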
Please create a pull request if you would like to add your own or another algorithm to the scoreboards. For the reinforcement learning and unsupervised agents categories, the interaction budget is 1M environment steps. The external knowledge category is defined more broadly.
Please fill in this Google form.
Please [open an issue][issues] on GitHub.
[issues]: