A3C3 is a distributed asynchronous actor-critic algorithm for multi-agent settings, with differentiable communication and a centralized critic.
Check out learned policies here: https://youtu.be/fB71yKcP3iU
The repository contains 4 environment suites:
- POC Suite: Hidden Reward, Navigation, Pursuit, Traffic Intersection
- MPE Suite: Cooperative Navigation, Cooperative Communication, Cooperative Reference, Tag
- KiloBot Suite: Light, Join, Split
- 3D Soccer Simulation Suite: Passing, Keep-Away
It also contains scripts to launch A3C3 and learn policies. Install the dependencies
listed in requirements.txt (for example with pip install -r requirements.txt), then run the scripts.
Each agent is defined by 3 networks:
- The actor network learns a local policy.
- The centralized critic evaluates the policy.
- The communicator network learns a communication protocol between agents.
The algorithm is distributed: multiple workers update the networks asynchronously.
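The three networks and the worker updates described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code in this repository. The layer sizes, the single linear layer per network, the mean-of-messages channel, the joint observation as critic input, the constant target return, and the lock around updates are all assumptions made for the example. Only the critics are trained here; the actor and communicator gradients are omitted for brevity.

```python
import threading
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_AGENTS, OBS_DIM, MSG_DIM, N_ACTIONS = 2, 4, 3, 5


def dense(in_dim, out_dim):
    """Create one linear layer with small random weights and a zero bias."""
    return {"W": rng.normal(0.0, 0.1, (in_dim, out_dim)), "b": np.zeros(out_dim)}


def forward(layer, x):
    return x @ layer["W"] + layer["b"]


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Parameters shared by all workers: three networks per agent.
shared = [
    {
        "actor": dense(OBS_DIM + MSG_DIM, N_ACTIONS),  # local obs + received message -> action logits
        "communicator": dense(OBS_DIM, MSG_DIM),       # local obs -> outgoing message
        "critic": dense(N_AGENTS * OBS_DIM, 1),        # joint observation -> value (assumed input)
    }
    for _ in range(N_AGENTS)
]


def step(params, observations):
    """Run one forward pass for every agent and return policies and values."""
    messages = [np.tanh(forward(p["communicator"], o)) for p, o in zip(params, observations)]
    joint = np.concatenate(observations)
    policies, values = [], []
    for i, p in enumerate(params):
        # Assumed channel: each agent receives the mean of the other agents' messages.
        incoming = np.mean([m for j, m in enumerate(messages) if j != i], axis=0)
        actor_input = np.concatenate([observations[i], incoming])
        policies.append(softmax(forward(p["actor"], actor_input)))
        values.append(forward(p["critic"], joint).item())
    return policies, values


lock = threading.Lock()  # assumption: a lock keeps the sketch simple and race-free


def worker(seed, n_updates=50, lr=0.01):
    """Asynchronous worker: samples data locally, then updates the shared critics."""
    local_rng = np.random.default_rng(seed)
    for _ in range(n_updates):
        obs = [local_rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
        joint = np.concatenate(obs)
        target = 1.0  # placeholder return; a real worker would compute it from a rollout
        with lock:
            for p in shared:
                critic = p["critic"]
                error = forward(critic, joint) - target  # gradient of 0.5 * error**2 w.r.t. the value
                critic["W"] -= lr * np.outer(joint, error)
                critic["b"] -= lr * error


threads = [threading.Thread(target=worker, args=(seed,)) for seed in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

observations = [np.ones(OBS_DIM) for _ in range(N_AGENTS)]
policies, values = step(shared, observations)
```

After the workers finish, `policies` holds one action distribution per agent and `values` holds one critic estimate per agent. Because the worker threads interleave in a nondeterministic order, the exact numbers can differ between runs.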