PySC2 Deep Reinforcement Learning Agents

This repository implements different Deep Reinforcement Learning Agents for the pysc2 learning environment as described in the DeepMind StarCraft II paper.

We provide implementations for:

  • Advantage Actor Critic (A2C)
  • Proximal Policy Optimization (PPO)
  • FeUdal Networks (FuN)

This repository is part of a student research project which was conducted at the Autonomous Systems Labs, TU Darmstadt by Daniel Palenicek, Marcel Hussing, and Simon Meister.

The repository was originally located at simonmeister/pysc2-rl-agents but has moved to this new location.

Content

The following gives a brief explanation of what we have implemented in this repository. For more detailed information, see the reports.

FeUdal Networks

We have adapted and implemented the FeUdal Networks algorithm for hierarchical reinforcement learning on StarCraft II. To be compatible with StarCraft II, we account for the spatial state and action space, as opposed to the original publication, which targeted Atari.

A2C & PPO

We implemented these baseline agents to learn the PySC2 mini games. In the current implementation, PPO can only train a FullyConvolutional policy, while A2C can additionally train a ConvolutionalLSTM policy.

Reports

We document our implementation and results in more depth in the following reports:

Usage

Software Requirements

  • Python 3
  • pysc2 (tested with v1.2)
  • TensorFlow (tested with 1.4.0)
  • StarCraft II and mini games (see below or pysc2)

Quick Install Guide

  • pip install numpy tensorflow-gpu pysc2==1.2
  • Install StarCraft II. On Linux, use 3.16.1. Unzip the package into the home directory.
  • Download the mini games and extract them to your ~/StarcraftII/Maps/ directory.
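
To check that the mini game maps ended up where the last step says, a small script along these lines may help. It is only a sketch: the helper name find_minigame_maps is made up for illustration, the default directory is the ~/StarcraftII/Maps/ path given above, and it assumes the downloaded maps are packaged as .SC2Map files.

```python
from pathlib import Path


def find_minigame_maps(maps_dir="~/StarcraftII/Maps"):
    """Return the sorted names of all .SC2Map files found below maps_dir."""
    root = Path(maps_dir).expanduser()
    if not root.is_dir():
        return []
    # Search recursively, since the archive may unpack into a subfolder.
    return sorted(p.stem for p in root.rglob("*.SC2Map"))


if __name__ == "__main__":
    maps = find_minigame_maps()
    if maps:
        print("Found maps:", ", ".join(maps))
    else:
        print("No maps found; check the extraction directory.")
```

If the list comes back empty, the archive was probably extracted to a different directory than the one passed in.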

Train & run

Quickstart: python run.py <experiment-id> runs training with the default settings for Fully Connected A2C. To evaluate after training, run python run.py <experiment-id> --eval.

Experiments are highly configurable via command-line arguments. For the full documentation, run python run.py --help.

The most important flags to add to the python run.py <experiment-id> command include:

  • --agent: Choose between A2C, PPO and FeUdal
  • --policy: Choose the topology of the policy network (not all agents are compatible with every network)
  • --map: Choose the mini game map to train on
  • --vis: Visualize the agent

Summaries are written to out/summary/<experiment_name> and model checkpoints are written to out/models/<experiment_name>.
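
For scripted experiments, the flags and output locations above can be wrapped in a small helper. The following is a minimal sketch, not part of this repository: build_command and output_dirs are hypothetical names, --vis is assumed to be a boolean switch like --eval, and the exact spellings that --agent, --policy and --map accept should be checked with python run.py --help.

```python
import shlex
from pathlib import Path


def build_command(experiment_id, agent=None, policy=None, map_name=None,
                  vis=False, evaluate=False):
    """Assemble the argument list for run.py from the flags listed above."""
    cmd = ["python", "run.py", experiment_id]
    # Value flags are only added when set, so run.py falls back to its defaults.
    for flag, value in (("--agent", agent), ("--policy", policy), ("--map", map_name)):
        if value is not None:
            cmd += [flag, str(value)]
    # --vis and --eval are treated as boolean switches here (an assumption for --vis).
    if vis:
        cmd.append("--vis")
    if evaluate:
        cmd.append("--eval")
    return cmd


def output_dirs(experiment_id, root="out"):
    """Return the (summary, models) directories for an experiment."""
    base = Path(root)
    return base / "summary" / experiment_id, base / "models" / experiment_id


if __name__ == "__main__":
    # "my_experiment" and the flag values are placeholders for illustration.
    cmd = build_command("my_experiment", agent="A2C", map_name="MoveToBeacon")
    print(shlex.join(cmd))
    print(*output_dirs("my_experiment"), sep="\n")
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with experiment names.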

Hardware Requirements

For fast training, a GPU is recommended. We ran our experiments on Titan X Pascal and GTX 1080Ti GPUs.

Results

On the mini games, we report the following best mean scores:

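| Map | FC | ConvLSTM | PPO | FUN | DeepMind |
|---|---|---|---|---|---|
| MoveToBeacon | 26 | 26 | 26 | 26 | 26 |
| CollectMineralShards | 97 | 93 | - | - | 103 |
| FindAndDefeatZerglings | 45 | - | - | - | 45 |
| DefeatRoaches | - | - | - | - | 100 |
| DefeatZerglingsAndBanelings | 68 | - | - | - | 62 |
| CollectMineralsAndGas | - | - | - | - | 3978 |
| BuildMarines | - | - | - | - | 3 |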

In the following we show plots for the score over episodes.

[Plots of score over episodes: FeUdal Networks; PPO; A2C with Convolutional LSTM; A2C with Fully Connected]

Note that the DeepMind mean scores are their best individual scores after 100 runs for each game, where the initial learning rate was randomly sampled for each run. We use a constant initial learning rate for a much smaller number of runs due to limited hardware.
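
One way to read "best mean score" across several runs is: average the episode scores within each run, then report the highest average. The sketch below implements that reading only; best_mean_score is a hypothetical helper, the scores in the example are made up, and the exact aggregation used for the table above may differ.

```python
from statistics import mean


def best_mean_score(runs):
    """Return (run_name, mean_score) for the run with the highest mean score.

    runs maps a run name to the list of episode scores collected in that run.
    Runs without any episodes are ignored.
    """
    means = {name: mean(scores) for name, scores in runs.items() if scores}
    if not means:
        raise ValueError("no run contains any episode scores")
    best = max(means, key=means.get)
    return best, means[best]


if __name__ == "__main__":
    # Made-up scores for illustration only; not results from this project.
    example = {"run_a": [20, 24, 26], "run_b": [25, 26, 27]}
    print(best_mean_score(example))
```

With many runs and a randomly sampled learning rate per run, this maximum tends to be higher than with a few runs at a fixed learning rate, which is why the two columns are not directly comparable.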

License

This project is licensed under the MIT License (refer to the LICENSE file for details).

Acknowledgments

The code in rl/environment.py is based on OpenAI baselines, with adaptations from sc2aibot. Some of the code in rl/agents/a2c/runner.py is loosely based on sc2aibot. The Convolutional LSTM Cell implementation is taken from carlthome. The FeUdal Networks implementation is inspired by dmakian.