diegopliebana edited this page Jul 14, 2017 · 17 revisions

The Single-Player Learning track accepts submissions of agents written in Java or Python. The Python client has been tested with Python 3.5. On the server, TensorFlow has been installed for Python 3.5.

This page details the methods to be implemented in the Java agent or Python agent.


For both Java and Python users

At each game tick, the agent can request one of three types of serialised state observation. This section explains the legal serialised state observation types and how to set them.

Serialised state observation

The StateObservation is the observation of the current state of the game, which the agent can use to decide the next action to take (see the planning track documentation for details). Events are not included.

The screenshot is a PNG of the actual game screen, without the frame border and title.

Three legal serialised state observation types

  • Types.LEARNING_SSO_TYPE.JSON: serialised StateObservation without screenshot.
  • Types.LEARNING_SSO_TYPE.IMAGE: screenshot and gameScore, gameTick, gameWinner, isGameOver and availableActions.
  • Types.LEARNING_SSO_TYPE.BOTH: serialised StateObservation and screenshot.

For more details, please refer to SerializableStateObservation.

How to set the serialised state observation type

  • Java agent: for instance, your agent can set the serialised state observation type by doing

      lastSsoType = Types.LEARNING_SSO_TYPE.JSON; 
    
  • Python agent: for instance, your agent can set the serialised state observation type by doing

      self.lastSsoType = LEARNING_SSO_TYPE.JSON
    

When to set the serialised state observation type

By default, Types.LEARNING_SSO_TYPE.JSON is set for the Java agent and LEARNING_SSO_TYPE.JSON for the Python agent.

You can reset the serialised state observation type when a game is started, initialised, being played or finished (whether it terminates normally or is aborted):

  • When a game is started: set the serialised state observation type in the constructor of the Agent class, i.e. Agent(...).
  • When a game is initialised: set the serialised state observation type in init(...).
  • When a game is being played: set the serialised state observation type in act(...).
  • When a game is finished: set the serialised state observation type in result(...).
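The four hooks above can be sketched in Python as follows. This is a minimal illustration rather than the framework's code: the LEARNING_SSO_TYPE class is a local stand-in for the one shipped with the Python client, the method signatures are assumed to mirror the Java ones described on this page, the action string is a placeholder, and the type requested at each stage is chosen arbitrarily.

```python
class LEARNING_SSO_TYPE:
    """Local stand-in for the client's type constants (assumption)."""
    JSON = "JSON"
    IMAGE = "IMAGE"
    BOTH = "BOTH"


class Agent:
    def __init__(self, sso=None, elapsed_timer=None):
        # Game started: request the JSON observation (also the default).
        self.lastSsoType = LEARNING_SSO_TYPE.JSON

    def init(self, sso, elapsed_timer):
        # Game initialised: switch to screenshots for the coming ticks.
        self.lastSsoType = LEARNING_SSO_TYPE.IMAGE

    def act(self, sso, elapsed_timer):
        # Game being played: ask for both representations from now on.
        self.lastSsoType = LEARNING_SSO_TYPE.BOTH
        return "ACTION_NIL"  # placeholder action string (assumption)

    def result(self, sso, elapsed_timer):
        # Game finished: go back to JSON before the next game starts.
        self.lastSsoType = LEARNING_SSO_TYPE.JSON
        return 0  # next level to play
```

Each assignment is independent of the others, so an agent only needs to set lastSsoType in the hooks where it wants the type to change.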

For Java users: Agent class

A sample random agent: Agent.java.

The Agent class should inherit from utils/AbstractPlayer.java.

Constructor of the agent class

public Agent(SerializableStateObservation sso, ElapsedCpuTimer elapsedTimer){...}

The constructor receives two parameters:

  • SerializableStateObservation sso: the serialised StateObservation without a forward model, provided as a String or a screenshot (.png).
  • ElapsedCpuTimer elapsedTimer: a class that allows querying the remaining CPU time the agent has to return an action. You can query the number of milliseconds passed since the method was called (elapsedMillis()) or the remaining time until the timer runs out (remainingTimeMillis()). The constructor must finish within 1 second. If remainingTimeMillis() ≤ 0, the agent is disqualified from the game being played.

Initialise the agent

public Types.ACTIONS init(SerializableStateObservation sso, ElapsedCpuTimer elapsedTimer){...}

The init method is called once after the constructor, before selecting any action to play. It receives two parameters:

  • SerializableStateObservation sso.
  • ElapsedCpuTimer elapsedTimer (see the previous section): the act has to finish within 40 ms; otherwise, NIL_ACTION will be played.

Select an action to play

public Types.ACTIONS act(SerializableStateObservation sso, ElapsedCpuTimer elapsedTimer){...}

The act method selects an action to play at every game tick. It receives two parameters:

  • SerializableStateObservation sso.
  • ElapsedCpuTimer elapsedTimer: the timer, with a maximum of 40 ms per call. The act method has to finish within 40 ms; otherwise, the agent is disqualified from the game being played.
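One way to stay inside the 40 ms budget is an anytime loop that keeps refining its choice until the timer is nearly exhausted. The sketch below is written in Python for brevity and assumes the timer exposes remainingTimeMillis() as described above. StubTimer is a hypothetical stand-in that measures wall-clock time rather than CPU time, and choose_action, the 5 ms safety margin and the random scoring are illustrative assumptions only.

```python
import random
import time


class StubTimer:
    """Hypothetical stand-in for ElapsedCpuTimer (wall-clock, not CPU time)."""

    def __init__(self, max_millis):
        self._deadline = time.monotonic() + max_millis / 1000.0

    def remainingTimeMillis(self):
        return (self._deadline - time.monotonic()) * 1000.0


def choose_action(available_actions, timer, safety_margin_ms=5.0, rng=random):
    """Return an action, stopping the search before the timer runs out."""
    # Always have a valid answer ready, even if no time is left at all.
    best_action = rng.choice(available_actions)
    best_score = float("-inf")
    while timer.remainingTimeMillis() > safety_margin_ms:
        candidate = rng.choice(available_actions)
        score = rng.random()  # placeholder for a real evaluation
        if score > best_score:
            best_action, best_score = candidate, score
    return best_action
```

The important design point is that a valid action is selected before the loop starts, so the function still returns something legal when the budget is already spent.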

Abort the current game

The agent can abort the current game by returning the action ACTION_ESCAPE. It then receives the results, the serialised state observation sso of the unfinished game and the timer, and returns the next level to play via the method int result(...).

Select the next level to play

public int result(SerializableStateObservation sso, ElapsedCpuTimer elapsedTimer) {...}

During step 2 of training, after terminating a game and receiving the results and final game state, the agent is supposed to select the next level to play. If the returned level id is not in {0, 1, 2}, a random level id in {0, 1, 2} is used instead and a new game will start. The result method receives two parameters:

  • SerializableStateObservation sso: the serialised observation of the final game state at termination.
  • ElapsedCpuTimer elapsedTimer: the global timer, with a maximum of 5 minutes for the whole training. If there is no time left (remainingTimeMillis() ≤ 0), an extra timer with a maximum of 1 second will be passed.
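The fallback rule for invalid level ids can be restated as a short Python sketch (Python is used here only for brevity). The function name resolve_next_level is hypothetical and this is not the server's actual code; it only mirrors the rule described above.

```python
import random

VALID_LEVELS = (0, 1, 2)


def resolve_next_level(requested, rng=random):
    """Return the requested level if valid, else a random valid level."""
    if requested in VALID_LEVELS:
        return requested
    # Any other value is replaced by a random level id from {0, 1, 2}.
    return rng.choice(VALID_LEVELS)
```

An agent that wants full control over the training schedule should therefore make sure result(...) only ever returns 0, 1 or 2.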

For Python users

A sample random agent: Agent.py.

The Agent class should inherit from utils/AbstractPlayer.py.

The interface of the Python agent remains the same as the Java agent.
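As a rough sketch of that shared interface in Python, the class below implements the four methods described on this page. Several details are assumptions made so the example is self-contained: a real agent would inherit from AbstractPlayer, the observation is mimicked with SimpleNamespace using the gameTick and availableActions field names mentioned earlier, the action strings are placeholders, and MAX_TICKS is a hypothetical abort threshold.

```python
import random
from types import SimpleNamespace


class Agent:
    MAX_TICKS = 1000  # hypothetical threshold for giving up on a game
    NUM_LEVELS = 3    # valid level ids are 0, 1 and 2

    def __init__(self, sso=None, elapsed_timer=None):
        self.games_finished = 0

    def init(self, sso, elapsed_timer):
        # Called once per game before the first act(...); nothing to set up.
        pass

    def act(self, sso, elapsed_timer):
        # Abort games that run too long, otherwise play a random legal action.
        if sso.gameTick >= self.MAX_TICKS:
            return "ACTION_ESCAPE"
        return random.choice(sso.availableActions)

    def result(self, sso, elapsed_timer):
        # Cycle through the levels so each one is visited in turn.
        self.games_finished += 1
        return self.games_finished % self.NUM_LEVELS


# Usage with a stand-in observation (real ones come from the framework).
demo_sso = SimpleNamespace(gameTick=0,
                           availableActions=["ACTION_LEFT", "ACTION_RIGHT"])
demo_agent = Agent()
chosen = demo_agent.act(demo_sso, None)
```

Returning games_finished % NUM_LEVELS keeps the level id inside {0, 1, 2}, so the random fallback for invalid ids is never triggered.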


You may also be interested in:

Single Player Learning Track Specifications

Test an agent

Q&A
