Sample Random Search Controller (2 Player)
The 2 player version of the sample RS controller uses Random Search to decide its next move. At every game step, a new population of individuals is created and evaluated. The population holds as many individuals as can be evaluated within the 40 ms allowed per move. Each individual represents a sequence of actions. Its fitness is computed by a heuristic that evaluates the state reached after applying all these actions. The move returned is the first action of the best individual. Note that no evolution occurs during this process: individuals are sampled at random and are never mutated or recombined.
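The decision step described above can be sketched in a self-contained way. This sketch assumes a generic fitness function over integer action sequences in place of the game simulation. The class name `RandomSearchSketch`, the `maxSamples` cap and the toy fitness are hypothetical and are not part of the framework.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/**
 * Sketch of one Random Search decision step over integer action sequences.
 * The fitness function is a hypothetical stand-in for "simulate the sequence
 * with the forward model, then apply the heuristic".
 */
final class RandomSearchSketch {

    /**
     * Samples random action sequences of the given depth until the time budget
     * or the sample cap runs out, then returns the first action of the best one.
     */
    static int chooseAction(int nActions, int depth, long budgetMillis, int maxSamples,
                            ToDoubleFunction<int[]> fitness, Random rng) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        int[] best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        int samples = 0;
        // Evaluate at least one individual, then continue while time and samples remain.
        do {
            int[] individual = new int[depth];
            for (int i = 0; i < depth; i++) {
                individual[i] = rng.nextInt(nActions);
            }
            double value = fitness.applyAsDouble(individual);
            if (best == null || value > bestValue) {
                bestValue = value;
                best = individual;
            }
            samples++;
        } while (samples < maxSamples && System.nanoTime() - deadline < 0);
        // No evolution: the best random sample alone decides the next move.
        return best[0];
    }
}
```

Consider a toy fitness that only rewards action 2 as the first move. In that case, `chooseAction(3, 4, 40, 1000, ind -> ind[0], new Random())` will typically return 2. In the agent, the fitness is the discounted heuristic value computed by `evaluate()`.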
The complete code for this agent is contained in a single class, Agent.java. It also uses a heuristic class from the framework, WinScoreHeuristic.java. Here, we highlight some interesting parts of this agent.
First, the function from Agent.java that evaluates an individual. Note that in the 2 player framework the advance function takes an array of actions, one per player. The opponent model is random: at each step, the opponent plays a random action chosen from its available actions (including ACTION_NIL).
private double evaluate(Individual individual, StateHeuristicMulti heuristic, StateObservationMulti state) {
    ElapsedCpuTimer elapsedTimerIterationEval = new ElapsedCpuTimer();
    StateObservationMulti st = state.copy();
    // Running total and average time per simulated step (kept across iterations)
    double acum = 0, avg;
    int i;
    for (i = 0; i < SIMULATION_DEPTH; i++) {
        if (!st.isGameOver()) {
            ElapsedCpuTimer elapsedTimerIteration = new ElapsedCpuTimer();
            // Multi player advance method: one action per player
            Types.ACTIONS[] advanceActs = new Types.ACTIONS[noPlayers];
            for (int k = 0; k < noPlayers; k++) {
                if (k == playerID)
                    advanceActs[k] = action_mapping[k].get(individual.actions[i]);
                else // random opponent model
                    advanceActs[k] = action_mapping[k].get(randomGenerator.nextInt(N_ACTIONS[k]));
            }
            st.advance(advanceActs);
            acum += elapsedTimerIteration.elapsedMillis();
            avg = acum / (i + 1);
            remaining = timer.remainingTimeMillis();
            // Stop early if there is not enough time left for two more average steps
            if (remaining < 2 * avg || remaining < BREAK_MS) break;
        } else {
            break;
        }
    }
    StateObservationMulti first = st.copy();
    double value = heuristic.evaluateState(first, playerID);
    // Apply discount factor
    value *= Math.pow(DISCOUNT, i);
    individual.value = value;
    // Update the average cost of a full evaluation (used by init_pop to stop in time)
    numEvals++;
    acumTimeTakenEval += (elapsedTimerIterationEval.elapsedMillis());
    avgTimeTakenEval = acumTimeTakenEval / numEvals;
    remaining = timer.remainingTimeMillis();
    return value;
}
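The stopping rule inside `evaluate()` can be isolated as a small helper. This is a hedged sketch: `TimeGuardSketch` is a hypothetical name, and because the page does not show the value of `BREAK_MS`, it is passed in as a parameter. The helper keeps a running average over all steps simulated so far, so the accumulated time carries across iterations rather than being reset for each step.

```java
/**
 * Sketch of the time guard used while simulating an individual: stop once the
 * remaining time is below two average steps, or below a fixed safety margin.
 */
final class TimeGuardSketch {
    private final double breakMs;   // safety margin, plays the role of BREAK_MS
    private double accumulatedMs = 0;
    private int steps = 0;

    TimeGuardSketch(double breakMs) {
        this.breakMs = breakMs;
    }

    /** Records how long one simulated step took. */
    void record(double stepMillis) {
        accumulatedMs += stepMillis;
        steps++;
    }

    /** Average duration of the steps recorded so far (0 if none). */
    double averageMs() {
        return steps == 0 ? 0 : accumulatedMs / steps;
    }

    /** True when the simulation should stop, given the time still available. */
    boolean shouldStop(double remainingMs) {
        return remainingMs < 2 * averageMs() || remainingMs < breakMs;
    }
}
```

For example, with recorded steps of 3 ms and 5 ms, the average is 4 ms. With a 5 ms margin, the guard then stops whenever fewer than 8 ms remain.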
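Here is the code for the population initialisation. It first maps each player's available actions to integer indices, adding ACTION_NIL as one extra index. It then creates and evaluates as many individuals as possible while enough time remains. Finally, it sorts the population using Individual.compareTo, with null entries placed last: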
private void init_pop(StateObservationMulti stateObs) {
    double remaining;
    // For each player, map action indices to actions, plus ACTION_NIL as the last index
    N_ACTIONS = new int[noPlayers];
    action_mapping = new HashMap[noPlayers];
    for (int i = 0; i < noPlayers; i++) {
        ArrayList<Types.ACTIONS> actions = stateObs.getAvailableActions(i);
        N_ACTIONS[i] = actions.size() + 1;
        action_mapping[i] = new HashMap<>();
        int k = 0;
        for (Types.ACTIONS action : actions) {
            action_mapping[i].put(k, action);
            k++;
        }
        action_mapping[i].put(k, Types.ACTIONS.ACTION_NIL);
    }
    // Create and evaluate random individuals while there is time for another evaluation
    NUM_INDIVIDUALS = 0;
    population = new ArrayList<>();
    do {
        Individual newInd = new Individual(SIMULATION_DEPTH, N_ACTIONS[playerID], randomGenerator);
        evaluate(newInd, heuristic, stateObs);
        population.add(newInd);
        remaining = timer.remainingTimeMillis();
        NUM_INDIVIDUALS++;
    } while (remaining > avgTimeTakenEval && remaining > BREAK_MS);
    // Sort with Individual.compareTo, placing null entries last
    if (NUM_INDIVIDUALS > 1)
        Collections.sort(population, new Comparator<Individual>() {
            @Override
            public int compare(Individual o1, Individual o2) {
                if (o1 == null && o2 == null) {
                    return 0;
                }
                if (o1 == null) {
                    return 1;
                }
                if (o2 == null) {
                    return -1;
                }
                return o1.compareTo(o2);
            }
        });
}
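The index-to-action mapping built at the start of `init_pop` can be illustrated on its own. In this sketch, `Action` is a hypothetical stand-in for `Types.ACTIONS` that mirrors only a few constants, and `buildMapping` is a hypothetical helper name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ActionMappingSketch {
    // Hypothetical stand-in for Types.ACTIONS (only a few constants mirrored).
    enum Action { ACTION_NIL, ACTION_UP, ACTION_DOWN, ACTION_LEFT, ACTION_RIGHT, ACTION_USE }

    /** Maps indices 0..n-1 to the available actions and index n to ACTION_NIL. */
    static Map<Integer, Action> buildMapping(List<Action> available) {
        Map<Integer, Action> mapping = new HashMap<>();
        int k = 0;
        for (Action a : available) {
            mapping.put(k++, a);
        }
        // The extra "do nothing" option gets the last index.
        mapping.put(k, Action.ACTION_NIL);
        return mapping;
    }
}
```

Take a player whose available actions are LEFT, RIGHT and USE. Indices 0 to 2 map to those actions and index 3 maps to ACTION_NIL. This is why `N_ACTIONS[i]` is set to `actions.size() + 1`.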
Here is the code of the heuristic that evaluates a given state, from WinScoreHeuristic.java. It is called at the end of evaluate(). If the game is over and the player has lost (or the opponent has won), it returns a large negative constant (-1000). If the game is over and the player has won (or the opponent has lost), it returns a large positive constant (1000). Otherwise, it returns the player's game score in that state.
public class WinScoreHeuristic extends StateHeuristicMulti {
    private static final double HUGE_NEGATIVE = -1000.0;
    private static final double HUGE_POSITIVE = 1000.0;
    public WinScoreHeuristic(StateObservationMulti stateObs) {
    }
    public double evaluateState(StateObservationMulti stateObs, int playerID) {
        boolean gameOver = stateObs.isGameOver();
        Types.WINNER win = stateObs.getMultiGameWinner()[playerID];
        Types.WINNER oppWin = stateObs.getMultiGameWinner()[(playerID + 1) % stateObs.getNoPlayers()];
        double rawScore = stateObs.getGameScore(playerID);
        if (gameOver && (win == Types.WINNER.PLAYER_LOSES || oppWin == Types.WINNER.PLAYER_WINS))
            return HUGE_NEGATIVE;
        if (gameOver && (win == Types.WINNER.PLAYER_WINS || oppWin == Types.WINNER.PLAYER_LOSES))
            return HUGE_POSITIVE;
        return rawScore;
    }
}
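The decision logic of `evaluateState` can be tested in isolation by passing in the game-over flag, both players' outcomes and the raw score directly. `Winner` is a hypothetical stand-in for `Types.WINNER`, and `WinScoreSketch` is a hypothetical name. The constants and the order of the checks follow the heuristic above.

```java
final class WinScoreSketch {
    // Hypothetical stand-in for Types.WINNER.
    enum Winner { PLAYER_WINS, PLAYER_LOSES, NO_WINNER }

    static final double HUGE_NEGATIVE = -1000.0;
    static final double HUGE_POSITIVE = 1000.0;

    /** Loss is checked before win; if neither applies, the raw score is returned. */
    static double evaluate(boolean gameOver, Winner me, Winner opponent, double rawScore) {
        if (gameOver && (me == Winner.PLAYER_LOSES || opponent == Winner.PLAYER_WINS)) {
            return HUGE_NEGATIVE;
        }
        if (gameOver && (me == Winner.PLAYER_WINS || opponent == Winner.PLAYER_LOSES)) {
            return HUGE_POSITIVE;
        }
        return rawScore;
    }
}
```

Because the loss check comes first, an end state in which both players are flagged as winners is scored as a loss (-1000). A game that ends with no winner falls through to the raw score.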
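The following parameters can be customized:
- Individual length (SIMULATION_DEPTH): the number of actions in each individual, which is also the maximum number of steps simulated per evaluation.
- Discount factor (DISCOUNT): the heuristic value of the final state is multiplied by DISCOUNT^i, where i is the step index at which the simulation stopped.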
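The effect of the discount factor can be seen with a small helper that mirrors the line `value *= Math.pow(DISCOUNT, i)` in `evaluate()`. `DiscountSketch` is a hypothetical name, and the values used below are illustrative only.

```java
final class DiscountSketch {
    /** Scales a heuristic value by discount^steps, as the agent does before storing fitness. */
    static double discounted(double value, double discount, int steps) {
        return value * Math.pow(discount, steps);
    }
}
```

With a discount of 0.9, a win worth 1000 reached after 2 steps scores about 810, compared with about 729 after 3 steps, so earlier wins are preferred. The same scaling shrinks negative values toward zero, so a loss that happens later is penalised less than an immediate one.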