Sample Random Search Controller

rdgain edited this page Mar 20, 2017 · 1 revision

The sample RS controller uses Random Search to decide the next move. At every game step, a new, small population of individuals is created and evaluated, containing as many individuals as can be evaluated within the 40 ms budget. Each individual represents a sequence of actions, and its fitness is computed by a heuristic that evaluates the state reached after applying all of these actions. The move returned is the first action of the best individual. Note that no evolution occurs during this process: every individual is sampled at random.
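The procedure described above can be sketched outside the framework. The following is a minimal, self-contained illustration rather than the agent's actual code: the class `RandomSearchSketch`, the toy `fitness` function and the integer action encoding are all hypothetical, and the time budget is measured with `System.nanoTime()` instead of the framework's timer.

```java
import java.util.Random;

class RandomSearchSketch {

    /** Toy fitness (hypothetical): counts how often action 0 appears in the sequence. */
    static double fitness(int[] actions) {
        double value = 0;
        for (int a : actions) {
            if (a == 0) {
                value += 1;
            }
        }
        return value;
    }

    /**
     * Samples random action sequences until the budget is used up and returns
     * the first action of the best sequence seen. Every sequence is drawn
     * independently, so nothing is evolved from earlier candidates.
     */
    static int nextMove(int depth, int nActions, long budgetMillis, Random rnd) {
        long start = System.nanoTime();
        long budgetNanos = budgetMillis * 1_000_000L;
        int[] best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        do {
            // Create one random individual: a fixed-length sequence of action indices.
            int[] candidate = new int[depth];
            for (int i = 0; i < depth; i++) {
                candidate[i] = rnd.nextInt(nActions);
            }
            double value = fitness(candidate);
            if (value > bestValue) {
                bestValue = value;
                best = candidate;
            }
        } while (System.nanoTime() - start < budgetNanos);
        // The do-while guarantees at least one candidate, so best is never null here.
        return best[0];
    }

    public static void main(String[] args) {
        int move = nextMove(10, 4, 40, new Random());
        System.out.println("Chosen action index: " + move);
    }
}
```

Because the number of candidates depends on how fast the machine is, two runs with the same seed may still return different moves.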

The complete code for this agent is contained in a single class, Agent.java, although it relies on a heuristic class that is also part of the framework: WinScoreHeuristic.java. Here, we highlight some interesting parts of this agent.

First, the function that evaluates an individual, from Agent.java:

private double evaluate(Individual individual, StateHeuristic heuristic, StateObservation state) {

        ElapsedCpuTimer elapsedTimerIterationEval = new ElapsedCpuTimer();

        StateObservation st = state.copy();
        int i;
        // Declared outside the loop so the running total is kept across iterations.
        double acum = 0, avg;
        for (i = 0; i < SIMULATION_DEPTH; i++) {
            if (! st.isGameOver()) {
                ElapsedCpuTimer elapsedTimerIteration = new ElapsedCpuTimer();
                st.advance(action_mapping.get(individual.actions[i]));
                acum += elapsedTimerIteration.elapsedMillis();
                avg = acum / (i+1);
                remaining = timer.remainingTimeMillis();
                if (remaining < 2*avg || remaining < BREAK_MS) break;
            } else {
                break;
            }
        }

        StateObservation first = st.copy();
        double value = heuristic.evaluateState(first);

        // Apply discount factor
        value *= Math.pow(DISCOUNT,i);

        individual.value = value;

        numEvals++;
        acumTimeTakenEval += (elapsedTimerIterationEval.elapsedMillis());
        avgTimeTakenEval = acumTimeTakenEval / numEvals;
        remaining = timer.remainingTimeMillis();

        return value;
    }
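The discounting step at the end of evaluate() can be looked at in isolation. The sketch below is only an illustration: the class name `DiscountSketch` is hypothetical, and the value 0.99 is an assumption, since this page does not show what DISCOUNT is set to in Agent.java.

```java
class DiscountSketch {

    // Assumed value for illustration; the actual constant is not shown on this page.
    static final double DISCOUNT = 0.99;

    /** Scales a raw heuristic value by DISCOUNT^steps, mirroring value *= Math.pow(DISCOUNT, i). */
    static double discounted(double rawValue, int steps) {
        return rawValue * Math.pow(DISCOUNT, steps);
    }

    public static void main(String[] args) {
        // The same raw value counts for less the more steps were simulated.
        System.out.println(discounted(100.0, 0));
        System.out.println(discounted(100.0, 10));
        // A negative value shrinks towards zero, so it becomes less negative.
        System.out.println(discounted(-100.0, 10));
    }
}
```

With a discount below 1, a positive value shrinks as the number of simulated steps grows, while a negative value moves towards zero and so compares as less bad.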

Here is the code for the population initialisation. As many individuals as possible are created and evaluated while enough time remains:

private void init_pop(StateObservation stateObs) {

        double remaining;

        N_ACTIONS = stateObs.getAvailableActions().size() + 1;
        action_mapping = new HashMap<>();
        int k = 0;
        for (Types.ACTIONS action : stateObs.getAvailableActions()) {
            action_mapping.put(k, action);
            k++;
        }
        action_mapping.put(k, Types.ACTIONS.ACTION_NIL);

        NUM_INDIVIDUALS = 0;

        population = new ArrayList<>();
        do {
            Individual newInd = new Individual(SIMULATION_DEPTH, N_ACTIONS, randomGenerator);
            evaluate(newInd, heuristic, stateObs);
            population.add(newInd);
            remaining = timer.remainingTimeMillis();
            NUM_INDIVIDUALS++;

        } while(remaining > avgTimeTakenEval && remaining > BREAK_MS);

        if (NUM_INDIVIDUALS > 1)
            Collections.sort(population, new Comparator<Individual>() {
                @Override
                public int compare(Individual o1, Individual o2) {
                    if (o1 == null && o2 == null) {
                        return 0;
                    }
                    if (o1 == null) {
                        return 1;
                    }
                    if (o2 == null) {
                        return -1;
                    }
                    return o1.compareTo(o2);
                }});
    }
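The anonymous comparator at the end of init_pop places null entries last and otherwise defers to compareTo. Assuming Java 8 or later, the same ordering can be expressed with the standard `Comparator.nullsLast`. The sketch below uses a hypothetical stand-in for `Individual`. Its compareTo (higher value first) is an assumption, as the real one is not shown on this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NullSafeSortSketch {

    /** Hypothetical stand-in for the framework's Individual; only the fitness value is modelled. */
    static class Individual implements Comparable<Individual> {
        final double value;

        Individual(double value) {
            this.value = value;
        }

        // Assumption: individuals with higher fitness sort first.
        @Override
        public int compareTo(Individual other) {
            return Double.compare(other.value, this.value);
        }
    }

    /** Sorts non-null individuals by compareTo and moves nulls to the end of the list. */
    static void sortPopulation(List<Individual> population) {
        population.sort(Comparator.nullsLast(Comparator.naturalOrder()));
    }

    public static void main(String[] args) {
        List<Individual> population = new ArrayList<>();
        population.add(new Individual(1.0));
        population.add(null);
        population.add(new Individual(3.0));
        sortPopulation(population);
        for (Individual ind : population) {
            System.out.println(ind == null ? "null" : String.valueOf(ind.value));
        }
    }
}
```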

Here is the code of the heuristic that evaluates a given state, from WinScoreHeuristic.java. It is called at the end of the evaluate function shown earlier. If the game was won (or lost), it returns the score plus (or minus) a very large constant as fitness. Otherwise, it returns the score of the game at that state.

public class WinScoreHeuristic extends StateHeuristic {

    public WinScoreHeuristic(StateObservation stateObs) {}

    //The StateObservation stateObs received is the state of the game to be evaluated.
    public double evaluateState(StateObservation stateObs) {
        boolean gameOver = stateObs.isGameOver();       //true if the game has finished.
        Types.WINNER win = stateObs.getGameWinner();    //player loses, wins, or no winner yet.
        double score = stateObs.getGameScore();

        if(gameOver && win == Types.WINNER.PLAYER_WINS)       return score + 10000000.0;  //We won, this is good.
        if(gameOver && win == Types.WINNER.PLAYER_LOSES)      return score - 10000000.0; //We lost, this is bad.

        return score; //Neither won nor lost, let's get the score and use it as fitness.
    }
}
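To make the ranking this heuristic produces concrete, here is a small framework-free sketch. The class `WinScoreSketch` and the enum `Winner` are hypothetical stand-ins for the framework types. Only the constant 10000000.0 is taken from the code above.

```java
class WinScoreSketch {

    /** Hypothetical stand-in for Types.WINNER. */
    enum Winner { PLAYER_WINS, PLAYER_LOSES, NO_WINNER }

    // Same constant as in the heuristic above.
    static final double HUGE = 10000000.0;

    /** Mirrors the three branches of evaluateState using plain values instead of a StateObservation. */
    static double evaluate(boolean gameOver, Winner winner, double score) {
        if (gameOver && winner == Winner.PLAYER_WINS) {
            return score + HUGE;
        }
        if (gameOver && winner == Winner.PLAYER_LOSES) {
            return score - HUGE;
        }
        return score;
    }

    public static void main(String[] args) {
        double lowScoreWin = evaluate(true, Winner.PLAYER_WINS, 0.0);
        double highScoreOngoing = evaluate(false, Winner.NO_WINNER, 500.0);
        double highScoreLoss = evaluate(true, Winner.PLAYER_LOSES, 500.0);
        // A win outranks any ongoing state with a realistic score; a loss ranks below it.
        System.out.println(lowScoreWin > highScoreOngoing);
        System.out.println(highScoreLoss < highScoreOngoing);
    }
}
```

The constant is large enough that, for scores well below ten million, reaching a terminal state decides the comparison and the score only breaks ties among states of the same kind.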

There are several customizable parameters:

  • Individual length (SIMULATION_DEPTH in the code above)
  • Discount factor (DISCOUNT in the code above)
