This repository contains the code for the Snake game itself, built using Pygame, as well as the tabular RL agent. The original implementation can be found here. The code has been refactored for the purposes of this project, with separate classes for the Snake Agent, Environment and Reward.
The game can be played through the following command:
python3 snake_game.py
My representation uses 7 bits of information to describe the current state of the snake:
- 4 bits of information to define the relative position of the fruit with respect to the head of the snake
- 3 bits of information for obstacles directly in front of the head and to its immediate right and left
An alternative state space representation could utilise 8 bits, treating the up, down, left and right directions separately.
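For concreteness, here is a minimal sketch of how such a 7-bit state could be encoded. The function and argument names are illustrative rather than the actual implementation, and the danger flags are assumed to be computed by the environment:

```python
def encode_state(head, fruit, danger_straight, danger_right, danger_left):
    """Illustrative 7-bit state encoding (all names are hypothetical).

    4 bits: fruit relative to the head (left / right / up / down),
    3 bits: obstacle straight ahead / to the right / to the left of the head.
    """
    fruit_left  = int(fruit[0] < head[0])
    fruit_right = int(fruit[0] > head[0])
    fruit_up    = int(fruit[1] < head[1])   # Pygame's y-axis grows downwards
    fruit_down  = int(fruit[1] > head[1])
    return (fruit_left, fruit_right, fruit_up, fruit_down,
            int(danger_straight), int(danger_right), int(danger_left))
```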
The snake has 3 possible actions:
- Do nothing: The snake continues to move in the same direction
- Turn right: The snake turns right to change its direction
- Turn left: The snake turns left to change its direction
An alternative action space could consist of 4 actions - up, down, left, right.
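A sketch of how the relative action space could be mapped onto absolute directions; the direction ordering and constants below are assumptions for illustration:

```python
# Clockwise ordering of absolute directions (illustrative constants).
DIRECTIONS = ["UP", "RIGHT", "DOWN", "LEFT"]

def next_direction(current, action):
    """Map a relative action onto a new absolute direction.

    action: 0 = do nothing, 1 = turn right, 2 = turn left.
    """
    idx = DIRECTIONS.index(current)
    if action == 1:          # turn right -> next direction clockwise
        idx = (idx + 1) % 4
    elif action == 2:        # turn left -> next direction anti-clockwise
        idx = (idx - 1) % 4
    return DIRECTIONS[idx]
```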
I have used a fairly simple reward scheme that can be optimized to improve the performance of the agent:
- Reward of +5 if the snake moves closer to the fruit
- Reward of -5 if the snake moves away from the fruit
- Reward of +500 for eating the fruit
- Reward of -1000 for crashing
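This scheme can be written down directly as a small reward function; the flag and distance arguments are illustrative names, assumed to be provided by the environment:

```python
def compute_reward(ate_fruit, crashed, dist_before, dist_after):
    """Reward scheme described above; argument names are illustrative."""
    if crashed:
        return -1000
    if ate_fruit:
        return 500
    # +5 for moving closer to the fruit, -5 for moving away from it
    return 5 if dist_after < dist_before else -5
```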
The starting learning rate and the ε parameter of the ε-greedy policy are 0.5 and 0.01 respectively. Without decaying these hyperparameters, the training behaviour of the agent is extremely erratic; with annealing, the performance is more consistent. The agent has achieved a maximum score of 64.
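One simple way to anneal these hyperparameters is a per-episode multiplicative decay; the decay factor and floor values below are assumptions for illustration, not the project's actual settings:

```python
num_episodes = 500                    # illustrative episode count
alpha, epsilon = 0.5, 0.01            # starting values given above
ALPHA_MIN, EPSILON_MIN = 0.01, 0.001  # floors (assumed)
DECAY = 0.995                         # per-episode decay factor (assumed)

for episode in range(num_episodes):
    # ... run one training episode using the current alpha and epsilon ...
    alpha = max(ALPHA_MIN, alpha * DECAY)
    epsilon = max(EPSILON_MIN, epsilon * DECAY)
```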
The Python file agent.py accepts the following command-line arguments:
- --algorithm, followed by the algorithm to train the agent with. The options are Sarsa, Q-Learning, Expected-Sarsa, Dyna-Q and compare; compare plots the relative performance of the algorithms.
- --episodes, followed by the number of episodes to train the agent for.
Example Usage:
python3 agent.py --algorithm q-learning --episodes 500
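For reference, a sketch of how this argument parsing could be set up with argparse; the exact option strings, defaults and casing in the real agent.py may differ:

```python
import argparse

parser = argparse.ArgumentParser(description="Train a tabular RL agent on Snake")
parser.add_argument("--algorithm",
                    choices=["sarsa", "q-learning", "expected-sarsa", "dyna-q", "compare"],
                    default="q-learning",
                    help="algorithm to train the agent with; 'compare' plots all of them")
parser.add_argument("--episodes", type=int, default=500,
                    help="number of episodes to train the agent for")
args = parser.parse_args()
```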
The training phase is run at a higher FPS. Every 10 episodes, the game is slowed down so the agent's progress can be observed; these episodes are also used to plot the average reward obtained by the agent.
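A rough sketch of how the episode loop might alternate between fast training and slowed-down observation every 10 episodes while recording rewards for plotting; the frame rates and the run_episode helper are hypothetical:

```python
import matplotlib.pyplot as plt

TRAIN_FPS, WATCH_FPS = 200, 15        # illustrative frame rates

def run_episode(fps):
    """Placeholder for playing/training one episode; returns its total reward."""
    return 0.0

rewards = []
for episode in range(500):
    watch = (episode % 10 == 0)       # every 10th episode runs slowly for observation
    rewards.append(run_episode(fps=WATCH_FPS if watch else TRAIN_FPS))

plt.plot(rewards)                     # smooth or average the rewards as desired
plt.xlabel("Episode")
plt.ylabel("Reward")
plt.show()
```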