Anna is an AI trained using the DQN (Deep Q-Network) algorithm to play Pacman.
python 3.5
keras
tensorflow/theano
scikit-image
h5py
GPU Training: cuda
To start training, run dqn.py with -m Train.
To test the latest agent, use -m Test.
The -iw flag enables user interactions; these interactions are also recorded as experiences while training.
The -v flag enables verbose reporting.
Example:
python3 dqn.py -m Train -v
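For reference, the flags above could be parsed roughly as follows. This is a minimal sketch using argparse, not the actual parser in dqn.py; the long option names (--mode, --interactive, --verbose) and the parse_flags helper are assumptions made for illustration.

```python
import argparse

def parse_flags(argv):
    """Parse the command-line flags described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train or test the agent")
    # -m selects the mode; only Train and Test are mentioned in this README
    parser.add_argument("-m", "--mode", choices=["Train", "Test"], required=True)
    # -iw enables user interactions (the long name is an assumption)
    parser.add_argument("-iw", "--interactive", action="store_true")
    # -v enables verbose reporting
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser.parse_args(argv)

# Equivalent to: python3 dqn.py -m Train -v
args = parse_flags(["-m", "Train", "-v"])
print(args.mode, args.interactive, args.verbose)
```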
env_test.py checks the environment functionality. It first verifies that Keras is installed and retrieves the backend in use. It then tests that backend with a basic function and reports whether the CPU or GPU is being used.
python3 env_test.py
## Checking Keras
Using TensorFlow backend.
## Checking TensorFlow
Version 1.5.0
Testing...
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13656171249690873929
]
2018-02-14 18:32:51.352817: I tensorflow/core/common_runtime/direct_session.cc:297] Device mapping:
Device mapping: no known devices.
Exp: (Exp): /job:localhost/replica:0/task:0/device:CPU:0
x: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-02-14 18:32:51.355679: I tensorflow/core/common_runtime/placer.cc:874] Exp: (Exp)/job:localhost/replica:0/task:0/device:CPU:0
2018-02-14 18:32:51.355702: I tensorflow/core/common_runtime/placer.cc:874] x: (Const)/job:localhost/replica:0/task:0/device:CPU:0
Result [1.53210308 1.01887397 2.53835477 ... 1.09205114 1.66782705 2.41976454]
True Values [1.53210308 1.01887397 2.53835477 ... 1.09205114 1.66782705 2.41976454]
Looping 1000 times took 0.012136 seconds
TensorFlow is using the CPU
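The first step of a check like the one above (verifying that a package is installed before using it) can be sketched with the standard library alone. This is an illustration, not the contents of env_test.py; the is_installed helper is a hypothetical name, and the sketch does not import or test the backend itself.

```python
import importlib.util

def is_installed(package):
    """Return True if the named top-level package can be located."""
    return importlib.util.find_spec(package) is not None

# Report which of the packages listed in the requirements are available
for name in ("keras", "tensorflow", "theano"):
    status = "found" if is_installed(name) else "missing"
    print("Checking", name, "->", status)
```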
dqn.py contains the DQN implementation.
FRAMEWORK = 'tf' # tf/theano
MODEL_NAME = 'Anna' # An AI has to have a name! Also the subdirectory name
MODEL_VERSION = 1 # The version is used for the file name
VERSION_UPDATE = 1200 # New version interval, in seconds
SAVE_INTERVAL = 1000 # Save interval in iterations
# It is recommended to set this higher on non-SSD hard disks as dumping is expensive
DUMPING_INTERVAL = 5000 # Experience memory dumping interval in iterations
REPORT_INTERVAL = 1000 # Frames before reporting, if not verbose
THROTTLING_PERIOD = 2 # Frames to skip before sampling the experience log and training again
ACTIONS = 6 # Number of valid actions
INITIAL_GAMMA = 0.6 # Low confidence in predictions while exploring
# A large gamma for a game where positive rewards are usually closely-packed can eventually cause an overflow
FINAL_GAMMA = 0.8 # High confidence when perfecting the technique
OBSERVE_PERIOD = 5000 # Frames to observe before training
EXPLORE_PERIOD = 1000000 # Iterations over which to anneal EPSILON from initial to final
INITIAL_EPSILON = 0.1 # Starting value of EPSILON
FINAL_EPSILON = 0.001 # Final value of EPSILON
MEMORY_SIZE = 50000 # Number of previous transitions to remember
BATCH = 32 # Number of experiences to train on per batch
FRAMES_PER_ACTION = 1 # The delay before taking another action (set to 1 for no delay)
LEARNING_RATE = 1e-4 # Our network's learning rate
FRAMES_PER_SAMPLE = 4 # How many frames to stack per sample, good for detecting time-dependent quantities like velocity
RESTORE_STATE = True # Whether or not to restore a state if found
RESTORE_MEMORY = True # Whether or not to restore the experience memory if found
# The image size, images are rotated 90 degrees in a matrix so the height is rows and the width is columns
# It's better to use a square image because, otherwise, lines might get jagged or completely disappear while resizing
IMG_ROWS, IMG_COLS = 132, 132
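The settings INITIAL_EPSILON, FINAL_EPSILON and EXPLORE_PERIOD above describe a schedule that moves a value from its initial to its final setting over a fixed number of iterations. A minimal sketch, assuming the interpolation is linear (the exact schedule is not stated here, and anneal is a hypothetical helper):

```python
INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.001
EXPLORE_PERIOD = 1000000

def anneal(initial, final, step, period):
    """Linearly interpolate from initial to final over period steps."""
    fraction = min(max(step / period, 0.0), 1.0)  # clamp progress to [0, 1]
    return initial + (final - initial) * fraction

# Epsilon at the start, midway, at the end, and past the end of the period
for step in (0, 500000, 1000000, 2000000):
    print(step, anneal(INITIAL_EPSILON, FINAL_EPSILON, step, EXPLORE_PERIOD))
```

The same helper would work for gamma, which rises from INITIAL_GAMMA to FINAL_GAMMA, if it is annealed the same way; that is also an assumption.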
build_model()
This function builds our Keras model.
save_model()
This function saves our model (the design and weights) along with other relevant progress information, such as the current iteration, the model version, and our epsilon/gamma parameters.
dump_memory()
This function dumps the experience memory (states, actions, rewards) to a file.
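A bounded experience memory and a dump to disk can be sketched as follows. This is an illustration under several assumptions: it uses collections.deque and pickle, whereas this README does not say which container or file format dump_memory() uses (h5py is listed as a dependency), and the transition layout, the dump_memory_sketch name and the file name are all hypothetical.

```python
import os
import pickle
import tempfile
from collections import deque

MEMORY_SIZE = 50000

# Oldest transitions are discarded automatically once maxlen is reached
memory = deque(maxlen=MEMORY_SIZE)
# Assumed layout: (state, action, reward, next_state, terminal)
memory.append(("state_0", 2, 0.0, "state_1", False))

def dump_memory_sketch(memory, path):
    """Write the experience memory to path, replacing any previous dump."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(list(memory), f)
    # Rename into place so an interrupted dump does not corrupt the old file
    os.replace(tmp_path, path)

dump_path = os.path.join(tempfile.gettempdir(), "anna_memory_sketch.pkl")
dump_memory_sketch(memory, dump_path)
```

Writing to a temporary file first is a design choice for the sketch: dumping a large memory is slow, so an interruption midway is more likely than with a small file.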
load_model()
Loads the previously learned weights for our model.
lookup_state()
Checks for a saved state and loads it if found.
lookup_experience()
Checks for a dumped memory and loads it if found.
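The "check, then load if found" pattern shared by both lookup functions can be sketched like this. The lookup helper, the pickle format and the file path are hypothetical; the enabled argument mirrors the RESTORE_STATE and RESTORE_MEMORY switches, on the assumption that the real functions honor them in a similar way.

```python
import os
import pickle
import tempfile

def lookup(path, enabled=True):
    """Return the object stored at path, or None if disabled or not found."""
    if not enabled or not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

# Demonstration with a throwaway file
demo_path = os.path.join(tempfile.mkdtemp(), "state_sketch.pkl")
print(lookup(demo_path))  # nothing has been saved yet
with open(demo_path, "wb") as f:
    pickle.dump({"iteration": 1000, "epsilon": 0.05}, f)
print(lookup(demo_path))
print(lookup(demo_path, enabled=False))
```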
exercise_network()
Starts exercising the network using the game in train/test mode.
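One common way for a DQN agent to choose actions in these two modes is epsilon-greedy selection: explore with probability epsilon while training, and always take the highest-valued action while testing. The sketch below assumes that behavior, which this README does not confirm; choose_action and the Q-values are illustrative only.

```python
import random

def choose_action(q_values, epsilon, mode="Train", rng=random):
    """Pick an action index using epsilon-greedy selection."""
    if mode == "Train" and rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the largest predicted Q-value
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Made-up network output, one value per action (ACTIONS = 6)
q_values = [0.1, 0.4, -0.2, 0.9, 0.0, 0.3]
print(choose_action(q_values, epsilon=0.1, mode="Test"))
```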
This project was inspired by the Keras-FlappyBird project. Initially I borrowed the same model but later modified it to better suit Pacman. You can read more about it here: https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html
I do not own the game implementation used; you can find it in the following repo: https://github.com/greyblue9/pacman-python