andreimuntean/Deep-Q-Learning

Deep Q-Learning

Deep reinforcement learning using a deep Q-network with a dueling architecture written in TensorFlow.

The agent does not rely on hand-engineered rules or features. Instead, it learns to act in an environment directly from raw pixels and from the rewards it collects through experience.

Dependencies

  • NumPy
  • OpenAI Gym 0.8
  • Pillow
  • SciPy
  • TensorFlow 1.0

Learning Environment

Uses environments provided by OpenAI Gym.

Preprocessing

Each frame is transformed into a 48×48×3 image with 32-bit float values between 0 and 1. No image cropping is performed. Reward signals are restricted to -1, 0 and 1.
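The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `preprocess_frame` and `clip_reward` are hypothetical, nearest-neighbour resizing is an assumption (the interpolation method is not stated above), and taking the sign of the reward is just one way to restrict it to -1, 0 and 1.

```python
import numpy as np


def preprocess_frame(frame, size=48):
    """Resize an H x W x 3 uint8 frame to size x size x 3 float32 in [0, 1].

    Nearest-neighbour sampling is an assumption made for illustration.
    The whole frame is used; nothing is cropped.
    """
    height, width = frame.shape[:2]
    rows = np.arange(size) * height // size  # source row for each output row
    cols = np.arange(size) * width // size   # source column for each output column
    resized = frame[rows][:, cols]
    return resized.astype(np.float32) / 255.0


def clip_reward(reward):
    """Map any reward to -1.0, 0.0 or 1.0 by taking its sign (an assumption)."""
    return float(np.sign(reward))


# A random frame with the 210 x 160 x 3 shape typical of Atari environments.
frame = np.random.randint(0, 256, size=(210, 160, 3), dtype=np.uint8)
state = preprocess_frame(frame)
```

Here `state` has shape `(48, 48, 3)`, which matches the input layer described below.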

Network Architecture

The input layer consists of a 48×48×3 image.

The first hidden layer convolves 64 filters of size 4×4 and stride 2, followed by a rectifier nonlinearity.

The second hidden layer convolves 64 filters of size 3×3 and stride 2, followed by another rectifier nonlinearity.

The third hidden layer convolves 64 filters of size 3×3 and stride 1, followed by another rectifier nonlinearity.
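To see how the three convolutional layers shrink the 48×48 input, the sketch below computes the spatial size after each layer. The padding mode is not stated above, so both of TensorFlow's conventions (`"VALID"` and `"SAME"`) are shown as assumptions; `conv_output_size` and `feature_map_sizes` are hypothetical helper names.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution under TensorFlow's padding rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")


# (kernel size, stride) for the three hidden layers described above.
LAYERS = [(4, 2), (3, 2), (3, 1)]


def feature_map_sizes(input_size, padding):
    """Return the spatial size after each convolutional layer."""
    sizes = []
    size = input_size
    for kernel, stride in LAYERS:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


valid_sizes = feature_map_sizes(48, "VALID")  # [23, 11, 9]
same_sizes = feature_map_sizes(48, "SAME")    # [24, 12, 12]

# Number of features fed to the fully-connected layers (64 filters).
valid_features = valid_sizes[-1] ** 2 * 64    # 5184
same_features = same_sizes[-1] ** 2 * 64      # 9216
```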

When using a dueling architecture, the network diverges into two streams – one computes the advantage of each possible action, the other the state value.

  • The advantage stream consists of a fully-connected layer with 512 rectified linear units, feeding into as many output nodes as there are actions.

  • The state value stream consists of a fully-connected layer with 512 rectified linear units, feeding into a single output node.

  • The two streams merge and form the output layer. Each output node represents the expected utility of an action.
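The merge of the two streams can be illustrated with the mean-subtracted aggregation from the dueling network paper, Q(s, a) = V(s) + A(s, a) − mean(A(s, ·)). That this project uses exactly this variant is an assumption, and `merge_dueling_streams` is a hypothetical name.

```python
import numpy as np


def merge_dueling_streams(state_value, advantages):
    """Combine the state value and advantage streams into Q-values.

    state_value: array of shape (batch, 1)
    advantages:  array of shape (batch, n_actions)
    """
    mean_advantage = advantages.mean(axis=1, keepdims=True)
    return state_value + advantages - mean_advantage


value = np.array([[2.0]])
advantages = np.array([[1.0, 0.0, -1.0]])
q_values = merge_dueling_streams(value, advantages)  # [[3.0, 2.0, 1.0]]
```

Subtracting the mean advantage keeps the two streams identifiable: the average of the resulting Q-values equals the state value.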

If a dueling architecture is not used:

  • The last hidden layer consists of a fully-connected layer with 512 rectified linear units.
  • The output layer has as many nodes as there are actions. Each output node represents the expected utility of an action.
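Without the dueling split, the head is a single fully-connected layer followed by a linear output. The NumPy sketch below uses random weights purely to show the shapes involved. The flattened feature size (5184) and the number of actions (4) are hypothetical values chosen for illustration, as is the name `dense_head`.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense_head(features, w_hidden, b_hidden, w_out, b_out):
    """512 rectified linear units followed by one linear output per action."""
    hidden = np.maximum(0.0, features @ w_hidden + b_hidden)
    return hidden @ w_out + b_out


n_features, n_actions = 5184, 4  # hypothetical sizes

features = rng.standard_normal((1, n_features))
w_hidden = rng.standard_normal((n_features, 512)) * 0.01
b_hidden = np.zeros(512)
w_out = rng.standard_normal((512, n_actions)) * 0.01
b_out = np.zeros(n_actions)

q_values = dense_head(features, w_hidden, b_hidden, w_out, b_out)

# The greedy policy picks the action with the highest expected utility.
greedy_action = int(np.argmax(q_values, axis=1)[0])
```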

Acknowledgements

Heavily influenced by DeepMind's seminal papers 'Playing Atari with Deep Reinforcement Learning' (Mnih et al., 2013) and 'Human-level control through deep reinforcement learning' (Mnih et al., 2015).

Uses double Q-learning as described in 'Deep Reinforcement Learning with Double Q-learning'.
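As a rough illustration of double Q-learning, the target for each transition selects the next action with the online network and evaluates it with the target network. The sketch below is not taken from this project: the function name `double_q_targets`, the argument names and the discount factor are assumptions.

```python
import numpy as np


def double_q_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Compute double Q-learning targets for a batch of transitions.

    rewards, dones: arrays of shape (batch,), with dones equal to 1.0 on
                    terminal steps and 0.0 otherwise
    next_q_online:  online-network Q-values for the next states (batch, n_actions)
    next_q_target:  target-network Q-values for the next states (batch, n_actions)
    """
    # The online network chooses the action...
    best_actions = np.argmax(next_q_online, axis=1)
    # ...and the target network evaluates it.
    next_values = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values


rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
next_q_online = np.array([[0.5, 2.0], [1.0, 0.0]])
next_q_target = np.array([[3.0, 1.0], [4.0, 5.0]])

targets = double_q_targets(rewards, dones, next_q_online, next_q_target, gamma=0.9)
# targets is [1.9, 0.0]
```

In the first transition, taking the maximum over the target network alone would give 1 + 0.9 × 3.0 = 3.7. Decoupling action selection from evaluation yields 1.9 instead, which is how double Q-learning reduces overestimation.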

Uses the dueling architecture described in 'Dueling Network Architectures for Deep Reinforcement Learning'.
