This is a PyTorch implementation of a deep reinforcement learning algorithm (deep Q-learning with experience replay and a target network) that learns to play the Atari game Pong using OpenAI Gym.
Install the requirements and start the training:

```bash
pip install -r requirements.txt
python3 pong/train.py
```
A state in reinforcement learning is the observation that the agent receives from the environment.
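For example, in the Gym Pong environment the state is the raw rendered game frame. A minimal sketch (the environment id and `reset()` API follow the classic `gym` interface; newer `gymnasium` releases return `(observation, info)` from `reset()` instead):

```python
import gym

env = gym.make("Pong-v0")  # classic Gym Atari Pong
state = env.reset()        # the state is an RGB frame of the game screen
print(state.shape)         # (210, 160, 3): height, width, color channels
```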
A policy is the mapping from the perceived states of the environment to the actions to be taken when in those states.
A reward signal defines the goal in reinforcement learning. The agent tries to maximize the total reward in the long run.
The reward signal indicates what is good in an immediate sense, whereas the value function measures what is good in the long run. Each state of the environment is assigned a value, which is the total amount of reward the agent can expect to accumulate starting from that state.
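For reference, this is the textbook definition of the state-value function under a policy π with discount factor γ (standard notation, not code from this repo):

```math
V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s\right]
```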
A model in reinforcement learning mimics the behavior of the environment.
- Target network: a copy of the policy network.
- Initialize the replay memory, used for storing experience tuples SARS' (state, action, reward, next-state); a replay-memory sketch is shown after this list.
- For each episode:
  - Reset the environment to get the starting state.
  - Calculate the exploration rate.
  - For each time step:
    - Select an action using exploration or exploitation (see the epsilon-greedy sketch after this list).
    - Take the action, get the reward from the environment, and move to the next state.
    - Store SARS' (state, action, reward, next-state) in the replay memory.
    - Sample a batch of experiences (SARS') from the replay memory.
    - Preprocess the sampled batch of states.
    - Pass the sampled batch of states through the policy network to calculate the q-values.
    - Calculate the q-values for the next states using the target network.
    - Calculate the expected q-values: expected q-values = reward + gamma * next-state q-values.
    - Calculate the loss between the q-values of the policy network and the expected q-values.
    - Update the weights of the policy network to minimize the loss (the learning-step sketch after this list covers these last five steps).
  - After every 'u' episodes, update the weights of the target network using the weights of the policy network (see the target-update snippet after this list).
- Fixing the local optimum problem.
- Calculating the moving average of scores.
- Plotting the scores using TensorBoard (a logging sketch is shown after this list).
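A minimal replay-memory sketch; the class name, capacity, and tuple layout are illustrative and not necessarily what `pong/train.py` uses:

```python
import random
from collections import deque, namedtuple

# One SARS' experience: (state, action, reward, next_state)
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])

class ReplayMemory:
    """Fixed-size buffer that stores experiences and samples random batches."""

    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)  # oldest experiences are evicted first

    def push(self, state, action, reward, next_state):
        self.memory.append(Experience(state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```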
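The exploration rate is typically decayed from 1.0 toward a small floor as training progresses. A sketch of decayed epsilon-greedy action selection (the decay schedule and all constants here are illustrative assumptions):

```python
import math
import random

import torch

def exploration_rate(episode, eps_start=1.0, eps_end=0.02, eps_decay=0.001):
    """Exponentially decay epsilon from eps_start toward eps_end."""
    return eps_end + (eps_start - eps_end) * math.exp(-eps_decay * episode)

def select_action(policy_net, state, epsilon, num_actions):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(num_actions)           # exploration: random action
    with torch.no_grad():
        # `state` is assumed to be a batched tensor of shape (1, ...)
        return policy_net(state).argmax(dim=1).item()  # exploitation: best q-value
```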
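The q-value, expected-q-value, loss, and weight-update steps amount to one optimization step. A minimal sketch assuming `policy_net`, `target_net`, and `optimizer` already exist and the sampled batch has been preprocessed into tensors (all names are illustrative):

```python
import torch
import torch.nn.functional as F

def learning_step(policy_net, target_net, optimizer, batch, gamma=0.99):
    states, actions, rewards, next_states = batch  # preprocessed tensors

    # q-values of the actions actually taken, from the policy network
    q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # q-values of the next states, from the frozen target network
    with torch.no_grad():
        next_q_values = target_net(next_states).max(dim=1).values
        # a full implementation would also zero these out for terminal states

    # expected q-values = reward + gamma * next-state q-values
    expected_q_values = rewards + gamma * next_q_values

    # loss between the policy network's q-values and the expected q-values
    loss = F.mse_loss(q_values, expected_q_values)

    # update the policy network's weights to minimize the loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```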
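Updating the target network after every 'u' episodes is a single state-dict copy in PyTorch (`episode` and `u` stand for the loop variables from the outline above):

```python
# every u episodes, copy the policy network's weights into the target network
if episode % u == 0:
    target_net.load_state_dict(policy_net.state_dict())
```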
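A sketch of the moving average and TensorBoard plotting, using the `SummaryWriter` that ships with PyTorch (the tag names and window size are illustrative):

```python
from collections import deque

from torch.utils.tensorboard import SummaryWriter

def log_scores(scores, window=100):
    """Log per-episode scores and their moving average to TensorBoard."""
    writer = SummaryWriter()       # writes event files under ./runs by default
    recent = deque(maxlen=window)  # sliding window for the moving average
    for episode, score in enumerate(scores):
        recent.append(score)
        writer.add_scalar("score", score, episode)
        writer.add_scalar("score/moving_average", sum(recent) / len(recent), episode)
    writer.close()
```

Run `tensorboard --logdir runs` to view the resulting curves.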