Reinforcement Learning 2048

An AI bot that plays the game 2048 using reinforcement learning

Overview

Demo

The Elements of 2048 Reinforcement Learning problem

  • Objective: Achieve the highest score / largest tile, i.e. survive as long as possible while maintaining a good board state.
  • State: A 4x4 grid of tiles whose values are powers of 2.
  • Action: Shift the board UP, DOWN, LEFT, or RIGHT.
  • Reward: The increase in score, or the score combined with other metrics (see the sketch below).
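
As a minimal illustration of these elements (the names below are hypothetical and only meant to make the mapping concrete, not the repository's actual code):

import numpy as np

# Hypothetical sketch of the MDP elements; names are illustrative only.
ACTIONS = ['UP', 'DOWN', 'LEFT', 'RIGHT']    # the action space

state = np.zeros((4, 4), dtype=np.int64)     # the 4x4 grid, 0 = empty cell
state[0, 0], state[0, 1] = 2, 4              # tile values are powers of 2

def reward(score_before, score_after):
    # Simplest choice: the reward is the increment of the game score.
    return score_after - score_before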

Usage

Dependencies

  • tensorflow
  • numpy
  • pyyaml

Basic Game Play

$ python3 RL2048/Game/Play.py
Play mode:
1. Keyboard (use w, a, s, d, exit with ^C or ^D)
2. Random
 select:

  • Keyboard mode
  • Random mode

Training model

$ python3 RL2048/Learning/backward.py
  • TRAIN_MODE.NORMAL: normal training process
    • Uses only the network itself
  • TRAIN_MODE.WITH_RANDOM
    • Adds a small chance of taking a random move (see the sketch below)
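
A minimal sketch of the WITH_RANDOM idea, assuming a fixed exploration probability (the function name and epsilon value are assumptions, not the repository's actual code):

import random
import numpy as np

def choose_action(q_values, epsilon=0.1):
    # With a small probability, ignore the network and move randomly;
    # otherwise take the action the network scores highest.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))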

Statistics Report

$ python3 RL2048/Report/Statistics.py
  • Success Rate of Tiles
  • Scores Diagram
  • Loss Diagram (TODO)

Default file locations

  • Model (ckpt): ./model
  • Last game status: training_game.yaml
  • Training log: training.log
  • Statistics report: ./report/StatisticsResult.md

If Python can't find the RL2048 module (ModuleNotFoundError: No module named 'RL2048'):

You should make sure your working directory is the root of this project, then run the scripts like this:

export PYTHONPATH=$PYTHONPATH:/path/to/this/project/ReinforcementLearning2048; python3 RL2048/Learning/backward.py

Or add the following lines at the top of each script:

import sys
sys.path.append('/path/to/this/project/ReinforcementLearning2048')

Policy Gradient

Heuristic: Artificial Intelligence: the more human effort you put in, the more "intelligence" you get out!

Epsilon Decay

With a decaying probability, control of the move is handed over to a "Teacher" (see the sketch below).
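
A sketch of the idea, assuming an exponential decay schedule and a hypothetical teacher_move() heuristic (neither is necessarily what this repository implements):

import random

def teacher_probability(step, start=1.0, end=0.05, decay=0.999):
    # Probability that the "Teacher" (e.g. a tree-search heuristic)
    # takes control of this move; it decays toward `end` over time.
    return max(end, start * decay ** step)

def select_action(step, network_move, teacher_move):
    if random.random() < teacher_probability(step):
        return teacher_move()   # follow the teacher while its probability is still high
    return network_move()       # otherwise trust the network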

Random

Traditional Tree-search algorithm

The Monte Carlo tree search algorithm

  • Monotonicity
  • Smoothness
  • Free Tiles
  • Z-shape

(Minimax search with alpha-beta pruning)
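
A rough sketch of how the heuristics listed above could be scored for a board; the weights and exact formulas are illustrative assumptions, not the repository's implementation:

import numpy as np

def free_tiles(board):
    # Number of empty cells; more free tiles means more room to maneuver.
    return int(np.count_nonzero(board == 0))

def smoothness(board):
    # Penalize large differences between adjacent tiles (rows and columns).
    diff = np.abs(np.diff(board, axis=0)).sum() + np.abs(np.diff(board, axis=1)).sum()
    return -float(diff)

def monotonicity(board):
    # Reward rows/columns that are entirely non-increasing or non-decreasing.
    score = 0
    for line in list(board) + list(board.T):
        d = np.diff(line)
        if np.all(d <= 0) or np.all(d >= 0):
            score += 1
    return score

def evaluate(board, w_free=2.0, w_smooth=0.1, w_mono=1.0):
    # Weighted sum used to rank candidate moves in a tree search.
    return (w_free * free_tiles(board)
            + w_smooth * smoothness(board)
            + w_mono * monotonicity(board))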

Result of Policy Gradient

We found that Policy Gradient is not a good approach for 2048.

The main point is that 2048 has a "local comfort zone": sometimes you need to take a seemingly bad action just to make any move at all, because the direction you actually want is invalid.

Problems

  • The network keeps taking invalid actions (one common mitigation is sketched below).
  • The loss became too small, and the network seemed to learn nothing in the first 100 rounds. -> The too-small-loss problem was solved, but the network still learned nothing.
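
One common mitigation for the invalid-action problem is to mask out moves that would not change the board before sampling from the policy. A sketch under that assumption (not necessarily what this repository does):

import numpy as np

def masked_policy_action(probs, valid_mask):
    # probs: softmax output over [UP, DOWN, LEFT, RIGHT]
    # valid_mask: 1 for moves that actually change the board, 0 otherwise
    masked = np.asarray(probs) * np.asarray(valid_mask)
    if masked.sum() == 0:            # no valid move left: the game is over
        return None
    masked = masked / masked.sum()   # renormalize over valid actions only
    return int(np.random.choice(len(masked), p=masked))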

MCTS Policy Gradient

Random Policy Gradient

Idea:

  • Use random play to build a history, and let the DQN observe the pattern.
  • Use MCTS to build an experience history, then teach the DQN how to play directly (see the sketch below).
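
A sketch of that pipeline, with a hypothetical replay buffer; the transition format and buffer size are assumptions for illustration:

import random
from collections import deque

replay_buffer = deque(maxlen=100_000)   # experience history

def record_game(transitions):
    # transitions: a list of (state, action, reward, next_state, done) tuples
    # produced by a random player or by MCTS self-play.
    replay_buffer.extend(transitions)

def sample_batch(batch_size=64):
    # The DQN later learns the pattern by sampling mini-batches from this history.
    return random.sample(list(replay_buffer), batch_size)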

Deep Q-Learning (DQN)

Improvement/Adjustment

  1. Grid preprocessing
    • one-hot encoding (sketched below)
  2. Feed a batch of state-action pairs
  3. Loss function
  4. Q-learning gamma (discount factor)
  5. Experience
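
For example, the one-hot grid preprocessing and the gamma-discounted Q-learning target could look roughly like this; the channel count and layout are assumptions, not necessarily the repository's choices:

import numpy as np

def one_hot_grid(board, channels=16):
    # Encode each tile 2^k as its own binary channel; channel 0 marks empty cells.
    encoded = np.zeros((4, 4, channels), dtype=np.float32)
    for r in range(4):
        for c in range(4):
            v = board[r, c]
            k = int(np.log2(v)) if v > 0 else 0
            encoded[r, c, k] = 1.0
    return encoded

def q_target(reward, next_q_values, done, gamma=0.99):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a'), unless terminal.
    return reward if done else reward + gamma * float(np.max(next_q_values))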

Notes

Links

Similar Project

Use Machine Learning

Use Traditional AI

Simple Game Play

Article and Paper

AlphaGo

Others
