An AI bot that plays the game 2048 using reinforcement learning
- Objective: Achieve the highest score / max tile, i.e. survive as long as possible while maintaining a good board state.
- State: A 4x4 grid of tiles whose values are powers of 2.
- Action: Shift the board UP, DOWN, LEFT, or RIGHT.
- Reward: The score increment, or the score combined with other metrics.
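To make the reward concrete, here is a minimal sketch of how a single LEFT move on one row produces the score increment; merge_row_left is an illustrative name, not this project's actual API.

def merge_row_left(row):
    """Slide one row to the left and merge equal neighbours.
    Returns (new_row, reward), where the reward is the sum of the
    tile values created by merges (the standard 2048 scoring rule)."""
    tiles = [t for t in row if t != 0]      # slide non-empty tiles together
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)     # equal neighbours merge once
            reward += tiles[i] * 2          # score increment = new tile value
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), reward

For example, merge_row_left([2, 2, 4, 0]) returns ([4, 4, 0, 0], 4).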
Dependencies
tensorflow
numpy
pyyaml
$ python3 RL2048/Game/Play.py
Play mode:
1. Keyboard (use w, a, s, d; exit with ^C or ^D)
2. Random
select:
- Keyboard mode
- Random mode
$ python3 RL2048/Learning/backward.py
- TRAIN_MODE.NORMAL: Normal training process
  - Uses only the network's own policy
- TRAIN_MODE.WITH_RANDOM
  - With a small chance of taking a random move (see the sketch below)
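A minimal sketch of what TRAIN_MODE.WITH_RANDOM could look like; the function name and the epsilon value are illustrative, not this project's actual code.

import random

def choose_action(network_action, n_actions=4, epsilon=0.1):
    """With probability epsilon, ignore the network's choice and
    explore with a uniformly random action (UP/DOWN/LEFT/RIGHT)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return network_action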
$ python3 RL2048/Report/Statistics.py
- Success Rate of Tiles
- Scores Diagram
- Loss Diagram (TODO)
- Model (ckpt):
./model
- Last game status:
training_game.yaml
- Training log:
training.log
- Statistics report:
./report/StatisticsResult.md
If Python cannot find the RL2048 module (ModuleNotFoundError: No module named 'RL2048'), make sure your working directory is the root of this project, then run the code like this:
export PYTHONPATH=$PYTHONPATH:/path/to/this/project/ReinforcementLearning2048; python3 RL2048/Learning/backward.py
Or add the following lines at the top of each script:
import sys
sys.path.append('/path/to/this/project/ReinforcementLearning2048')
Heuristic: Artificial intelligence: as much "artificial", as much "intelligence"!
With a decaying probability, a "Teacher" takes control of the move.
The Teacher uses the Monte Carlo tree search algorithm with these board heuristics:
- Monotonicity
- Smoothness
- Free Tiles
- Z-shape
(Minimax search with alpha-beta pruning)
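A rough sketch of how these heuristics can be scored on a board, assuming the grid is a 4x4 numpy array of tile values (0 for empty); the exact formulas and weights in this project may differ.

import numpy as np

def free_tiles(grid):
    # More empty cells means more room to survive.
    return int(np.count_nonzero(grid == 0))

def smoothness(grid):
    # Negative sum of |log2 differences| between neighbours;
    # smoother boards are easier to merge.
    logs = np.log2(np.maximum(grid, 1))
    return -(np.abs(np.diff(logs, axis=0)).sum()
             + np.abs(np.diff(logs, axis=1)).sum())

def monotonicity(grid):
    # Penalise direction changes within each row/column; a line that
    # only increases or only decreases costs nothing, which encourages
    # the Z-shape (snake) arrangement of large tiles.
    logs = np.log2(np.maximum(grid, 1))
    score = 0.0
    for line in list(logs) + list(logs.T):
        d = np.diff(line)
        score -= min(d[d > 0].sum(), -d[d < 0].sum())
    return score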
We found that Policy Gradient is not a good approach for 2048.
The main issue is that 2048 has a "local comfort zone": sometimes you must take a locally bad action, because the direction you actually want is invalid.
- The network keeps taking invalid actions.
- The loss became too small, and the network seemed to learn nothing in the first 100 rounds. (The too-small-loss problem was solved, but the network still learned nothing.)
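One common workaround (not necessarily what this project does) is to mask invalid actions out of the policy before sampling, so the agent never wastes a step:

import numpy as np

def masked_policy(probs, valid_mask):
    """Zero out invalid moves and renormalise. probs is the softmax
    output over [UP, DOWN, LEFT, RIGHT]; valid_mask[i] is 1 if move i
    actually changes the board, else 0."""
    masked = probs * valid_mask
    if masked.sum() == 0:       # no valid move left -> game over
        return None
    return masked / masked.sum()

# action = np.random.choice(4, p=masked_policy(probs, valid_mask))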
MCTS Policy Gradient
Random Policy Gradient
Idea:
- Use random play to build a history, and use a DQN to learn the patterns in it.
- Use MCTS to build an experience history, then teach the DQN how to play directly.
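A sketch of the shared mechanism behind both ideas: a teacher policy (random play or MCTS) fills an experience-replay buffer, and the DQN trains on minibatches sampled from it. The env interface here (reset/step) is assumed, not this project's actual API.

import random
from collections import deque

replay = deque(maxlen=50000)    # experience history; size is illustrative

def collect_experience(teacher_policy, env, n_steps):
    """Let the teacher play and record (s, a, r, s', done) transitions."""
    state = env.reset()
    for _ in range(n_steps):
        action = teacher_policy(state)
        next_state, reward, done = env.step(action)
        replay.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

# The DQN then trains on random minibatches, e.g.:
# batch = random.sample(replay, 32)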
Improvement/Adjustment
- Grid preprocessing
  - one-hot (see the sketch after this list)
- Feed a batch of state-action pairs
- Loss function
- Q-Learning gamma
- Experience
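For the one-hot grid preprocessing, one plausible encoding (a sketch; the project's actual layout may differ) maps each tile value 2**k to its own channel, giving a 4x4x16 tensor the network can consume:

import numpy as np

def one_hot_grid(grid, depth=16):
    """Encode a 4x4 numpy integer array of tile values into a
    4x4x`depth` one-hot tensor: channel k is 1 where the tile equals
    2**k, and channel 0 marks empty cells."""
    exponents = np.zeros((4, 4), dtype=np.int64)
    nonzero = grid > 0
    exponents[nonzero] = np.log2(grid[nonzero]).astype(np.int64)
    return np.eye(depth, dtype=np.float32)[exponents]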
- Reinforcement Learning Notes
- There is a more elegant way to store a class object in YAML format: define it as a subclass of yaml.YAMLObject (see the "Constructors, representers, resolvers" section of the PyYAML documentation).
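For example, with a hypothetical GameRecord class (the pattern follows the PyYAML documentation):

import yaml

class GameRecord(yaml.YAMLObject):
    # Subclassing yaml.YAMLObject registers the dumper/loader for the
    # !GameRecord tag automatically; no extra representer or
    # constructor code is needed.
    yaml_tag = '!GameRecord'

    def __init__(self, grid, score):
        self.grid = grid
        self.score = score

doc = yaml.dump(GameRecord([[0] * 4] * 4, 2048))
record = yaml.load(doc, Loader=yaml.Loader)   # round-trips the object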
Use Machine Learning
- tjwei/2048-NN - Max tile 16384, 94% win rate
- georgwiese/2048-rl
- nneonneo/2048-ai
- SergioIommi/DQN-2048 - with Keras
- navjindervirdee/2048-deep-reinforcement-learning - Max tile 4096, 10% win rate
Use Traditional AI
- daviddwlee84/2048-AI-BOT - My friend Tom and I wrote this for an AI competition in 2014.
- ovolve/2048-AI - 90% win rate
- jdleesmiller/twenty48
Simple Game Play
- Python
- JavaScript
- gabrielecirulli/2048 - almost 10k stars
- GetMIT
- Stackoverflow - What is the optimal algorithm for the game 2048?
- MIT - Deep Reinforcement Learning for 2048
- Reddit - TDL, N-Tuple Network - 97% win rate
- Stanford - AI Plays 2048
AlphaGo