commit a7b4843 (1 parent: 01b8983)
Showing 1 changed file with 15 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,21 @@ | ||
# Use DQN to play Atari Games (single run) | ||
# Atari General | ||
|
||
Use random policy, the sample speed is \~3000 env step per second (\~12000 Atari frame per second in fact since we use frame_stack=4). The main bottleneck is training the convolutional neural network. | ||
Using a random policy, the sampling speed is \~3000 env steps per second (\~12000 Atari frames per second, since we use frame_stack=4). The main bottleneck is training the convolutional neural network. | ||
|
||
The env wrapper is crucial. Without wrappers, the agent cannot perform well on Atari games. Many existing RL codebases use the [OpenAI wrapper](https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py), but it is not the original DeepMind version ([related issue](https://github.com/openai/baselines/issues/240)). Dopamine has a different [wrapper](https://github.com/google/dopamine/blob/master/dopamine/discrete_domains/atari_lib.py), but unfortunately it does not work very well in our codebase. | ||
|
||
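To illustrate what one piece of an Atari wrapper does, here is a minimal sketch of frame stacking with `frame_stack=4`. The `FrameStack` class and its `reset`/`step` methods are hypothetical names chosen for this example; this is not the OpenAI, DeepMind, or Dopamine wrapper, and it ignores the other preprocessing steps those wrappers perform.

```python
from collections import deque


class FrameStack:
    """Keep the k most recent observations (hypothetical helper, not a library class)."""

    def __init__(self, k=4):
        self.k = k
        # A bounded deque discards the oldest frame once it holds k frames.
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Repeat the first frame k times so the stack is full from the start.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return tuple(self.frames)

    def step(self, new_frame):
        # Push the newest frame; the oldest one drops out automatically.
        self.frames.append(new_frame)
        return tuple(self.frames)
```

A real wrapper would typically store image arrays and concatenate them along a channel axis; plain tuples are used here only to keep the sketch dependency-free.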
# DQN (single run) | ||
|
||
| task | best reward | reward curve | parameters | time cost | | ||
| --------------------------- | ----------- | ------------------------------------- | ------------------------------------------------------------ | ------------------- | | ||
| PongNoFrameskip-v4 | 20 |  | `python3 atari_dqn.py` | ~30 min (~15 epoch) | | ||
| BreakoutNoFrameskip-v4 | 316 |  | `python3 atari_dqn.py --task "BreakoutNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| EnduroNoFrameskip-v4 | 670 |  | `python3 atari_dqn.py --task "EnduroNoFrameskip-v4 " --test_num 100` | 3~4h (100 epoch) | | ||
| QbertNoFrameskip-v4 | 7307 |  | `python3 atari_dqn.py --task "QbertNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| MsPacmanNoFrameskip-v4 | 2107 |  | `python3 atari_dqn.py --task "MsPacmanNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| SeaquestNoFrameskip-v4 | 2088 |  | `python3 atari_dqn.py --task "SeaquestNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| SpaceInvadersNoFrameskip-v4 | 812.2 |  | `python3 atari_dqn.py --task "SpaceInvadersNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| PongNoFrameskip-v4 | 20 |  | `python3 atari_dqn.py --task "PongNoFrameskip-v4" --batch_size 64` | ~30 min (~15 epoch) | | ||
| BreakoutNoFrameskip-v4 | 316 |  | `python3 atari_dqn.py --task "BreakoutNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| EnduroNoFrameskip-v4 | 670 |  | `python3 atari_dqn.py --task "EnduroNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| QbertNoFrameskip-v4 | 7307 |  | `python3 atari_dqn.py --task "QbertNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| MsPacmanNoFrameskip-v4 | 2107 |  | `python3 atari_dqn.py --task "MsPacmanNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| SeaquestNoFrameskip-v4 | 2088 |  | `python3 atari_dqn.py --task "SeaquestNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
| SpaceInvadersNoFrameskip-v4 | 812.2 |  | `python3 atari_dqn.py --task "SpaceInvadersNoFrameskip-v4" --test_num 100` | 3~4h (100 epoch) | | ||
|
||
Note: the eps_train_final and eps_test in the original DQN paper is 0.1 and 0.01, but [some works](https://github.com/google/dopamine/tree/master/baselines) found that smaller eps helps improve the performance. Also, a large batchsize (say 64 instead of 32) will help but will slow down the speed. | ||
Note: The eps_train_final and eps_test in the original DQN paper are 0.1 and 0.01, but [some works](https://github.com/google/dopamine/tree/master/baselines) found that a smaller eps improves performance. Also, a larger batch size (say 64 instead of 32) leads to faster convergence but slows down the training speed. | ||
|
||
Have fun with playing these hyperparameters! | ||
We haven't tuned these results to the best, so have fun playing with these hyperparameters!
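As a rough illustration of how a final training epsilon such as 0.1 is usually reached, the sketch below linearly anneals epsilon over a fixed number of steps. The function name `linear_eps`, the starting value 1.0, and the 1,000,000-step decay horizon are assumptions made for this example; they are not taken from `atari_dqn.py`, which may schedule epsilon differently.

```python
def linear_eps(step, eps_start=1.0, eps_final=0.1, decay_steps=1_000_000):
    """Linearly anneal epsilon from eps_start to eps_final over decay_steps.

    All default values except eps_final=0.1 are illustrative assumptions.
    """
    if step >= decay_steps:
        # After the decay horizon, epsilon stays at its final value.
        return eps_final
    frac = step / decay_steps
    return eps_start + frac * (eps_final - eps_start)
```

Trying a smaller `eps_final` in a schedule like this is one way to experiment with the observation in the note above.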