
Performance Update #8

Open · wants to merge 5 commits into master

Conversation

jcoreyes (Contributor)

  • Keep replay memory (screens, pre- and post-states) in GPU memory (a sketch of the idea follows this list)
  • Use a transpose kernel to switch the screens to CHWN format
  • Training steps per second on the Breakout ROM: 455, up from 260
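The PR's actual implementation uses Neon's GPU tensors; as a rough, framework-neutral sketch of the same idea, here is a hypothetical GPU-resident replay memory using CuPy (the class, method, and parameter names are illustrative, not the PR's code):

```python
# Hypothetical sketch: screens stored on the GPU; CuPy stands in for
# the Neon backend the PR actually uses.
import numpy as np
import cupy as cp

class GPUReplayMemory(object):
    def __init__(self, size=1000000, screen_shape=(84, 84)):
        # screens stay resident in device memory for the whole run
        self.screens = cp.zeros((size,) + screen_shape, dtype=cp.uint8)
        self.actions = np.zeros(size, dtype=np.uint8)    # tiny, host is fine
        self.rewards = np.zeros(size, dtype=np.float32)
        self.size = size
        self.current = 0

    def add(self, screen, action, reward):
        idx = self.current % self.size
        # one small host->device copy per step, instead of shipping whole
        # minibatches of screens out of host RAM on every training step
        self.screens[idx] = cp.asarray(screen)
        self.actions[idx] = action
        self.rewards[idx] = reward
        self.current += 1

    def minibatch(self, batch_size=32):
        high = min(self.current, self.size)
        idxs = np.random.randint(0, high, batch_size)
        batch = self.screens[cp.asarray(idxs)]           # (N, H, W), on GPU
        # device-side transpose toward the CHWN layout the PR mentions
        # (C == 1 here, so a leading channel axis is added)
        chwn = batch.transpose(1, 2, 0)[cp.newaxis]
        return chwn, self.actions[idxs], self.rewards[idxs]
```

With this layout the minibatch gather and the transpose both run on the device; only the per-step frame upload and the small action/reward arrays cross the PCIe bus.
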
tambetm (Owner) commented Apr 30, 2016

Thanks for a nice pull request; together these changes result in almost a 2x improvement!

But I would like to keep the code runnable on lesser GPUs as well, so I would like to have two ReplayMemory implementations that you can choose between with a command-line switch (a sketch of such a switch follows below). I would also like to keep the main code independent of Neon, so we need to figure out how to share the backend between ReplayMemory and DeepQNetwork without instantiating it in main. Or could we just use two separate backends?
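A minimal sketch of what such a switch could look like (the flag name and the stub classes are assumptions, not the actual interface):

```python
# Hypothetical command-line switch between two ReplayMemory implementations.
import argparse

class ReplayMemory(object):            # existing host-RAM implementation (stub)
    def __init__(self, size):
        self.size = size

class GPUReplayMemory(ReplayMemory):   # GPU-resident implementation (stub)
    pass

parser = argparse.ArgumentParser()
parser.add_argument("--replay_memory", choices=["cpu", "gpu"], default="cpu",
                    help="where to keep replay memory; 'gpu' needs several GB of VRAM")
args = parser.parse_args()

memory_class = GPUReplayMemory if args.replay_memory == "gpu" else ReplayMemory
memory = memory_class(size=1000000)
```

On the backend question: if I remember Neon's design correctly, gen_backend() registers the backend globally on NervanaObject.be, so both ReplayMemory and DeepQNetwork could pick it up from there without main passing it around.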

Also, as I understand it, the current version achieves 38% GPU utilization on a Titan X. I wonder what could be done to reach 100%? Some ideas:

  • implement the Q-updates in DeepQNetwork using the Neon backend as well,
  • run playing and training in parallel in separate threads (see the sketch after this list).
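For the second idea, a rough sketch of how the play and train loops could share the replay memory from two threads (agent, net, and memory are hypothetical stand-ins for the real objects):

```python
# Hypothetical two-thread setup: one thread plays and fills the replay
# memory, the other samples minibatches and trains.
import threading

def run_parallel(agent, net, memory, steps=10000, batch_size=32):
    lock = threading.Lock()
    done = threading.Event()

    def play():
        for _ in range(steps):
            transition = agent.step()       # play one frame (hypothetical API)
            with lock:                      # hold the lock only for the
                memory.add(*transition)     # cheap bookkeeping
        done.set()

    def train():
        while not done.is_set():
            with lock:
                batch = memory.minibatch(batch_size)
            net.train(batch)                # GPU work runs with the lock released

    threads = [threading.Thread(target=play), threading.Thread(target=train)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Python's GIL is a caveat, but most of the time should be spent inside GPU kernels and the ALE emulator's native code, which can run with the GIL released.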

mw66 commented Jul 17, 2016

Does this fork really keep the replay memory in GPU memory?

I tried the latest version, but this is my GPU usage:

$ nvidia-smi
Sun Jul 17 15:27:18 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.93     Driver Version: 352.93         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:02:00.0    Off  |                  N/A |
|  0%   51C    P2    32W / 160W |    126MiB / 4095MiB  |     28%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2963    C   python                                         112MiB |
+-----------------------------------------------------------------------------+

And main memory:

 PR  NI    VIRT    RES     SHR  S  %CPU %MEM     TIME+  COMMAND
 20   0  44.118g  6.766g  110244 R 100.0 43.2  534:05.11 python

6.766g is about the size of a 1M-entry replay memory held in main memory.
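A quick back-of-the-envelope check (assuming the usual DQN 84x84 uint8 screens and 1M capacity):

```python
# 1M screens at 84x84, one byte per pixel
bytes_needed = 10**6 * 84 * 84
print(bytes_needed / 2.0**30)   # ~6.57 GiB, close to the 6.766g RES above
```

If the screens really lived on the GPU, nvidia-smi should report several GiB of device memory in use instead of 126 MiB.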
