
Performance Update #8

Open · wants to merge 5 commits into master

Conversation

jcoreyes (Contributor)

  • Keep replay memory (screens, pre- and post-states) in GPU memory (a sketch of the idea follows this list)
  • Use a transpose kernel to switch the screens to CHWN format
  • Training steps per second on the Breakout ROM: 455, up from 260
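The PR's actual implementation uses Neon's GPU tensors; as a rough, framework-neutral sketch of the same idea, here is a hypothetical GPU-resident replay memory using CuPy (the class, method, and parameter names are illustrative, not the PR's code):

```python
# Hypothetical sketch: screens stored on the GPU; CuPy stands in for
# the Neon backend the PR actually uses.
import numpy as np
import cupy as cp

class GPUReplayMemory(object):
    def __init__(self, size=1000000, screen_shape=(84, 84)):
        # screens stay resident in device memory for the whole run
        self.screens = cp.zeros((size,) + screen_shape, dtype=cp.uint8)
        self.actions = np.zeros(size, dtype=np.uint8)    # tiny, host is fine
        self.rewards = np.zeros(size, dtype=np.float32)
        self.size = size
        self.current = 0

    def add(self, screen, action, reward):
        idx = self.current % self.size
        # one small host->device copy per step, instead of shipping whole
        # minibatches of screens out of host RAM on every training step
        self.screens[idx] = cp.asarray(screen)
        self.actions[idx] = action
        self.rewards[idx] = reward
        self.current += 1

    def minibatch(self, batch_size=32):
        high = min(self.current, self.size)
        idxs = np.random.randint(0, high, batch_size)
        batch = self.screens[cp.asarray(idxs)]           # (N, H, W), on GPU
        # device-side transpose toward the CHWN layout the PR mentions
        # (C == 1 here, so a leading channel axis is added)
        chwn = batch.transpose(1, 2, 0)[cp.newaxis]
        return chwn, self.actions[idxs], self.rewards[idxs]
```

With this layout the minibatch gather and the transpose both run on the device; only the per-step frame upload and the small action/reward arrays cross the PCIe bus.
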
tambetm (Owner) commented Apr 30, 2016

Thanks for a nice pull request; together these changes result in almost a 2x improvement!

But I would like to keep the code runnable on lesser GPUs as well, so I would like to have two ReplayMemory implementations that you can choose between with a command-line switch (a sketch of such a switch follows below). I would also like to keep the main code independent of Neon, so we need to figure out how to share the backend between ReplayMemory and DeepQNetwork without instantiating it in main. Or could we just use two separate backends?
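A minimal sketch of what such a switch could look like (the flag name and the stub classes are assumptions, not the actual interface):

```python
# Hypothetical command-line switch between two ReplayMemory implementations.
import argparse

class ReplayMemory(object):            # existing host-RAM implementation (stub)
    def __init__(self, size):
        self.size = size

class GPUReplayMemory(ReplayMemory):   # GPU-resident implementation (stub)
    pass

parser = argparse.ArgumentParser()
parser.add_argument("--replay_memory", choices=["cpu", "gpu"], default="cpu",
                    help="where to keep replay memory; 'gpu' needs several GB of VRAM")
args = parser.parse_args()

memory_class = GPUReplayMemory if args.replay_memory == "gpu" else ReplayMemory
memory = memory_class(size=1000000)
```

On the backend question: if I remember Neon's design correctly, gen_backend() registers the backend globally on NervanaObject.be, so both ReplayMemory and DeepQNetwork could pick it up from there without main passing it around.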

Also, as I understand it, the current version achieves 38% GPU utilization on a Titan X. I wonder what could be done to reach 100%? Some ideas:

  • implement the Q-updates in DeepQNetwork using the Neon backend as well,
  • run playing and training in parallel in separate threads (see the sketch after this list).
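For the second idea, a rough sketch of how the play and train loops could share the replay memory from two threads (agent, net, and memory are hypothetical stand-ins for the real objects):

```python
# Hypothetical two-thread setup: one thread plays and fills the replay
# memory, the other samples minibatches and trains.
import threading

def run_parallel(agent, net, memory, steps=10000, batch_size=32):
    lock = threading.Lock()
    done = threading.Event()

    def play():
        for _ in range(steps):
            transition = agent.step()       # play one frame (hypothetical API)
            with lock:                      # hold the lock only for the
                memory.add(*transition)     # cheap bookkeeping
        done.set()

    def train():
        while not done.is_set():
            with lock:
                batch = memory.minibatch(batch_size)
            net.train(batch)                # GPU work runs with the lock released

    threads = [threading.Thread(target=play), threading.Thread(target=train)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Python's GIL is a caveat, but most of the time should be spent inside GPU kernels and the ALE emulator's native code, which can run with the GIL released.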

mw66 commented Jul 17, 2016

Does this fork really keep the replay memory in GPU memory?

I tried the latest version, but this is my GPU usage:

$ nvidia-smi
Sun Jul 17 15:27:18 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.93     Driver Version: 352.93         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:02:00.0    Off  |                  N/A |
|  0%   51C    P2    32W / 160W |    126MiB / 4095MiB  |     28%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2963    C   python                                         112MiB |
+-----------------------------------------------------------------------------+

And main memory:

 PR  NI    VIRT    RES     SHR  S  %CPU %MEM     TIME+  COMMAND
 20   0  44.118g  6.766g  110244 R 100.0 43.2  534:05.11 python

6.766g is about the size of a 1M-entry replay memory held in main memory.
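A quick back-of-the-envelope check (assuming the usual DQN 84x84 uint8 screens and 1M capacity):

```python
# 1M screens at 84x84, one byte per pixel
bytes_needed = 10**6 * 84 * 84
print(bytes_needed / 2.0**30)   # ~6.57 GiB, close to the 6.766g RES above
```

If the screens really lived on the GPU, nvidia-smi should report several GiB of device memory in use instead of 126 MiB.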
