Performance Update #8
base: master
Conversation
jcoreyes commented Apr 29, 2016
- Keep replay memory (screens, pre- and post-states) in GPU memory (see the sketch after this list)
- Use a transpose kernel to switch to CHWN format
- Training steps per second on the Breakout ROM: 455, up from 260
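To illustrate the first two points, here is a minimal sketch, assuming Neon's `gen_backend` and backend tensor allocation; `transpose_to_chwn` is a hypothetical stand-in for the custom transpose kernel this PR adds, not a Neon API, and this is not the PR's actual code:

```python
# Sketch: GPU-resident replay memory (illustrative, not the PR's actual code).
import numpy as np
from neon.backends import gen_backend

be = gen_backend(backend='gpu', batch_size=32)

memory_size = 1000000
height, width = 84, 84

# All screens live on the GPU as one uint8 tensor instead of in host RAM.
screens = be.empty((memory_size, height, width), dtype=np.uint8)

def transpose_to_chwn(batch):
    """Hypothetical stand-in for the PR's custom CHWN transpose kernel."""
    raise NotImplementedError

def add_screen(index, screen):
    # Upload one preprocessed screen (a host numpy array) into GPU memory.
    screens[index] = screen

def prestates(indices, history_length=4):
    # Assemble a (batch, history, height, width) minibatch on the GPU,
    # then switch the layout to CHWN, which Neon's conv kernels prefer.
    batch = be.empty((len(indices), history_length, height, width),
                     dtype=np.uint8)
    for i, idx in enumerate(indices):
        batch[i] = screens[idx - history_length + 1:idx + 1]
    return transpose_to_chwn(batch)
```

The payoff is that minibatch assembly and the NHWC-to-CHWN conversion never touch the PCIe bus, which is where the reported 260-to-455 steps-per-second gain comes from.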
Thanks for a nice pull request; together those changes give almost a 2x improvement! But I would like to keep the code runnable on lesser GPUs as well, so I would like to have two ReplayMemory implementations that you can choose between with a command-line switch.

I would also like to keep the main code independent of Neon, so we need to figure out how to share the backend between ReplayMemory and DeepQNetwork without instantiating it in main. Or can we just use two separate backends?

Also, I understand that the current version achieves 38% GPU utilization on a Titan X. I wonder what could be done to reach 100%? Some ideas:
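One way the two-implementation idea could look, as a sketch; the class names and the flag are illustrative stubs, not the project's actual API:

```python
# Sketch: command-line switch between CPU- and GPU-resident replay memory.
# Class and argument names are illustrative, not simple_dqn's actual API.
import argparse

class DeepQNetwork:
    """Stub: the real class instantiates the Neon backend internally."""
    def __init__(self):
        self.be = object()  # stands in for the Neon backend instance

class CPUReplayMemory:
    """Stub: numpy arrays in host RAM (the current implementation)."""

class GPUReplayMemory:
    """Stub: backend tensors in GPU memory (this PR's approach)."""
    def __init__(self, backend):
        self.be = backend  # shared backend, so main never imports Neon

parser = argparse.ArgumentParser()
parser.add_argument('--replay-memory', choices=['cpu', 'gpu'], default='cpu',
                    help='where the replay memory lives')
args = parser.parse_args()

net = DeepQNetwork()
if args.replay_memory == 'gpu':
    # Reuse the network's backend instead of instantiating Neon in main.
    memory = GPUReplayMemory(backend=net.be)
else:
    memory = CPUReplayMemory()
```

Passing the backend out of DeepQNetwork keeps main free of any Neon import while still letting both classes share one GPU context; the alternative in the question above, two separate backends, would avoid the coupling at the cost of a device-to-device copy per minibatch.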
Does this fork really keep the replay memory on the GPU? I tried the latest version, but my GPU usage suggests otherwise:

$ nvidia-smi
[nvidia-smi table not preserved]

And main memory (top):

PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
[process row not preserved; RES was 6.766g]

6.766g is about the size of a 1M-screen replay memory in main memory.
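For reference, that resident size is indeed consistent with the replay memory sitting in host RAM; a back-of-the-envelope check, assuming simple_dqn's default 1M screens of 84x84 uint8 pixels:

```python
# Back-of-the-envelope: size of a 1M-screen replay memory in host RAM,
# assuming 84x84 grayscale uint8 screens (simple_dqn defaults).
memory_size = 1000000
height, width = 84, 84
bytes_per_pixel = 1  # uint8

total_bytes = memory_size * height * width * bytes_per_pixel
print(total_bytes / 2**30)  # ~6.57 GiB, close to the observed 6.766g RES
```

The small remainder over 6.57 GiB is plausibly the pre/post-state index arrays, actions, and rewards stored alongside the screens.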