Reducing memory usage during inference #498

Closed
venkai opened this issue Apr 12, 2018 · 2 comments

venkai commented Apr 12, 2018

Hi,

I am working with a very deep fully convolutional architecture that currently takes up ~12 GB of memory for a single 600 x 800 image during inference. To reduce memory, I modified net.cpp to allow duplicate top blobs during inference (gist here) and rewrote the inference prototxt to use a minimal set of unique activations. This works and lowers memory usage (the net can now handle a 1280 x 720 image), but the reduction is far less than I anticipated.

For instance, consider a simple feed-forward network without any branches: A1->A2->A3->A4->....->An
Ex: Conv->BN->ReLU->Conv->BN->ReLU->...
If all activations from A1 to An are the same size, we should be able to run inference while storing only 2 activations in memory, alternating computation between the two buffers: X->Y->X->Y->X->...
In practice, however, this network takes far more memory than 2 activations.
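
To make the X->Y->X->Y idea concrete, here is a minimal sketch in plain Python. It is not Caffe code: `scale_shift`, `relu`, and `run_chain_ping_pong` are hypothetical names, and the layers are toy element-wise stand-ins for BN and ReLU. It only illustrates that a branch-free chain of same-size layers needs two activation buffers, however deep the chain is.

```python
from array import array


def scale_shift(src, dst):
    """Toy stand-in for a BN-like layer: dst = 0.5 * src + 1.0."""
    for i, v in enumerate(src):
        dst[i] = 0.5 * v + 1.0


def relu(src, dst):
    """Toy ReLU: dst = max(src, 0)."""
    for i, v in enumerate(src):
        dst[i] = v if v > 0.0 else 0.0


def run_chain_ping_pong(layers, x):
    """Run a branch-free chain of same-size layers with only two buffers."""
    src = array("d", x)                 # buffer X: current input
    dst = array("d", [0.0]) * len(src)  # buffer Y: current output
    for layer in layers:
        layer(src, dst)      # read from one buffer, write to the other
        src, dst = dst, src  # swap roles: the output becomes the next input
    return list(src)         # after the last swap, src holds the final output
```

For example, `run_chain_ping_pong([scale_shift, relu, scale_shift], [-4.0, 2.0])` allocates exactly two buffers regardless of how many layers are in the list. A real framework also needs memory for per-layer internal buffers and workspaces, which is the part this sketch leaves out.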

I suspect this is because each layer uses its own internal buffers and/or each cuDNN convolution layer uses a separate workspace. In vanilla Caffe, there was a trick to make the internal buffers static (here). This, however, doesn't work with cuDNN. Is there something similar we can do here with internal buffers? Also, is it possible to use a global workspace for all convolution layers, as in MXNet?

Thanks

@drnikolaev

Hi @venkai, wait a minute. We do use a global workspace for all cuDNN convolution layers:
https://github.com/NVIDIA/caffe/blob/caffe-0.17/include/caffe/util/gpu_memory.hpp#L176-L181
It's a vector of N global workspaces, where N is the number of GPUs used.
And thanks for the gist, I'll explore it as soon as I can.
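
For readers unfamiliar with the pattern, a rough sketch of "one shared workspace per GPU" might look like the following. This is an illustration in plain Python, not the NVIDIA Caffe implementation: `SharedWorkspace`, `reserve`, and `make_workspaces` are hypothetical names, and a `bytearray` stands in for device memory. The point is that the workspace grows to the largest single request rather than to the sum of all requests.

```python
class SharedWorkspace:
    """One scratch buffer reused by every layer on a device (hypothetical)."""

    def __init__(self):
        self._buf = bytearray()

    def reserve(self, nbytes):
        """Return a view of at least nbytes, growing the buffer only if needed."""
        if nbytes > len(self._buf):
            # Reallocate; earlier contents are scratch data, so nothing is copied.
            self._buf = bytearray(nbytes)
        return memoryview(self._buf)[:nbytes]

    @property
    def size(self):
        """Current capacity in bytes (the peak request seen so far)."""
        return len(self._buf)


def make_workspaces(num_devices):
    """One shared workspace per device, indexed by device id."""
    return [SharedWorkspace() for _ in range(num_devices)]
```

With layer requests of 4096, 1024, and 8192 bytes on device 0, the shared workspace ends up at 8192 bytes, whereas a separate workspace per layer would total 13312 bytes. Sharing is only safe if the layers on a device run one at a time, which is an assumption of this sketch.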

@drnikolaev

cuDNN flow is clean, closing.
