I am working with a very deep fully convolutional architecture that currently takes ~12 GB of memory for a single 600 x 800 image during inference. To reduce memory, I modified net.cpp to allow duplicate top blobs during inference (gist here) and rewrote the prototxt for inference to use a minimal set of unique activations. This works and lowers memory usage (the net can now handle a 1280 x 720 image), but the reduction is far less than I anticipated.
For instance, consider a simple feed-forward network without any branches: A1->A2->A3->A4->....->An
Ex: Conv->BN->ReLU->Conv->BN->ReLU->...
If all activations from A1 to An are the same size, we should be able to run inference while storing only 2 activations in memory, juggling computation between the two, like X->Y->X->Y->X->...
In practice, however, the memory taken by this network is much greater than the size of 2 activations.
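To make the idea concrete, here is a minimal sketch (plain Python, not Caffe code) of the ping-pong scheme: a chain of n same-shaped layers runs with only two activation buffers, X and Y, whose roles swap after every layer. The elementwise "layers" here are hypothetical stand-ins for Conv/BN/ReLU.

```python
def pingpong_forward(x, layers):
    """Run a layer chain while holding only two activation buffers."""
    buffers = [list(x), [0.0] * len(x)]  # X and Y, same size
    src, dst = 0, 1
    for layer in layers:
        for i, v in enumerate(buffers[src]):
            buffers[dst][i] = layer(v)   # write dst from src
        src, dst = dst, src              # swap: X->Y->X->Y->...
    return buffers[src]

# Hypothetical elementwise layers standing in for Conv->BN->ReLU blocks.
layers = [lambda v: 2 * v, lambda v: v + 1.0, lambda v: max(v, 0.0)]
out = pingpong_forward([1.0, -2.0], layers)
print(out)  # -> [3.0, 0.0]; peak activation memory stayed at 2 buffers
```

The point is that peak activation memory is independent of the chain depth n, which is why I expected a much larger saving.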
I suspect this is because of unique internal buffers used in each layer and/or a separate workspace being allocated for each cuDNN convolution layer. In vanilla Caffe, there was a trick to make the internal buffers static (here), but it doesn't work with cuDNN. Is there something similar we can do here with internal buffers? Also, is it possible to use a global workspace for all convolution layers, as in MXNet?
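What I have in mind is something like the following sketch (plain Python, names hypothetical): a single shared scratch buffer that grows to the largest request seen, so total workspace memory equals the maximum per-layer requirement rather than the sum over all conv layers.

```python
class SharedWorkspace:
    """Sketch of a global scratch buffer shared by all conv layers,
    analogous to MXNet's global workspace (illustrative, not real API)."""

    def __init__(self):
        self.buf = bytearray(0)

    def request(self, nbytes):
        # Grow to the largest request seen so far; every layer then
        # borrows this one buffer instead of owning a private workspace.
        if nbytes > len(self.buf):
            self.buf = bytearray(nbytes)
        return memoryview(self.buf)[:nbytes]

ws = SharedWorkspace()
ws.request(1 << 20)   # layer 1 needs 1 MiB
ws.request(4 << 20)   # layer 2 needs 4 MiB -> buffer grows once
ws.request(2 << 20)   # layer 3 reuses the existing buffer
print(len(ws.buf))    # -> 4194304: max requirement, not the sum
```

With per-layer workspaces the same three layers would hold 7 MiB of scratch memory; the shared scheme holds 4 MiB.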
Thanks