It seems that, for the most part, Caffe does not use CUDA streams in its GPU implementation, which means all operations are serialized on the default CUDA stream.
That is a reasonable implementation if concurrency is not a concern, but it is not optimal when the GPU is shared among host threads. For example, if we run inference on two different Caffe models from two different host threads, they constantly block each other on the default CUDA stream, even though the two workloads are completely independent.
Is this a known and accepted limitation of Caffe? Or is there a plan to move computation onto non-default CUDA streams so that one network's work does not block the entire GPU?
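For reference, here is a minimal standalone sketch of the non-default-stream pattern being suggested. This is not Caffe code; `dummy_forward` is a placeholder kernel standing in for a model's forward pass, and the point is only that work queued on separate streams can overlap on the device instead of serializing, and each host thread can synchronize its own stream without blocking the other.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel standing in for one model's forward pass.
__global__ void dummy_forward(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // One stream per "model": kernels launched on different
    // non-default streams may execute concurrently on the device.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    dim3 block(256), grid((n + block.x - 1) / block.x);
    dummy_forward<<<grid, block, 0, s1>>>(a, n);  // model 1's work
    dummy_forward<<<grid, block, 0, s2>>>(b, n);  // model 2's work

    // Each stream is synchronized independently; waiting on s1
    // does not force completion of the work queued on s2.
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

With the default stream, the two launches above would instead be implicitly ordered behind each other, which is exactly the cross-thread blocking described here.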
Thanks for your thoughts!