Torchvision transforms cause code to hang when using Python multiprocessing and a model in inference mode.
In particular, I'm seeing hangs when using a CLIP model and entirely unrelated torch code running in a multiprocess. An issue was filed against the CLIP repository (openai/CLIP#130), but I figured this should also be flagged on torchvision, because I don't think this issue has to do with their model in particular. The code hangs on img.permute((2, 0, 1)).contiguous() in transforms/functional.py, which is in turn called by the ToTensor transform at F.to_tensor(pic).
Minimal code sample in which I split the to_tensor transform and remove any unneeded parts:
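The original snippet did not survive extraction. A minimal sketch consistent with the description (a model constructed at module level standing in for the CLIP load, and the split-out pieces of F.to_tensor run in a worker process) might look like the following; the model and dimensions here are placeholders, not the reporter's exact code:

```python
import multiprocessing as mp

import numpy as np
import torch

# Stand-in for the CLIP model load in the original report; any model whose
# construction initializes torch's native thread pools can trigger the
# fork-related hang described above.
model = torch.nn.Linear(4, 4)

def to_tensor_split(arr):
    """The relevant pieces of torchvision's F.to_tensor, split out."""
    img = torch.from_numpy(arr)
    img = img.permute((2, 0, 1))
    return img.contiguous()  # the call reported to hang

if __name__ == "__main__":
    arr = np.zeros((8, 8, 3), dtype=np.uint8)
    # With the default "fork" start method on Linux this is where the
    # deadlock was reported; "spawn" is used here so the sketch itself
    # stays runnable.
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        out = pool.apply(to_tensor_split, (arr,))
    print(tuple(out.shape))
```

Note that the worker never touches `model`; merely constructing it before the fork is enough to reproduce the reported hang.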
This code will hang on the img.contiguous() call, but only if the model is initialized at the top. If the model at the top is commented out, this works as expected. Further, note that the multiprocess function does not even use the model.
Versions
Collecting environment information...
/home/amol/code/soot/debugging/clip_tests/env/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] Could not collect
I am currently running into the same problem after building a new environment; torch.tensor() seems to be the culprit. Everything works as expected in the old environment. Any update? Can anyone help?
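A commonly suggested workaround for fork-related torch deadlocks (a sketch under the assumption that this thread's hang is the known fork/thread-pool interaction, which is not confirmed here) is to switch multiprocessing to the spawn start method before creating any workers:

```python
import multiprocessing as mp

import torch

def worker(x):
    # Tensor ops run safely in the child because it was started from a
    # fresh interpreter rather than forked from a process that already
    # initialized torch's native thread pools.
    return torch.tensor(x).contiguous()

if __name__ == "__main__":
    # "spawn" starts children from a fresh interpreter instead of fork(),
    # avoiding inherited-lock deadlocks.
    mp.set_start_method("spawn", force=True)
    with mp.Pool(1) as pool:
        print(pool.apply(worker, ([1, 2, 3],)))
```

The trade-off is that spawn re-imports the main module in each child, so module-level model construction happens once per worker.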
cc @vfdev-5 @datumbox