Torchvision transform hangs when using python multiprocessing and model inference #4529

theahura · 2021-10-03T16:07:05Z

🐛 Describe the bug

Torchvision transforms cause code to hang when Python multiprocessing is used in a script that also instantiates a model for inference.

In particular, I'm seeing hangs when a CLIP model has been instantiated and entirely unrelated torch code runs in a child process. An issue was filed against the CLIP repository (openai/CLIP#130), but I figured this should also be flagged on torchvision because I don't think the issue is specific to their model. The code hangs on img.permute((2, 0, 1)).contiguous() in transforms/functional.py, which is in turn called by the ToTensor transform at F.to_tensor(pic).

Minimal code sample in which I split the to_tensor transform and remove any unneeded parts:

import torch
import clip
from PIL import Image
import multiprocessing as mp

model = clip.model.CLIP(512, 224, 12, 768, 32, 77, 49408, 512, 8, 12)


def test():
  print("GETTING IMAGE")
  im = Image.open("CLIP.png")
  print("CONVERTING")
  im = im.convert('RGB')
  print("MADE TENSOR")
  img = torch.ByteTensor(torch.ByteStorage.from_buffer(im.tobytes()))
  print("VIEW")
  img = img.view(im.size[1], im.size[0], len(im.getbands()))
  print("PERMUTING")
  img = img.permute((2, 0, 1))
  print("CONTIGUOUS")
  img = img.contiguous()
  print("DIV")
  img = img.float().div(255)
  print("UNSQUEEZE")
  img = img.unsqueeze(0)
  return img


p = mp.Process(target=test, daemon=True)
p.start()
p.join()

This code hangs on the img.contiguous() call, but only if the model is initialized at the top. If the model initialization is commented out, it works as expected. Note also that the function run in the child process does not use the model at all.
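For anyone trying to narrow this down, here is a minimal sketch of a helper that bounds the wait instead of calling p.join() with no timeout, and lets the start method be switched. It uses only the standard library. The helper name run_in_child and the two placeholder tasks are hypothetical, not part of torch or torchvision. The thread does not establish a root cause, so whether "spawn" avoids this particular hang is an assumption to test, not a confirmed fix.

```python
import multiprocessing as mp
import time


def run_in_child(target, timeout_s=30.0, start_method="spawn"):
    """Run target() in a child process.

    Returns the child's exit code, or None if the child was still running
    after timeout_s seconds and had to be terminated.
    """
    # get_context gives a start method local to this call instead of
    # changing the global default with set_start_method.
    ctx = mp.get_context(start_method)
    proc = ctx.Process(target=target, daemon=True)
    proc.start()
    proc.join(timeout_s)  # returns after timeout_s even if the child hangs
    if proc.is_alive():
        proc.terminate()
        proc.join()
        return None
    return proc.exitcode


def quick_task():
    # Placeholder for a function that finishes normally.
    pass


def stuck_task():
    # Placeholder for a function that never returns, like the hang above.
    time.sleep(3600)
```

With the repro above, this would be called as run_in_child(test, timeout_s=60) from inside an if __name__ == "__main__": block. With "spawn" the child re-imports the main module, so the module-level model construction runs again in the child unless it is also moved under that guard.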

Versions

Collecting environment information...
/home/amol/code/soot/debugging/clip_tests/env/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] Could not collect

cc @vfdev-5 @datumbox

rsamf · 2024-03-16T01:45:10Z

I'm dealing with the same problem 2+ years later

ed-cho · 2024-05-22T05:13:14Z

same here... any updates on this?

Adeniyilowee · 2024-05-22T05:41:22Z

I am currently running into the same problem after building a new environment. torch.tensor() seems to be the culprit. Everything works perfectly in the old environment. Any update? Can anyone help?

theahura mentioned this issue Oct 3, 2021

make the tensor continuous when passing numpy object to tensor #2483

Merged

vfdev-5 added the module: transforms label Oct 5, 2021
