-
Hi @aklacar1 First you need to decide what kind of output you want your program to generate. If it's a large number of JPEG files, I recommend finding a library or framework that supports nvJPEG (VPF doesn't). Otherwise, if you're open to other output formats, you can convert your torch tensors back to VPF Surfaces and encode them as shown in https://github.com/NVIDIA/VideoProcessingFramework/blob/master/samples/SamplePyTorch.py. Nvenc can output video of decent quality, comparable to JPEG, or even lossless video. Lossless is slower than the usual lossy H.265, but it will still be faster than Device -> Host -> JPEG. Also, don't forget about pixel formats. One more piece of advice: use multiple CUDA streams.
-
Thanks for the clarification, now it makes much more sense. Video codecs + alpha channel usually aren't a great match. What's your target output format? I'm basically searching for an approach that keeps everything on the GPU.
-
I am using VPF to speed up RobustVideoMatting, and it currently works perfectly. However, I have found a bottleneck: writing tensors to image files.
Use case:
I have a loop over PyTorch CUDA tensors decoded from a video, so 900 tensors for a 30-second video. The basic RVM computation is pretty fast, but I ran into a problem when writing the results to files, since I need to generate 900 images.
I would appreciate recommendations on best practices in Python with PyTorch and VPF.
What I am currently doing:
On each iteration, I move the tensors from GPU to CPU and convert them to NumPy arrays (uint8).
After all iterations are done, I write them to image files using cv2 with a ThreadPoolExecutor.
My ideas are to monitor memory consumption more closely and to parallelize this further with ThreadPoolExecutor, but I'm not sure whether that would speed things up much, or at all. Each tensor is about 32 MiB, so the GPU fills up quite fast if I don't move them to CPU NumPy arrays.
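One way to explore the ThreadPoolExecutor idea is to write frames while the loop is still producing them, with a cap on how many frames are waiting at once, instead of collecting all 900 first. The following is only a minimal sketch using the standard library: `run_pipeline`, `peak_host_mib`, `write_frame`, and the default limits are hypothetical names and values chosen for illustration, not part of VPF, PyTorch, or cv2. It assumes each frame has already been copied to host memory and that `write_frame` wraps whatever call saves one image (for example cv2.imwrite).

```python
import threading
from concurrent.futures import ThreadPoolExecutor

FRAME_MIB = 32  # approximate per-frame size quoted in the post


def run_pipeline(frames, write_frame, max_in_flight=8, workers=4):
    """Write frames while they are still being produced.

    frames: iterable yielding (index, payload) pairs already in host memory.
    write_frame: hypothetical callable(index, payload) that saves one image.
    max_in_flight: cap on frames that are queued or being written at once.
    """
    slots = threading.BoundedSemaphore(max_in_flight)

    def task(index, payload):
        try:
            write_frame(index, payload)
        finally:
            slots.release()  # free a slot even if the write failed

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for index, payload in frames:
            slots.acquire()  # blocks the producer when writers fall behind
            futures.append(pool.submit(task, index, payload))
        for future in futures:
            future.result()  # re-raise any exception from a writer thread
    return len(futures)


def peak_host_mib(max_in_flight, frame_mib=FRAME_MIB):
    """Rough upper bound on memory held by frames waiting to be written."""
    return max_in_flight * frame_mib
```

With the assumed cap of 8 frames, roughly 8 x 32 MiB = 256 MiB is waiting to be written at any time, compared with about 900 x 32 MiB = 28,800 MiB (around 28 GiB) if every frame is kept until the loop finishes. Whether this raises throughput depends on where the time actually goes (the device-to-host copy, the image encoding, or the disk), so it is worth measuring before and after.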
I was looking at the VPF encoder, but I'm not sure whether it would help me, or how to use it.
Processing speed is currently around 12.85 frames per second. (If I did it all on the CPU, it would be 2-3 frames per second.)