This repository has been archived by the owner on Jun 10, 2024. It is now read-only.
My config:
- Video Processing Framework built from source according to the instructions
- Ubuntu 22.04
- Python 3.10.12
- PyTorch 2.1.0.dev20230706+cu121
- CUDA 12.3
- NVIDIA Driver 545.23.08
- FFmpeg N-112697-gf084e9b0be (built from source)
I'm using the following class to decode video frames and convert them to PyTorch tensors:
```python
import torch
from torch.utils.data import Dataset

import PyNvCodec as nvc
import PytorchNvCodec as pnvc


class VideoDataset(Dataset):
    def __init__(self, video_path, gpu_id=0):
        self.video_path = video_path
        self.gpu_id = gpu_id
        self.decoder = None
        self.nv12_to_yuv = None
        self.yuv420_to_rgb = None
        self.rgb_to_pln = None
        self.cc_ctx = None
        self.width = None
        self.height = None

    def open(self):
        # Init HW decoder, converters and color space context:
        self.decoder = nvc.PyNvDecoder(self.video_path, self.gpu_id)
        self.width, self.height = self.decoder.Width(), self.decoder.Height()
        self.nv12_to_yuv = nvc.PySurfaceConverter(
            self.width, self.height, nvc.PixelFormat.NV12, nvc.PixelFormat.YUV420, self.gpu_id
        )
        self.yuv420_to_rgb = nvc.PySurfaceConverter(
            self.width, self.height, nvc.PixelFormat.YUV420, nvc.PixelFormat.RGB, self.gpu_id
        )
        self.rgb_to_pln = nvc.PySurfaceConverter(
            self.width, self.height, nvc.PixelFormat.RGB, nvc.PixelFormat.RGB_PLANAR, self.gpu_id
        )
        self.cc_ctx = nvc.ColorspaceConversionContext(
            nvc.ColorSpace.BT_601, nvc.ColorRange.MPEG
        )
        return self

    def __len__(self):
        return self.decoder.Numframes()

    def __getitem__(self, idx):
        # Open the video lazily; this must happen before len(self),
        # which dereferences the decoder:
        if self.decoder is None:
            self.open()
        if idx >= len(self):
            raise StopIteration
        # Decode 1 compressed video frame to CUDA memory:
        nv12_surface = self.decoder.DecodeSingleSurface()
        if nv12_surface.Empty():
            print("Can not decode frame")
            return None
        # Convert from NV12 to YUV420.
        # This extra step is required because not all NV12 -> RGB conversions
        # implemented in NPP support all color spaces and ranges:
        yuv420 = self.nv12_to_yuv.Execute(nv12_surface, self.cc_ctx)
        if yuv420.Empty():
            print("Can not convert nv12 -> yuv420")
            return None
        # Convert from YUV420 to interleaved RGB:
        rgb24 = self.yuv420_to_rgb.Execute(yuv420, self.cc_ctx)
        if rgb24.Empty():
            print("Can not convert yuv420 -> rgb")
            return None
        # Convert from RGB to planar RGB:
        rgb24_planar = self.rgb_to_pln.Execute(rgb24, self.cc_ctx)
        if rgb24_planar.Empty():
            print("Can not convert rgb -> rgb planar")
            return None
        if rgb24_planar.Format() != nvc.PixelFormat.RGB_PLANAR:
            raise RuntimeError("Surface shall be of RGB_PLANAR pixel format")
        # Export the surface plane to a PyTorch tensor (same GPU memory, no copy):
        surf_plane = rgb24_planar.PlanePtr()
        img_tensor = pnvc.DptrToTensor(
            surf_plane.GpuMem(),
            surf_plane.Width(),
            surf_plane.Height(),
            surf_plane.Pitch(),
            surf_plane.ElemSize(),
        )
        if img_tensor is None:
            raise RuntimeError("Can not export to tensor.")
        return img_tensor
```
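For reference, iterating the dataset with `enumerate(dataset)` works even though the class defines no `__iter__`: Python falls back to the sequence protocol, calling `__getitem__(0)`, `__getitem__(1)`, … until an exception ends the loop. A minimal stand-alone sketch of that mechanism (hypothetical `Frames` class, no CUDA or VPF involved):

```python
class Frames:
    """Stand-in for a __getitem__-only dataset (no __iter__ defined)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if idx >= len(self):
            raise StopIteration  # ends sequence-protocol iteration
        return idx * 10          # stand-in for a decoded frame

# The for-loop machinery calls __getitem__ with 0, 1, 2, ... in order:
print(list(enumerate(Frames(3))))  # [(0, 0), (1, 10), (2, 20)]
```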
Then I run a loop, collect two adjacent frames, and merge them into a batch that is passed to the model:

```python
batch = []
for idx, img_tensor in enumerate(dataset):
    batch.append(img_tensor)
    if len(batch) < 2:
        continue
    frames = torch.cat(batch, 0)
    frames = frames.float() / 255.0
    if torch.equal(frames[0], frames[1]):
        print('Equal', idx)
    preds = model(frames)
    batch = []  # start collecting the next pair
```
To check for correctness I use torch.equal. When decoding on the GPU, torch.equal finds many identical frames, and on each run of the loop this happens at random frame indices. When I save such images, they are indeed identical and also contain various artifacts (it looks like the GPU memory is being overwritten by something).
I tried using torch.clone, but it didn't help.
However, running everything on the CPU works fine. If I move the tensor from the GPU to the CPU before calling torch.cat, and then return it to the GPU before calling the model, it also works correctly.
If I don't use batch mode (only 1 frame per iteration), it works fine again.
I don't use multithreading.
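These symptoms (random duplicate frames, artifacts, fixed by an extra copy to the CPU) are consistent with the decoder reusing one surface buffer for subsequent frames while an exported tensor still aliases that memory. A minimal pure-Python sketch of that failure mode, with a hypothetical `ReusingDecoder` and a `bytearray` standing in for the GPU surface (no VPF or CUDA involved):

```python
class ReusingDecoder:
    """Stand-in for a HW decoder that returns views into one reused buffer."""

    def __init__(self):
        self._buf = bytearray(4)  # single internal "surface"
        self._frame = 0

    def decode(self):
        self._frame += 1
        self._buf[:] = bytes([self._frame] * 4)  # overwrite in place
        return memoryview(self._buf)             # caller gets a view, not a copy

# Keeping two views (as when batching two adjacent frames) aliases one buffer:
dec = ReusingDecoder()
a = dec.decode()                 # "frame 1"
b = dec.decode()                 # "frame 2" clobbers the same memory
print(bytes(a) == bytes(b))      # True: frame 1 was silently overwritten

# Materializing a copy at decode time preserves each frame:
dec2 = ReusingDecoder()
a2 = bytes(dec2.decode())        # explicit copy, analogous to the CPU round-trip
b2 = bytes(dec2.decode())
print(a2 == b2)                  # False: both frames survive
```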