
GroupNormalization plugin failure of TensorRT 10.0.1.6 when running trtexec on GPU A4000 #3950

Closed
appearancefnp opened this issue Jun 18, 2024 · 11 comments
Labels
Module:Plugins (Issues when using TensorRT plugins), triaged (Issue has been triaged by maintainers)

Comments

@appearancefnp

Description

Hey guys!
I wanted to upgrade from TensorRT 8.6 to 10.0. I have an ONNX model that uses the GroupNormalization plugin. It builds a serialized engine, but fails when deserializing the model because the plugin tries to load cuDNN 8 instead of cuDNN 9.

Environment

Using docker: nvcr.io/nvidia/tensorrt:24.05-py3

TensorRT Version: 10.0.1

NVIDIA GPU: A4000

NVIDIA Driver Version: 550.67

CUDA Version: 12.4

CUDNN Version: 9.1 (per container documentation)

Operating System:

Python Version (if applicable): -

Tensorflow Version (if applicable): -

PyTorch Version (if applicable): -

Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.05-py3

Relevant Files

Model link: https://drive.google.com/file/d/1vmGZpWJ_1sfz2ejbZoO3fFaR5udxOLTi/view?usp=sharing

Steps To Reproduce

  1. Run trtexec: trtexec --onnx=model.onnx
  2. trtexec builds the engine
...
[06/17/2024-14:57:28] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 3 MiB, GPU 1984 MiB
[06/17/2024-14:57:28] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3059 MiB
[06/17/2024-14:57:28] [I] Engine built in 886.712 sec.
[06/17/2024-14:57:28] [I] Created engine with size: 55.3649 MiB
[06/17/2024-14:57:28] [I] [TRT] Loaded engine size: 55 MiB
[06/17/2024-14:57:28] [I] Engine deserialized in 0.0301295 sec.
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

...
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +156, now: CPU 1, GPU 199 (MiB)
[06/17/2024-14:57:28] [I] Setting persistentCacheLimit to 0 bytes.
[06/17/2024-14:57:28] [I] Created execution context with device memory size: 155.537 MiB
[06/17/2024-14:57:28] [I] Using random values for input images
[06/17/2024-14:57:28] [I] Input binding for images with dimensions 1x500x1000x3 is created.
[06/17/2024-14:57:28] [I] Output binding for class_heatmaps with dimensions 1x5x125x250 is created.
[06/17/2024-14:57:28] [I] Starting inference
[06/17/2024-14:57:28] [F] [TRT] Validation failed: mBnScales != nullptr && mBnScales->mPtr != nullptr
plugin/groupNormalizationPlugin/groupNormalizationPlugin.cpp:132

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [E] Error[2]: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion pluginUtils::isSuccess(status) failed. )
[06/17/2024-14:57:28] [E] Error occurred during inference

Commands or scripts:
trtexec --onnx=model.onnx

Have you tried the latest release?: yes

@lix19937

Can you upload the full log from trtexec --onnx=model.onnx --verbose?

@appearancefnp
Author

@lix19937
trtexec.log

@lix19937

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

Make sure libcudnn.so loads successfully. Add its directory to LD_LIBRARY_PATH.
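
For example, a quick way to check which cuDNN sonames the dynamic loader can actually resolve (a minimal Python sketch that mirrors the dlopen the plugin performs; nothing here is TensorRT-specific):

    # Check which libcudnn sonames the dynamic loader can resolve.
    import ctypes

    for soname in ("libcudnn.so.8", "libcudnn.so.9"):
        try:
            ctypes.CDLL(soname)
            print(soname, "-> found")
        except OSError as err:
            print(soname, "-> NOT found:", err)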

@appearancefnp
Author

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

Make sure libcudnn.so loads successfully. Add its directory to LD_LIBRARY_PATH.

The problem is that the NVIDIA container ships cuDNN 9.1.0, but the plugin is trying to load libcudnn.so.8. This is a version mismatch, not a missing cuDNN.

@lix19937

lix19937 commented Jul 1, 2024

You should make sure your environment has only one cuDNN installed. Also, why does your nvinfer plugin load cuDNN 8.0?

@appearancefnp
Author

This is not my plugin - this is the plugin provided in this repo - https://github.com/NVIDIA/TensorRT/tree/release/10.1/plugin/groupNormalizationPlugin

And it loads cuDNN 8, not 9, because the wrong macro is defined here: https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/common/cudnnWrapper.cpp#L26

@lix19937

lix19937 commented Jul 1, 2024

Per https://github.com/NVIDIA/TensorRT/tree/release/10.0, for TensorRT 10.0.1.6 the recommended cuDNN versions are the following:

TensorRT GA build
  TensorRT v10.0.1.6
  Available from direct download links listed below

System Packages
  CUDA, recommended versions:
    cuda-12.2.0 + cuDNN-8.9
    cuda-11.8.0 + cuDNN-8.9
  GNU make >= v4.1
  cmake >= v3.13
  python >= v3.8, <= v3.10.x
  pip >= v19.0
  Essential utilities: git, pkg-config, wget

This maps to https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/common/cudnnWrapper.cpp#L26-L42

You can try creating a soft link: ln -s libcudnn.so.9 libcudnn.so.8.
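
If you try that, here is a rough Python sketch of the same workaround (the library directory is an assumption for this container; verify it on your system first):

    # Point a libcudnn.so.8 compatibility symlink at the installed cuDNN 9 library.
    import glob, os

    libdir = "/usr/lib/x86_64-linux-gnu"   # assumed cuDNN location, adjust as needed
    cudnn9 = sorted(glob.glob(os.path.join(libdir, "libcudnn.so.9*")))
    assert cudnn9, "no cuDNN 9 library found in " + libdir

    link = os.path.join(libdir, "libcudnn.so.8")
    if not os.path.exists(link):
        os.symlink(cudnn9[0], link)        # the plugin's dlopen of .so.8 now resolves

Note this only satisfies the dlopen; if the plugin calls cuDNN 8 symbols that were removed or changed in cuDNN 9, it can still fail at run time.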

@appearancefnp
Author

Why does the container include cudnn 9 then?
https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/index.html#rel-24-06

If TensorRT doesn't work in an NVIDIA container with cudnn 9, why does it ship with it?

@ttyio
Collaborator

ttyio commented Aug 7, 2024

@appearancefnp , we now use native groupnorm support in the onnx parser, see https://github.com/onnx/onnx-tensorrt/blob/f161f95883b4ebd8cb789de5efc67b73c0a6e694/onnxOpImporters.cpp#L2151

could you replace the groupnormplugin with groupnorm in your model? thanks!
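
For anyone who needs it, a rough sketch of that rewrite with the onnx Python package (the custom op type and its attribute names, "GroupNormalizationPlugin", "eps", and "num_groups", are assumptions; inspect your model, e.g. in Netron, and adjust before running):

    # Rewrite a custom group-norm plugin node into the native opset-18
    # GroupNormalization op, keeping its inputs (X, scale, bias) as-is.
    import onnx
    from onnx import helper

    model = onnx.load("model.onnx")

    for node in model.graph.node:
        if node.op_type == "GroupNormalizationPlugin":    # assumed custom-op type
            attrs = {a.name: a for a in node.attribute}
            node.op_type = "GroupNormalization"           # native op since opset 18
            node.domain = ""                              # default ai.onnx domain
            del node.attribute[:]
            node.attribute.extend([
                helper.make_attribute("epsilon", attrs["eps"].f),
                helper.make_attribute("num_groups", attrs["num_groups"].i),
            ])

    # The native op requires opset >= 18 in the default domain.
    for opset in model.opset_import:
        if opset.domain in ("", "ai.onnx"):
            opset.version = max(opset.version, 18)

    onnx.save(model, "model_native_groupnorm.onnx")

One caveat: the expected scale/bias shapes for GroupNormalization changed in later opsets (per-group in opset 18, per-channel from opset 21), so check which convention your weights follow.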

@ttyio added the Module:Plugins and triaged labels on Aug 7, 2024
@moraxu
Collaborator

moraxu commented Sep 7, 2024

@appearancefnp , I will be closing this ticket due to our policy to close tickets with no activity for more than 21 days after a reply had been posted. Please reopen a new ticket if you still need help.

@moraxu closed this as completed Sep 7, 2024
@toothache

@appearancefnp , we now use native groupnorm support in the onnx parser, see https://github.com/onnx/onnx-tensorrt/blob/f161f95883b4ebd8cb789de5efc67b73c0a6e694/onnxOpImporters.cpp#L2151

could you replace the groupnormplugin with groupnorm in your model? thanks!

I was able to run the native GroupNorm with opset 18, but I ran into an issue when running GroupNorm at the latest op version.
See #4336
