
Failure of TensorRT 10.7 to eliminate concatenation with upstream custom layer #4345

Open
jchia opened this issue Feb 4, 2025 · 1 comment
Labels: Documentation (Lack of clarity in documentation), Enhancement (New feature or request), triaged (Issue has been triaged by maintainers)



jchia commented Feb 4, 2025

Description

It seems that TensorRT cannot eliminate a concatenation layer if there is an upstream custom layer.

In a simple model that uses all standard operators, TensorRT engine building eliminates concatenation, but after I replaced Add with a CustomAdd that does the same thing as Add, TensorRT engine building does not eliminate the concatenation.

This failure diminishes the benefit of using plugins whenever a plugin output feeds a concatenation layer, particularly for reducing kernel count: the surviving concatenation typically shows up as extra copyVectorizedKernel launches that perform the copy.

From the engine-building log, it appears that the failure is related to a concept called "striding support", but I could not find any documentation on it, especially in relation to plugins.

My goal is for the concatenation to also be eliminated in the case involving custom layers, so that there are no unnecessary copyVectorizedKernel kernels. If the current behavior is by design, there should be documentation about this caveat regarding the use of plugins.

Environment

TensorRT Version: 10.7

NVIDIA GPU: RTX 3080

NVIDIA Driver Version: 565.57.01

CUDA Version: 12.7

CUDNN Version: N/A

Operating System: Ubuntu 24.04

Python Version (if applicable): 3.12 (but irrelevant)

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): N/A

Baremetal or Container (if so, version): baremetal

Relevant Files

https://github.com/jchia/trt-copy contains a complete reproduction of the problem.

Steps To Reproduce

Clone https://github.com/jchia/trt-copy and follow the steps in https://github.com/jchia/trt-copy/blob/master/README.md.

The steps are:

$ make plugin.so
$ trtexec --verbose --onnx=sac16.onnx --saveEngine=sac16.plan
$ trtexec --verbose --onnx=sac16c.onnx --saveEngine=sac16c.plan --dynamicPlugins=./plugin.so
$ /opt/nvidia/nsight-compute/2024.3.2/ncu --target-processes all /usr/src/tensorrt/bin/trtexec --loadEngine=sac16.plan
$ /opt/nvidia/nsight-compute/2024.3.2/ncu --target-processes all /usr/src/tensorrt/bin/trtexec --loadEngine=sac16c.plan --dynamicPlugins=./plugin.so

The output of the engine-building steps indicates that concatenation is eliminated when Add is used but not when CustomAdd is used. Details are explained in the README.md.

In particular, for the model with Add (sac16.onnx), there are these lines:

Eliminating concatenation node_of_output
Retargeting part0_plus1 to output

But for the model with CustomAdd (sac16c.onnx), there are these lines:

Eliminating concatenation node_of_output
Generating copy for part0_plus1 to output because input does not support striding.
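The difference between the two builds can be checked mechanically by scanning the trtexec --verbose log. The following is a small hypothetical helper, with regexes matching only the log lines quoted above:

```python
import re

# Hypothetical helper: scan a trtexec --verbose build log to see which
# concatenations were eliminated and which copies were generated instead.
def scan_build_log(log_text):
    eliminated = re.findall(r"Eliminating concatenation (\S+)", log_text)
    copies = re.findall(r"Generating copy for (\S+) to (\S+)", log_text)
    return eliminated, copies

# Log excerpt quoted from the sac16c.onnx (CustomAdd) build above.
log = (
    "Eliminating concatenation node_of_output\n"
    "Generating copy for part0_plus1 to output "
    "because input does not support striding.\n"
)
eliminated, copies = scan_build_log(log)
print(eliminated)  # ['node_of_output']
print(copies)      # [('part0_plus1', 'output')]
```

An engine whose build is fully fused would produce an empty copies list; the CustomAdd build produces one entry per generated copy.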


Have you tried the latest release?: No

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Haven't tried, but it runs on TensorRT, suboptimally.

@kevinch-nv added the Enhancement, Documentation, and triaged labels on Feb 5, 2025
@kevinch-nv kevinch-nv self-assigned this Feb 5, 2025
kevinch-nv (Collaborator) commented Feb 5, 2025

You are correct: the concatenation elimination pass currently does not support plugin nodes. I'll look into updating the TensorRT developer guide to document this.

Do you have a motivating use case where the copy time dominates the time saved by using a custom plugin?
