
Failure of TensorRT 10.7 to eliminate concatenation with upstream custom layer #4345

Open
jchia opened this issue Feb 4, 2025 · 1 comment
Labels: Documentation (Lack of clarity in documentation), Enhancement (New feature or request), triaged (Issue has been triaged by maintainers)



jchia commented Feb 4, 2025

Description

It seems that TensorRT cannot eliminate a concatenation layer if there is an upstream custom layer.

In a simple model that uses all standard operators, TensorRT engine building eliminates concatenation, but after I replaced Add with a CustomAdd that does the same thing as Add, TensorRT engine building does not eliminate the concatenation.

This failure diminishes the benefit of using plugins whenever a plugin output feeds a concatenation layer, particularly for reducing kernel count: the surviving concatenation typically shows up as extra copyVectorizedKernel launches that perform the copy.

From the engine-building log, it appears that the failure is related to a concept called "striding support", but I could not find any documentation on it, especially in relation to plugins.

My goal is for the concatenation to also be eliminated in the case involving custom layers, so that there are no unnecessary copyVectorizedKernel kernels. If the current behavior is by design, there should be documentation about this caveat regarding the use of plugins.

Environment

TensorRT Version: 10.7

NVIDIA GPU: RTX 3080

NVIDIA Driver Version: 565.57.01

CUDA Version: 12.7

CUDNN Version: N/A

Operating System: Ubuntu 24.04

Python Version (if applicable): 3.12 (but irrelevant)

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): N/A

Baremetal or Container (if so, version): baremetal

Relevant Files

https://github.com/jchia/trt-copy contains a complete reproduction of the problem.

Steps To Reproduce

Clone https://github.com/jchia/trt-copy and follow the steps in https://github.com/jchia/trt-copy/blob/master/README.md.

The steps are:

$ make plugin.so
$ trtexec --verbose --onnx=sac16.onnx --saveEngine=sac16.plan
$ trtexec --verbose --onnx=sac16c.onnx --saveEngine=sac16c.plan --dynamicPlugins=./plugin.so
$ /opt/nvidia/nsight-compute/2024.3.2/ncu --target-processes all /usr/src/tensorrt/bin/trtexec --loadEngine=sac16.plan
$ /opt/nvidia/nsight-compute/2024.3.2/ncu --target-processes all /usr/src/tensorrt/bin/trtexec --loadEngine=sac16c.plan --dynamicPlugins=./plugin.so

The output of the engine-building steps indicates that concatenation is eliminated when Add is used but not when CustomAdd is used. Details are explained in the README.md.

In particular, for the model with Add (sac16.onnx), there are these lines:

Eliminating concatenation node_of_output
Retargeting part0_plus1 to output

But for the model with CustomAdd (sac16c.onnx), there are these lines:

Eliminating concatenation node_of_output
Generating copy for part0_plus1 to output because input does not support striding.
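The difference between the two builds can be checked mechanically by scanning the trtexec --verbose log. The following is a small hypothetical helper, with regexes matching only the log lines quoted above:

```python
import re

# Hypothetical helper: scan a trtexec --verbose build log to see which
# concatenations were eliminated and which copies were generated instead.
def scan_build_log(log_text):
    eliminated = re.findall(r"Eliminating concatenation (\S+)", log_text)
    copies = re.findall(r"Generating copy for (\S+) to (\S+)", log_text)
    return eliminated, copies

# Log excerpt quoted from the sac16c.onnx (CustomAdd) build above.
log = (
    "Eliminating concatenation node_of_output\n"
    "Generating copy for part0_plus1 to output "
    "because input does not support striding.\n"
)
eliminated, copies = scan_build_log(log)
print(eliminated)  # ['node_of_output']
print(copies)      # [('part0_plus1', 'output')]
```

An engine whose build is fully fused would produce an empty copies list; the CustomAdd build produces one entry per generated copy.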


Have you tried the latest release?: No

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Haven't tried, but it runs on TensorRT, suboptimally.

@kevinch-nv added the Enhancement, Documentation, and triaged labels on Feb 5, 2025
@kevinch-nv kevinch-nv self-assigned this Feb 5, 2025
kevinch-nv (Collaborator) commented Feb 5, 2025

You are correct: the concatenation elimination pass currently does not support plugin nodes. I'll look into updating the TensorRT developer guide to document this.

Do you have a motivating use case where the copy time dominates the time saved by using a custom plugin?
