
TensorRt result is different from onnx model in parseq model #3136

Closed
keivanmoazami opened this issue Jul 15, 2023 · 23 comments
Assignees
Labels
triaged Issue has been triaged by maintainers

Comments

@keivanmoazami

Description

I converted the parseq OCR model from PyTorch to ONNX and tested the ONNX model; everything is OK. But when I convert the ONNX model to an FP32 or FP16 TensorRT engine, the model's output is very different from the ONNX model's.
I use onnxsim to simplify the ONNX model. If I don't use onnxsim, all results are NaN.

model repo : https://github.com/baudm/parseq

Environment

TensorRT Version: TensorRT-8.6.1.6

NVIDIA GPU: RTX 3060

NVIDIA Driver Version: 531.79

CUDA Version: cuda-12.0

CUDNN Version: cudnn-8.9.1.23_cuda12

Operating System: Win 10

Python Version: 3.8

PyTorch Version: 1.13

ONNX opset: 14

Relevant Files

onnx model: https://drive.google.com/file/d/1CRXsD8Zk5Mo50JYCZytrAtBbFm2oOqvc/view?usp=sharing

trtexec.exe --onnx=parseq/test.onnx --workspace=10000 --saveEngine=parseq/test_fp32.trs --verbose
trt engine fp32: https://drive.google.com/file/d/17eecl4QrRrE1BiLqDE8HJT0wZCVm3BkB/view?usp=sharing
trt engine fp32 log: https://drive.google.com/file/d/1i9KkbKainaNIz5QQvolmScIu53DzFHHv/view?usp=sharing

trtexec.exe --onnx=parseq/test.onnx --fp16 --workspace=10000 --saveEngine=parseq/test_fp16.trs --verbose
trt engine fp16: https://drive.google.com/file/d/1CIzRZ-71a2hXZWnMNtWn7k2tuM3Pi6K_/view?usp=sharing
trt engine fp16 log: https://drive.google.com/file/d/15LOBtarM6RZiiyZaz66qt6Z8nu67JyrN/view?usp=sharing

Steps To Reproduce

I wrote a sample script to compare the similarity of the ONNX and TRT inference results. When I use real data, the mean similarity is 0.3; with random input it is near 0.85.

sample code:
https://drive.google.com/file/d/1dLo9iD3ZUPVuvU6LNFnwQSCjcLDTiKQr/view?usp=sharing
sample real data:
https://drive.google.com/file/d/1VtQgOYw5ZYQSZmUOGyJ7xPKElC7caFMl/view?usp=sharing
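The similarity check from the sample script can be sketched roughly like this (a minimal sketch: the array shape matches the output logged later in this thread, but `onnx_out`/`trt_out` are random stand-ins for the real onnxruntime and TensorRT outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Flatten both outputs and compute their cosine similarity."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins: in the real comparison, onnx_out comes from
# onnxruntime.InferenceSession.run() and trt_out from the TRT engine.
onnx_out = np.random.rand(1, 9, 45).astype(np.float32)
trt_out = onnx_out + np.random.normal(0, 1e-3, onnx_out.shape).astype(np.float32)
print(cosine_similarity(onnx_out, trt_out))  # close to 1.0 when outputs match
```

A similarity well below 1.0 on real data, as reported above, points at a numerical problem in the engine rather than in the comparison itself.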

@keivanmoazami keivanmoazami changed the title TensortRt result is different from onnx model in parseq model TensorRt result is different from onnx model in parseq model Jul 15, 2023
@zerollzeng
Collaborator

I can reproduce the issue with TRT 8.6.1, but it looks like the issue has been fixed in our latest internal code (commit id 94d9acac4e3); please wait for the new release.

[I]         Error Metrics: output
[I]             Minimum Required Tolerance: elemwise error | [abs=9.8348e-07] OR [rel=8.7517e-06] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=2.5298e-08, std-dev=7.5773e-08, var=5.7415e-15, median=3.7835e-10, min=0 at (0, 0, 1), max=9.8348e-07 at (0, 5, 0), avg-magnitude=2.5298e-08
[I]             Relative Difference | Stats: mean=1.509e-06, std-dev=1.2058e-06, var=1.454e-12, median=1.2684e-06, min=0 at (0, 0, 1), max=8.7517e-06 at (0, 5, 32), avg-magnitude=1.509e-06
[I]         PASSED | Output: 'output' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['output']
[I] Accuracy Summary | trt-runner-N0-07/16/23-13:49:45 vs. onnxrt-runner-N0-07/16/23-13:49:45 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 97.785s | Command: /home/zeroz/.local/bin/polygraphy run test.onnx --trt --onnxrt

cc @nvpohanh who may know the fix info.

@zerollzeng
Collaborator

8.6.1

[I]     Comparing Output: 'output' (dtype=float32, shape=(1, 9, 45)) with 'output' (dtype=float32, shape=(1, 9, 45))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-07/16/23-13:55:51: output | Stats: mean=0.022222, std-dev=0.1051, var=0.011047, median=6.1031e-06, min=1.262e-12 at (0, 8, 39), max=1 at (0, 8, 0), avg-magnitude=0.022222
[I]             ---- Histogram ----
                Bin Range       |  Num Elems | Visualization
                (1.26e-12, 0.1) |        387 | ########################################
                (0.1     , 0.2) |          6 | 
                (0.2     , 0.3) |          3 | 
                (0.3     , 0.4) |          0 | 
                (0.4     , 0.5) |          1 | 
                (0.5     , 0.6) |          2 | 
                (0.6     , 0.7) |          2 | 
                (0.7     , 0.8) |          2 | 
                (0.8     , 0.9) |          1 | 
                (0.9     , 1  ) |          1 | 
[I]         onnxrt-runner-N0-07/16/23-13:55:51: output | Stats: mean=0.022222, std-dev=0.066196, var=0.004382, median=0.00044518, min=4.3574e-10 at (0, 8, 28), max=0.99999 at (0, 8, 0), avg-magnitude=0.022222
[I]             ---- Histogram ----
                Bin Range       |  Num Elems | Visualization
                (1.26e-12, 0.1) |        382 | ########################################
                (0.1     , 0.2) |         16 | #
                (0.2     , 0.3) |          4 | 
                (0.3     , 0.4) |          2 | 
                (0.4     , 0.5) |          0 | 
                (0.5     , 0.6) |          0 | 
                (0.6     , 0.7) |          0 | 
                (0.7     , 0.8) |          0 | 
                (0.8     , 0.9) |          0 | 
                (0.9     , 1  ) |          1 | 
[I]         Error Metrics: output
[I]             Minimum Required Tolerance: elemwise error | [abs=0.79589] OR [rel=84.81] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.028243, std-dev=0.087299, var=0.0076211, median=0.00038333, min=4.143e-10 at (0, 8, 30), max=0.79589 at (0, 0, 3), avg-magnitude=0.028243
[I]                 ---- Histogram ----
                    Bin Range          |  Num Elems | Visualization
                    (4.14e-10, 0.0796) |        369 | ########################################
                    (0.0796  , 0.159 ) |         20 | ##
                    (0.159   , 0.239 ) |          6 | 
                    (0.239   , 0.318 ) |          2 | 
                    (0.318   , 0.398 ) |          2 | 
                    (0.398   , 0.478 ) |          1 | 
                    (0.478   , 0.557 ) |          2 | 
                    (0.557   , 0.637 ) |          1 | 
                    (0.637   , 0.716 ) |          0 | 
                    (0.716   , 0.796 ) |          2 | 
[I]             Relative Difference | Stats: mean=1.4284, std-dev=4.9882, var=24.882, median=0.98306, min=8.9408e-06 at (0, 8, 0), max=84.81 at (0, 0, 3), avg-magnitude=1.4284
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (8.94e-06, 8.48) |        401 | ########################################
                    (8.48    , 17  ) |          1 | 
                    (17      , 25.4) |          1 | 
                    (25.4    , 33.9) |          0 | 
                    (33.9    , 42.4) |          0 | 
                    (42.4    , 50.9) |          1 | 
                    (50.9    , 59.4) |          0 | 
                    (59.4    , 67.8) |          0 | 
                    (67.8    , 76.3) |          0 | 
                    (76.3    , 84.8) |          1 | 
[E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['output']
[E] Accuracy Summary | trt-runner-N0-07/16/23-13:55:51 vs. onnxrt-runner-N0-07/16/23-13:55:51 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 27.728s | Command: /home/scratch.zeroz_sw/miniconda3/bin/polygraphy run test.onnx --trt --onnxrt

@zerollzeng
Collaborator

I suspect this is caused by accumulated numerical error, since I see there are a lot of transformer blocks.

@zerollzeng zerollzeng self-assigned this Jul 16, 2023
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Jul 16, 2023
@keivanmoazami
Author

keivanmoazami commented Jul 16, 2023

I have the same problem with ViTSTR, which is based on the timm Vision Transformer.

VitStr:
https://github.com/roatienza/deep-text-recognition-benchmark/tree/master

Vision transformer:
https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer.py

It seems that the problem is related to the Vision Transformer.

@keivanmoazami
Author

keivanmoazami commented Jul 16, 2023

I can reproduce the issue with TRT 8.6.1, but it looks like the issue has been fixed in our latest internal code (commit id 94d9acac4e3); please wait for the new release.

@nvpohanh thanks for your response. When do you think the new version will be released?

@keivanmoazami
Author

Can you give me an approximate time for the next release?
This is vital for us. Is it possible to access the unpublished version of the code?
Thanks.
@zerollzeng @nvpohanh

@nvpohanh
Collaborator

With TRT 8.6, could you try exporting the ONNX model with opset17 or above so that the LayerNorms use the LayerNormalization ONNX operators rather than a bunch of ReduceMean ops and other pointwise ops?

I don't have an ETA for next TRT release yet.

@keivanmoazami
Author

The same problem remains with TRT 8.6 and opset 17.

onnx model with opset 17:
https://drive.google.com/file/d/1La7tGKEqAwaca_FaBYH91ibhhr52l0kW/view?usp=sharing

@nvpohanh
Collaborator

(screenshot: ONNX graph showing a LayerNormalization node)
Hmm, if the LayerNorm is already an ONNX op, then the next possibility would be the MHA part. Could you experiment by adding a Cast (to FP32) before the MHA and another Cast (back to FP16) after the MHA and see if that helps the accuracy? If it does, that at least tells us something.

@keivanmoazami
Author

Thanks for your advice. I added Cast layers to the 11 encoder blocks. The TRT version of the model works better, but the ONNX checker does not pass validation. The cosine similarity between the Torch model's result and the TRT model's was 0.9578.
Is it necessary to add Cast layers to the decoder blocks too, to achieve a better result?

modified onnx model :
https://drive.google.com/file/d/12UwIFv8LKL5GOyQA6jPzT8eCq39GYxpk/view?usp=sharing

@nvpohanh
Collaborator

Could you give it a try and see if it solves the end-to-end accuracy issue? I am just wondering whether the MHA part is the issue or whether there are other issues.

I am trying this because the next TRT version has some heuristics to force some MatMuls in the MHA to run in FP32, and I wonder if that explains why @zerollzeng was able to get better accuracy with the internal version of TRT.

@keivanmoazami
Author

keivanmoazami commented Jul 26, 2023

I ran the Torch and TRT models on the test data set, and the results are:
unmodified onnx acc: 0.9971
unmodified torch acc: 0.9975
unmodified trt acc: 0.0156
modified trt acc: 0.9136

Do you have any idea how I can achieve a better result?

@nvpohanh
Collaborator

I see. So adding Casts did recover the accuracy to some extent, but not fully.

Several more things to experiment with:

  • Add Casts around the LayerNormalization ops and see if that helps.
  • Add a Cast (to FP32) before the last Language-Model-head MatMul (the very last MatMul in the network).

The more Casts you add, the slower the model gets, but the better the accuracy. So the task is to find out which layers are most sensitive to FP16 precision and run those layers in FP32.
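As an alternative to editing the ONNX graph, trtexec can pin individual layers to FP32 while the rest of the network stays in FP16. A sketch (the layer name here is hypothetical; real layer names can be taken from the --verbose build log):

```shell
trtexec --onnx=parseq/test.onnx --fp16 \
        --precisionConstraints=obey \
        --layerPrecisions="/head/MatMul":fp32 \
        --saveEngine=parseq/test_fp16_mixed.trs
```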

@keivanmoazami
Author

I had a mistake in the preprocessing stage. After adding Cast layers to the 11 encoder blocks, the TRT version of the model works exactly the same as the original Torch model.
Thanks

@ttyio
Collaborator

ttyio commented Aug 1, 2023

Closing since it is solved. Thanks all!

@ttyio ttyio closed this as completed Aug 1, 2023
@kino0924

NGC 23.09 still ships TensorRT 8.6.1.6.
Are there any plans for the next release of TensorRT?

@lizhao7-tal

I had a mistake in the preprocessing stage. After adding Cast layers to the 11 encoder blocks, the TRT version of the model works exactly the same as the original Torch model. Thanks

Have you ever tried dynamic input? Does the model support it?
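For dynamic input, the ONNX model has to be exported with dynamic axes (`dynamic_axes` in `torch.onnx.export`), and the engine needs an optimization profile. A hedged trtexec sketch (the input name and the 3x32x128 image size are assumptions about the parseq input; check them against the actual model, e.g. with polygraphy inspect):

```shell
trtexec --onnx=parseq/test.onnx \
        --minShapes=input:1x3x32x128 \
        --optShapes=input:8x3x32x128 \
        --maxShapes=input:32x3x32x128 \
        --saveEngine=parseq/test_dynamic.trs
```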

@PhilChina

Cast layers? In the PyTorch model or the ONNX model?

@keivanmoazami
Author

I used onnx-modifier:
https://github.com/ZhangGe6/onnx-modifier

@ozhanatwork

Can anyone share how to load .engine weights for inference?

@PhilCuriosity

Can anyone share how to load .engine weights for inference?

https://github.com/fabio-sim/LightGlue-ONNX/blob/main/trt_infer.py
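For a quick sanity check without writing Python, trtexec can also deserialize and run a saved engine directly (file name taken from the commands earlier in this thread):

```shell
trtexec --loadEngine=parseq/test_fp32.trs
```

For inference inside an application, the linked script uses the tensorrt Python API (a `trt.Runtime` plus `deserialize_cuda_engine`, with device buffers managed via cuda-python or pycuda).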

@Sayyam-Jain

Hi, is this issue fixed with the latest TensorRT (10.x.x)?

@lakshaypromact

Hi, is there any Drive link that is still working? All of the above links are broken.
