
TensorRt result is different from onnx model in parseq model #3136

Closed
keivanmoazami opened this issue Jul 15, 2023 · 23 comments
Assignees
Labels
triaged Issue has been triaged by maintainers

Comments

@keivanmoazami

Description

I converted the parseq OCR model from PyTorch to ONNX and tested the ONNX model; everything is OK. But when I convert the ONNX model to an FP32 or FP16 TensorRT engine, the model's output is very different from the ONNX model's.
I use onnxsim to simplify the ONNX model. If I don't use onnxsim, all results are NaN.

model repo : https://github.com/baudm/parseq

Environment

TensorRT Version: TensorRT-8.6.1.6

NVIDIA GPU: RTX 3060

NVIDIA Driver Version: 531.79

CUDA Version: cuda-12.0

CUDNN Version: cudnn-8.9.1.23_cuda12

Operating System: Win 10

Python Version: 3.8

PyTorch Version: 1.13

ONNX opset: 14

Relevant Files

onnx model: https://drive.google.com/file/d/1CRXsD8Zk5Mo50JYCZytrAtBbFm2oOqvc/view?usp=sharing

trtexec.exe --onnx=parseq/test.onnx --workspace=10000 --saveEngine=parseq/test_fp32.trs --verbose
trt engine fp32: https://drive.google.com/file/d/17eecl4QrRrE1BiLqDE8HJT0wZCVm3BkB/view?usp=sharing
trt engine fp32 log: https://drive.google.com/file/d/1i9KkbKainaNIz5QQvolmScIu53DzFHHv/view?usp=sharing

trtexec.exe --onnx=parseq/test.onnx --fp16 --workspace=10000 --saveEngine=parseq/test_fp16.trs --verbose
trt engine fp16: https://drive.google.com/file/d/1CIzRZ-71a2hXZWnMNtWn7k2tuM3Pi6K_/view?usp=sharing
trt engine fp16 log: https://drive.google.com/file/d/15LOBtarM6RZiiyZaz66qt6Z8nu67JyrN/view?usp=sharing

Steps To Reproduce

I wrote a sample script to compare the similarity of the ONNX and TRT inference results. When I use real data, the mean similarity is 0.3; with random input it is near 0.85.

sample code:
https://drive.google.com/file/d/1dLo9iD3ZUPVuvU6LNFnwQSCjcLDTiKQr/view?usp=sharing
sample real data:
https://drive.google.com/file/d/1VtQgOYw5ZYQSZmUOGyJ7xPKElC7caFMl/view?usp=sharing
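The similarity check from the sample script can be sketched roughly like this (a minimal sketch: the array shape matches the output logged later in this thread, but `onnx_out`/`trt_out` are random stand-ins for the real onnxruntime and TensorRT outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Flatten both outputs and compute their cosine similarity."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins: in the real comparison, onnx_out comes from
# onnxruntime.InferenceSession.run() and trt_out from the TRT engine.
onnx_out = np.random.rand(1, 9, 45).astype(np.float32)
trt_out = onnx_out + np.random.normal(0, 1e-3, onnx_out.shape).astype(np.float32)
print(cosine_similarity(onnx_out, trt_out))  # close to 1.0 when outputs match
```

A similarity well below 1.0 on real data, as reported above, points at a numerical problem in the engine rather than in the comparison itself.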

@keivanmoazami keivanmoazami changed the title TensortRt result is different from onnx model in parseq model TensorRt result is different from onnx model in parseq model Jul 15, 2023
@zerollzeng
Collaborator

I can reproduce the issue with TRT 8.6.1, but it looks like the issue has been fixed in our latest internal code (commit id 94d9acac4e3); please wait for the new release.

[I]         Error Metrics: output
[I]             Minimum Required Tolerance: elemwise error | [abs=9.8348e-07] OR [rel=8.7517e-06] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=2.5298e-08, std-dev=7.5773e-08, var=5.7415e-15, median=3.7835e-10, min=0 at (0, 0, 1), max=9.8348e-07 at (0, 5, 0), avg-magnitude=2.5298e-08
[I]             Relative Difference | Stats: mean=1.509e-06, std-dev=1.2058e-06, var=1.454e-12, median=1.2684e-06, min=0 at (0, 0, 1), max=8.7517e-06 at (0, 5, 32), avg-magnitude=1.509e-06
[I]         PASSED | Output: 'output' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['output']
[I] Accuracy Summary | trt-runner-N0-07/16/23-13:49:45 vs. onnxrt-runner-N0-07/16/23-13:49:45 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 97.785s | Command: /home/zeroz/.local/bin/polygraphy run test.onnx --trt --onnxrt

cc @nvpohanh who may know the fix info.

@zerollzeng
Collaborator

8.6.1

[I]     Comparing Output: 'output' (dtype=float32, shape=(1, 9, 45)) with 'output' (dtype=float32, shape=(1, 9, 45))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-07/16/23-13:55:51: output | Stats: mean=0.022222, std-dev=0.1051, var=0.011047, median=6.1031e-06, min=1.262e-12 at (0, 8, 39), max=1 at (0, 8, 0), avg-magnitude=0.022222
[I]             ---- Histogram ----
                Bin Range       |  Num Elems | Visualization
                (1.26e-12, 0.1) |        387 | ########################################
                (0.1     , 0.2) |          6 | 
                (0.2     , 0.3) |          3 | 
                (0.3     , 0.4) |          0 | 
                (0.4     , 0.5) |          1 | 
                (0.5     , 0.6) |          2 | 
                (0.6     , 0.7) |          2 | 
                (0.7     , 0.8) |          2 | 
                (0.8     , 0.9) |          1 | 
                (0.9     , 1  ) |          1 | 
[I]         onnxrt-runner-N0-07/16/23-13:55:51: output | Stats: mean=0.022222, std-dev=0.066196, var=0.004382, median=0.00044518, min=4.3574e-10 at (0, 8, 28), max=0.99999 at (0, 8, 0), avg-magnitude=0.022222
[I]             ---- Histogram ----
                Bin Range       |  Num Elems | Visualization
                (1.26e-12, 0.1) |        382 | ########################################
                (0.1     , 0.2) |         16 | #
                (0.2     , 0.3) |          4 | 
                (0.3     , 0.4) |          2 | 
                (0.4     , 0.5) |          0 | 
                (0.5     , 0.6) |          0 | 
                (0.6     , 0.7) |          0 | 
                (0.7     , 0.8) |          0 | 
                (0.8     , 0.9) |          0 | 
                (0.9     , 1  ) |          1 | 
[I]         Error Metrics: output
[I]             Minimum Required Tolerance: elemwise error | [abs=0.79589] OR [rel=84.81] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.028243, std-dev=0.087299, var=0.0076211, median=0.00038333, min=4.143e-10 at (0, 8, 30), max=0.79589 at (0, 0, 3), avg-magnitude=0.028243
[I]                 ---- Histogram ----
                    Bin Range          |  Num Elems | Visualization
                    (4.14e-10, 0.0796) |        369 | ########################################
                    (0.0796  , 0.159 ) |         20 | ##
                    (0.159   , 0.239 ) |          6 | 
                    (0.239   , 0.318 ) |          2 | 
                    (0.318   , 0.398 ) |          2 | 
                    (0.398   , 0.478 ) |          1 | 
                    (0.478   , 0.557 ) |          2 | 
                    (0.557   , 0.637 ) |          1 | 
                    (0.637   , 0.716 ) |          0 | 
                    (0.716   , 0.796 ) |          2 | 
[I]             Relative Difference | Stats: mean=1.4284, std-dev=4.9882, var=24.882, median=0.98306, min=8.9408e-06 at (0, 8, 0), max=84.81 at (0, 0, 3), avg-magnitude=1.4284
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (8.94e-06, 8.48) |        401 | ########################################
                    (8.48    , 17  ) |          1 | 
                    (17      , 25.4) |          1 | 
                    (25.4    , 33.9) |          0 | 
                    (33.9    , 42.4) |          0 | 
                    (42.4    , 50.9) |          1 | 
                    (50.9    , 59.4) |          0 | 
                    (59.4    , 67.8) |          0 | 
                    (67.8    , 76.3) |          0 | 
                    (76.3    , 84.8) |          1 | 
[E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['output']
[E] Accuracy Summary | trt-runner-N0-07/16/23-13:55:51 vs. onnxrt-runner-N0-07/16/23-13:55:51 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 27.728s | Command: /home/scratch.zeroz_sw/miniconda3/bin/polygraphy run test.onnx --trt --onnxrt

@zerollzeng
Collaborator

I suspect this is caused by accumulated numerical error, since I see there are a lot of transformer blocks.

@zerollzeng zerollzeng self-assigned this Jul 16, 2023
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Jul 16, 2023
@keivanmoazami
Author

keivanmoazami commented Jul 16, 2023

I have the same problem with ViTSTR, which is based on the timm Vision Transformer.

VitStr:
https://github.com/roatienza/deep-text-recognition-benchmark/tree/master

Vision transformer:
https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer.py

It seems that the problem is related to the Vision Transformer.

@keivanmoazami
Author

keivanmoazami commented Jul 16, 2023

I can reproduce the issue with TRT 8.6.1, but it looks like the issue has been fixed in our latest internal code (commit id 94d9acac4e3); please wait for the new release.

@nvpohanh thanks for your response. When do you think the new version will be released?

@keivanmoazami
Author

Can you give me an approximate time for the next release?
This is vital for us. Is it possible to access the unpublished version of the code?
Thanks.
@zerollzeng @nvpohanh

@nvpohanh
Collaborator

With TRT 8.6, could you try exporting the ONNX model with opset17 or above so that the LayerNorms use the LayerNormalization ONNX operators rather than a bunch of ReduceMean ops and other pointwise ops?

I don't have an ETA for next TRT release yet.

@keivanmoazami
Author

The same problem remains with TRT 8.6 and opset 17.

onnx model with opset 17:
https://drive.google.com/file/d/1La7tGKEqAwaca_FaBYH91ibhhr52l0kW/view?usp=sharing

@nvpohanh
Collaborator

(screenshot: ONNX graph showing a LayerNormalization node)
Hmm, if the LayerNorm is already an ONNX op, then the next possibility would be the MHA part. Could you experiment by adding a Cast (to FP32) before the MHA and another Cast (back to FP16) after the MHA and see if that helps the accuracy? If it does, that at least tells us something.

@keivanmoazami
Author

Thanks for your advice. I added Cast layers to the 11 encoder blocks. The TRT version of the model works better, but the ONNX checker does not pass validation. The cosine similarity between the Torch model's result and the TRT model's was 0.9578.
Is it necessary to add Cast layers to the decoder blocks too, to achieve a better result?

modified onnx model :
https://drive.google.com/file/d/12UwIFv8LKL5GOyQA6jPzT8eCq39GYxpk/view?usp=sharing

@nvpohanh
Collaborator

Could you give it a try and see if it solves the end-to-end accuracy issue? I am just wondering whether the MHA part is the issue or whether there are other issues.

I am trying this because the next TRT version has some heuristics to force some MatMuls in the MHA to run in FP32, and I wonder if that explains why @zerollzeng was able to get better accuracy with the internal version of TRT.

@keivanmoazami
Author

keivanmoazami commented Jul 26, 2023

I ran the Torch and TRT models on the test data set, and the results are:
unmodified onnx acc: 0.9971
unmodified torch acc: 0.9975
unmodified trt acc: 0.0156
modified trt acc: 0.9136

Do you have any idea how I can achieve a better result?

@nvpohanh
Collaborator

I see. So adding Casts did recover the accuracy to some extent, but not fully.

Several more things to experiment with:

  • Add Casts around the LayerNormalization ops and see if that helps.
  • Add a Cast (to FP32) before the last Language-Model-head MatMul (the very last MatMul in the network).

The more Casts you add, the slower the model gets, but the better the accuracy. So the task is to find out which layers are most sensitive to FP16 precision and run those layers in FP32.
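As an alternative to editing the ONNX graph, trtexec can pin individual layers to FP32 while the rest of the network stays in FP16. A sketch (the layer name here is hypothetical; real layer names can be taken from the --verbose build log):

```shell
trtexec --onnx=parseq/test.onnx --fp16 \
        --precisionConstraints=obey \
        --layerPrecisions="/head/MatMul":fp32 \
        --saveEngine=parseq/test_fp16_mixed.trs
```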

@keivanmoazami
Author

I had a mistake in the preprocessing stage. After adding Cast layers to the 11 encoder blocks, the TRT version of the model works exactly the same as the original Torch model.
Thanks

@ttyio
Collaborator

ttyio commented Aug 1, 2023

Closing since it is solved. Thanks all!

@ttyio ttyio closed this as completed Aug 1, 2023
@kino0924

NGC 23.09 still ships TensorRT 8.6.1.6.
Are there any plans for the next release of TensorRT?

@lizhao7-tal

I had a mistake in the preprocessing stage. After adding Cast layers to the 11 encoder blocks, the TRT version of the model works exactly the same as the original Torch model. Thanks

Have you ever tried dynamic input? Does the model support it?
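For dynamic input, the ONNX model has to be exported with dynamic axes (`dynamic_axes` in `torch.onnx.export`), and the engine needs an optimization profile. A hedged trtexec sketch (the input name and the 3x32x128 image size are assumptions about the parseq input; check them against the actual model, e.g. with polygraphy inspect):

```shell
trtexec --onnx=parseq/test.onnx \
        --minShapes=input:1x3x32x128 \
        --optShapes=input:8x3x32x128 \
        --maxShapes=input:32x3x32x128 \
        --saveEngine=parseq/test_dynamic.trs
```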

@PhilChina

Cast layers? In the PyTorch model or the ONNX model?

@keivanmoazami
Author

I used onnx-modifier:
https://github.com/ZhangGe6/onnx-modifier

@ozhanatwork

Can anyone share how to load .engine weights for inference?

@PhilCuriosity

Can anyone share how to load .engine weights for inference?

https://github.com/fabio-sim/LightGlue-ONNX/blob/main/trt_infer.py
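For a quick sanity check without writing Python, trtexec can also deserialize and run a saved engine directly (file name taken from the commands earlier in this thread):

```shell
trtexec --loadEngine=parseq/test_fp32.trs
```

For inference inside an application, the linked script uses the tensorrt Python API (a `trt.Runtime` plus `deserialize_cuda_engine`, with device buffers managed via cuda-python or pycuda).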

@Sayyam-Jain

Hi, is this issue fixed with the latest TensorRT (10.x.x)?

@lakshaypromact

Hi, is there any Drive link that is still working? All of the above links are broken.
