
Result of FP16 TensorRT model is NOT correct #3827

Closed
AndreWanga opened this issue Apr 26, 2024 · 11 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@AndreWanga

AndreWanga commented Apr 26, 2024

Description

I converted an ONNX model to a TensorRT engine in FP16, but the inference results of the TRT engine and the ONNX model are very different.

Environment

TensorRT Version: 8.6

NVIDIA GPU: RTX 3070

NVIDIA Driver Version: 531.18

CUDA Version: 12.1

CUDNN Version:

Operating System:

Python Version (if applicable): 3.8

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link: https://huggingface.co/BAAI/bge-reranker-large/tree/main

Steps To Reproduce

  1. Convert pytorch.bin to ONNX:
    optimum-cli export onnx -m bge-reranker-large output_bge --task text-classification --opset 17
  2. Convert the ONNX model to a TRT engine:
    trtexec --onnx=model.onnx --minShapes="input_ids":1x1,"attention_mask":1x1 --optShapes="input_ids":16x16,"attention_mask":16x16 --maxShapes="input_ids":32x32,"attention_mask":32x32 --fp16 --saveEngine=model.plan
  3. Use Polygraphy to compare the results of the ONNX and TRT models; the script is:
import tensorrt as trt
import numpy as np
from polygraphy.logger import G_LOGGER
G_LOGGER.module_severity = {'': G_LOGGER.VERBOSE}
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import TrtRunner
from polygraphy.common import TensorMetadata
from polygraphy.comparator import Comparator, CompareFunc, DataLoader
from polygraphy.exception import PolygraphyException

# Describe the model inputs so the DataLoader can generate matching test data.
data = TensorMetadata()
dtype = np.dtype(np.int32)
data.add('input_ids', dtype=dtype, shape=(4, 16))
data.add('attention_mask', dtype=dtype, shape=(4, 16))

data_loader = DataLoader(
    input_metadata=data
)
new_onnx_path = r"output_bge/model.onnx"
trt_path = r"output_bge/model.plan"
build_onnx_rt_session = SessionFromOnnx(new_onnx_path)

# Deserialize the prebuilt TensorRT engine.
trt_logger = trt.Logger(trt.Logger.ERROR)
with open(trt_path, "rb") as f:
    engine_str = f.read()
deserialize_engine = trt.Runtime(trt_logger).deserialize_cuda_engine(engine_str)

# Run the same generated inputs through ONNX Runtime and TensorRT.
runners = [
    OnnxrtRunner(build_onnx_rt_session),
    TrtRunner(deserialize_engine),
]

results = Comparator.run(runners, data_loader=data_loader)

# Compare the outputs with 5e-2 relative and absolute tolerances.
success = True
compare_func = CompareFunc.simple(rtol={'': 5e-2}, atol={'': 5e-2})
success &= bool(Comparator.compare_accuracy(results, compare_func=compare_func))

if not success:
    raise PolygraphyException('FAILED')
  4. Result:
    (screenshot: Polygraphy comparison output)
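The 5e-2 tolerances in the script above are applied roughly the way np.isclose applies them: an element passes when |trt − onnx| ≤ atol + rtol·|onnx|. A minimal NumPy sketch of that criterion (passes_tolerance and the arrays are made-up stand-ins for illustration, not Polygraphy internals):

```python
import numpy as np

def passes_tolerance(onnx_out, trt_out, rtol=5e-2, atol=5e-2):
    """Elementwise check in the spirit of CompareFunc.simple(rtol, atol)."""
    abs_diff = np.abs(trt_out - onnx_out)
    threshold = atol + rtol * np.abs(onnx_out)
    return bool(np.all(abs_diff <= threshold))

# Hypothetical outputs: a small deviation passes, a 1.0 abs diff does not.
ref = np.array([0.2, -1.3, 0.8], dtype=np.float32)
close = ref + 0.01
far = ref + 1.0
print(passes_tolerance(ref, close))  # True
print(passes_tolerance(ref, far))    # False
```

By this measure, the ~1.0 max abs diff reported below is far outside the 5e-2 band, so the comparison fails.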


@lix19937

lix19937 commented Apr 27, 2024

Could you upload the full result image? We need to see the Error Metrics: output and the Relative Difference.

@AndreWanga
Author

AndreWanga commented Apr 27, 2024

Could you upload the full result image? We need to see the Error Metrics: output and the Relative Difference.

(screenshot: Error Metrics and Relative Difference output)

Here is the error metrics and relative difference image. As you can see, the ONNX and TRT values are completely different from each other.

@lix19937

@lix19937

The max absolute diff is 1.0019.

First run an FP32 comparison to see the max absolute diff:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[1,1]   attention_mask:[1,1]    \
--trt-opt-shapes input_ids:[16,16] attention_mask:[16,16]  \
--trt-max-shapes input_ids:[32,32] attention_mask:[32,32]

Then check FP16:

polygraphy run model.onnx --trt --onnxrt  --fp16           \
--trt-min-shapes input_ids:[1,1]   attention_mask:[1,1]    \
--trt-opt-shapes input_ids:[16,16] attention_mask:[16,16]  \
--trt-max-shapes input_ids:[32,32] attention_mask:[32,32]
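Comparing in FP32 first separates precision loss from conversion bugs: IEEE half precision carries only 10 explicit mantissa bits, so per-operation relative errors around 1e-3 are expected in FP16, while a max abs diff near 1.0 on a classifier logit points at something other than rounding. A quick pure-NumPy look at the FP16 rounding step (no TensorRT needed):

```python
import numpy as np

# Machine epsilon for IEEE half precision is 2**-10 ≈ 9.77e-4.
eps = np.finfo(np.float16).eps
print(eps)

# Values closer together than the local spacing collapse when cast to FP16.
a = np.float32(1.0)
b = np.float32(1.0 + 2e-4)  # below half-precision resolution near 1.0
print(a.astype(np.float16) == b.astype(np.float16))  # True
```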

@AndreWanga
Author

AndreWanga commented Apr 27, 2024

I checked the FP32 model with your command and with my script. It passes with your command because the input batch size is 1, but fails with my script because the batch size is 4. So the cause should not be FP16; it looks like some operators malfunction when converting this bge model from ONNX to TRT.
(screenshot: FP32 comparison output at batch size 4)
Please take a look. FYI
@lix19937

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 28, 2024
@lix19937

lix19937 commented May 2, 2024

Use the following, setting the batch size to 4:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[1,1]   attention_mask:[1,1]    \
--trt-opt-shapes input_ids:[16,16] attention_mask:[16,16]  \
--trt-max-shapes input_ids:[32,32] attention_mask:[32,32]  \
--input-shapes   input_ids:[4,4]   attention_mask:[4,4]

@AndreWanga
Author

Use the following, setting the batch size to 4:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[1,1]   attention_mask:[1,1]    \
--trt-opt-shapes input_ids:[16,16] attention_mask:[16,16]  \
--trt-max-shapes input_ids:[32,32] attention_mask:[32,32]  \
--input-shapes   input_ids:[4,4]   attention_mask:[4,4]

It still fails, the same as with my script. @lix19937

@AndreWanga
Author

Is there any solution to this issue? FYI @lix19937 @zerollzeng

@lix19937

Just fix the shapes to debug:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[4,4]   attention_mask:[4,4]    \
--trt-opt-shapes input_ids:[4,4]   attention_mask:[4,4]  \
--trt-max-shapes input_ids:[4,4]   attention_mask:[4,4]  \
--input-shapes   input_ids:[4,4]   attention_mask:[4,4]

If the result is not as expected, compare each layer:

polygraphy run model.onnx --trt --onnxrt --fp16 \
     --trt-outputs mark all \
     --onnx-outputs mark all
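With all layer outputs marked, the useful question becomes where the first divergence appears. A hypothetical sketch of that search, assuming the per-layer outputs of each runner have already been collected into name → array dicts in execution order (the step of extracting them from Polygraphy's results is not shown):

```python
import numpy as np

def first_divergence(onnx_outs, trt_outs, rtol=5e-2, atol=5e-2):
    """Return the name of the first output whose values diverge, or None.

    onnx_outs / trt_outs: dicts mapping tensor names to arrays, in
    topological (execution) order.
    """
    for name, ref in onnx_outs.items():
        out = trt_outs.get(name)
        if out is None:
            continue  # layer may have been fused away in the TRT engine
        if not np.allclose(out, ref, rtol=rtol, atol=atol):
            return name
    return None

# Made-up data: 'layer2' is the first mismatch.
onnx_outs = {"layer1": np.zeros(4), "layer2": np.ones(4), "layer3": np.ones(4)}
trt_outs  = {"layer1": np.zeros(4), "layer2": np.full(4, 2.0), "layer3": np.ones(4)}
print(first_divergence(onnx_outs, trt_outs))  # layer2
```

The first diverging layer usually narrows the problem down to a single op or fusion.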

@AndreWanga
Author

Just fix the shapes to debug:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[4,4]   attention_mask:[4,4]    \
--trt-opt-shapes input_ids:[4,4]   attention_mask:[4,4]  \
--trt-max-shapes input_ids:[4,4]   attention_mask:[4,4]  \
--input-shapes   input_ids:[4,4]   attention_mask:[4,4]

If the result is not as expected, compare each layer:

polygraphy run model.onnx --trt --onnxrt --fp16 \
     --trt-outputs mark all \
     --onnx-outputs mark all

I tried your command with mark all, but an error occurred while running:
(screenshot 2024-05-28: error output)

@lix19937

@AndreWanga
Author

Could you please try converting this bge model and comparing the results? I have tried every way I know but can't fix this issue. I think TensorRT may not be able to handle this cross-encoder model yet. FYI @lix19937 @zerollzeng

@AndreWanga
Author

I upgraded to TensorRT 10 and the issue has been resolved. Thanks for your time. @lix19937 @zerollzeng
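Since the fix was a version upgrade, deployment code can guard against regressing to an affected release. A hypothetical helper (require_trt_major is not a TensorRT API; in real code the version string would come from tensorrt.__version__):

```python
def require_trt_major(version_str, minimum=10):
    """Check a TensorRT version string like '10.0.1' against a minimum major."""
    major = int(version_str.split(".")[0])
    return major >= minimum

# Sample version strings, not read from an installed TensorRT.
print(require_trt_major("10.0.1"))  # True
print(require_trt_major("8.6.1"))   # False
```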
