
Result of FP16 TensorRT model is NOT correct #3827

Closed
AndreWanga opened this issue Apr 26, 2024 · 11 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@AndreWanga

AndreWanga commented Apr 26, 2024

Description

I converted an ONNX model to a TensorRT engine in FP16, but the inference results of the TRT engine and the ONNX model are very different.

Environment

TensorRT Version: 8.6

NVIDIA GPU: RTX 3070

NVIDIA Driver Version: 531.18

CUDA Version: 12.1

CUDNN Version:

Operating System:

Python Version (if applicable): 3.8

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link: https://huggingface.co/BAAI/bge-reranker-large/tree/main

Steps To Reproduce

  1. Convert pytorch.bin to ONNX:
    optimum-cli export onnx -m bge-reranker-large output_bge --task text-classification --opset 17
  2. Convert the ONNX model to a TRT engine:
    trtexec --onnx=model.onnx --minShapes="input_ids":1x1,"attention_mask":1x1 --optShapes="input_ids":16x16,"attention_mask":16x16 --maxShapes="input_ids":32x32,"attention_mask":32x32 --fp16 --saveEngine=model.plan
  3. Use Polygraphy to compare the results of the ONNX and TRT models; the script is:
import tensorrt as trt
import numpy as np
from polygraphy.logger import G_LOGGER
G_LOGGER.module_severity = {'': G_LOGGER.VERBOSE}
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import TrtRunner
from polygraphy.common import TensorMetadata
from polygraphy.comparator import Comparator, CompareFunc, DataLoader
from polygraphy.exception import PolygraphyException

# Describe the model inputs so the DataLoader can generate matching test data.
data = TensorMetadata()
dtype = np.dtype(np.int32)
data.add('input_ids', dtype=dtype, shape=(4, 16))
data.add('attention_mask', dtype=dtype, shape=(4, 16))

data_loader = DataLoader(
    input_metadata=data
)
new_onnx_path = r"output_bge/model.onnx"
trt_path = r"output_bge/model.plan"
build_onnx_rt_session = SessionFromOnnx(new_onnx_path)

# Deserialize the prebuilt TensorRT engine.
trt_logger = trt.Logger(trt.Logger.ERROR)
with open(trt_path, "rb") as f:
    engine_str = f.read()
deserialize_engine = trt.Runtime(trt_logger).deserialize_cuda_engine(engine_str)

# Run the same generated inputs through ONNX Runtime and TensorRT.
runners = [
    OnnxrtRunner(build_onnx_rt_session),
    TrtRunner(deserialize_engine),
]

results = Comparator.run(runners, data_loader=data_loader)

# Compare the outputs with 5e-2 relative and absolute tolerances.
success = True
compare_func = CompareFunc.simple(rtol={'': 5e-2}, atol={'': 5e-2})
success &= bool(Comparator.compare_accuracy(results, compare_func=compare_func))

if not success:
    raise PolygraphyException('FAILED')
  4. Result:
    (screenshot: Polygraphy comparison output)
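The 5e-2 tolerances in the script above are applied roughly the way np.isclose applies them: an element passes when |trt − onnx| ≤ atol + rtol·|onnx|. A minimal NumPy sketch of that criterion (passes_tolerance and the arrays are made-up stand-ins for illustration, not Polygraphy internals):

```python
import numpy as np

def passes_tolerance(onnx_out, trt_out, rtol=5e-2, atol=5e-2):
    """Elementwise check in the spirit of CompareFunc.simple(rtol, atol)."""
    abs_diff = np.abs(trt_out - onnx_out)
    threshold = atol + rtol * np.abs(onnx_out)
    return bool(np.all(abs_diff <= threshold))

# Hypothetical outputs: a small deviation passes, a 1.0 abs diff does not.
ref = np.array([0.2, -1.3, 0.8], dtype=np.float32)
close = ref + 0.01
far = ref + 1.0
print(passes_tolerance(ref, close))  # True
print(passes_tolerance(ref, far))    # False
```

By this measure, the ~1.0 max abs diff reported below is far outside the 5e-2 band, so the comparison fails.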


@lix19937

lix19937 commented Apr 27, 2024

Could you upload the full result image? We need to see the Error Metrics: output and the Relative Difference.

@AndreWanga
Author

AndreWanga commented Apr 27, 2024

Could you upload the full result image? We need to see the Error Metrics: output and the Relative Difference.

(screenshot: Error Metrics and Relative Difference output)

Here is the error metrics and relative difference image. As you can see, the ONNX and TRT values are completely different from each other.

@lix19937

@lix19937

The max absolute diff is 1.0019.

First run an FP32 comparison to see the max absolute diff:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[1,1]   attention_mask:[1,1]    \
--trt-opt-shapes input_ids:[16,16] attention_mask:[16,16]  \
--trt-max-shapes input_ids:[32,32] attention_mask:[32,32]

Then check FP16:

polygraphy run model.onnx --trt --onnxrt  --fp16           \
--trt-min-shapes input_ids:[1,1]   attention_mask:[1,1]    \
--trt-opt-shapes input_ids:[16,16] attention_mask:[16,16]  \
--trt-max-shapes input_ids:[32,32] attention_mask:[32,32]
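Comparing in FP32 first separates precision loss from conversion bugs: IEEE half precision carries only 10 explicit mantissa bits, so per-operation relative errors around 1e-3 are expected in FP16, while a max abs diff near 1.0 on a classifier logit points at something other than rounding. A quick pure-NumPy look at the FP16 rounding step (no TensorRT needed):

```python
import numpy as np

# Machine epsilon for IEEE half precision is 2**-10 ≈ 9.77e-4.
eps = np.finfo(np.float16).eps
print(eps)

# Values closer together than the local spacing collapse when cast to FP16.
a = np.float32(1.0)
b = np.float32(1.0 + 2e-4)  # below half-precision resolution near 1.0
print(a.astype(np.float16) == b.astype(np.float16))  # True
```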

@AndreWanga
Author

AndreWanga commented Apr 27, 2024

I checked the FP32 model with your command and with my script. It passes with your command because the input batch size is 1, but fails with my script because the batch size is 4. So the cause should not be FP16; it looks like some operators malfunction when converting this bge model from ONNX to TRT.
(screenshot: FP32 comparison output at batch size 4)
Please take a look. FYI
@lix19937

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 28, 2024
@lix19937

lix19937 commented May 2, 2024

Use the following, setting the batch size to 4:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[1,1]   attention_mask:[1,1]    \
--trt-opt-shapes input_ids:[16,16] attention_mask:[16,16]  \
--trt-max-shapes input_ids:[32,32] attention_mask:[32,32]  \
--input-shapes   input_ids:[4,4]   attention_mask:[4,4]

@AndreWanga
Author

Use the following, setting the batch size to 4:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[1,1]   attention_mask:[1,1]    \
--trt-opt-shapes input_ids:[16,16] attention_mask:[16,16]  \
--trt-max-shapes input_ids:[32,32] attention_mask:[32,32]  \
--input-shapes   input_ids:[4,4]   attention_mask:[4,4]

It still fails, the same as with my script. @lix19937

@AndreWanga
Author

Is there any solution to this issue? FYI @lix19937 @zerollzeng

@lix19937

Just fix the shapes to debug:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[4,4]   attention_mask:[4,4]    \
--trt-opt-shapes input_ids:[4,4]   attention_mask:[4,4]  \
--trt-max-shapes input_ids:[4,4]   attention_mask:[4,4]  \
--input-shapes   input_ids:[4,4]   attention_mask:[4,4]

If the result is not as expected, compare each layer:

polygraphy run model.onnx --trt --onnxrt --fp16 \
     --trt-outputs mark all \
     --onnx-outputs mark all
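With all layer outputs marked, the useful question becomes where the first divergence appears. A hypothetical sketch of that search, assuming the per-layer outputs of each runner have already been collected into name → array dicts in execution order (the step of extracting them from Polygraphy's results is not shown):

```python
import numpy as np

def first_divergence(onnx_outs, trt_outs, rtol=5e-2, atol=5e-2):
    """Return the name of the first output whose values diverge, or None.

    onnx_outs / trt_outs: dicts mapping tensor names to arrays, in
    topological (execution) order.
    """
    for name, ref in onnx_outs.items():
        out = trt_outs.get(name)
        if out is None:
            continue  # layer may have been fused away in the TRT engine
        if not np.allclose(out, ref, rtol=rtol, atol=atol):
            return name
    return None

# Made-up data: 'layer2' is the first mismatch.
onnx_outs = {"layer1": np.zeros(4), "layer2": np.ones(4), "layer3": np.ones(4)}
trt_outs  = {"layer1": np.zeros(4), "layer2": np.full(4, 2.0), "layer3": np.ones(4)}
print(first_divergence(onnx_outs, trt_outs))  # layer2
```

The first diverging layer usually narrows the problem down to a single op or fusion.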

@AndreWanga
Author

Just fix the shapes to debug:

polygraphy run model.onnx --trt --onnxrt                   \
--trt-min-shapes input_ids:[4,4]   attention_mask:[4,4]    \
--trt-opt-shapes input_ids:[4,4]   attention_mask:[4,4]  \
--trt-max-shapes input_ids:[4,4]   attention_mask:[4,4]  \
--input-shapes   input_ids:[4,4]   attention_mask:[4,4]

If the result is not as expected, compare each layer:

polygraphy run model.onnx --trt --onnxrt --fp16 \
     --trt-outputs mark all \
     --onnx-outputs mark all

I tried your command with mark all, but an error occurred while running:
(screenshot 2024-05-28: error output)

@lix19937

@AndreWanga
Author

Could you please try converting this bge model and comparing the results? I have tried every way I know but can't fix this issue. I think TensorRT may not be able to handle this cross-encoder model yet. FYI @lix19937 @zerollzeng

@AndreWanga
Author

I upgraded to TensorRT 10 and the issue has been resolved. Thanks for your time. @lix19937 @zerollzeng
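Since the fix was a version upgrade, deployment code can guard against regressing to an affected release. A hypothetical helper (require_trt_major is not a TensorRT API; in real code the version string would come from tensorrt.__version__):

```python
def require_trt_major(version_str, minimum=10):
    """Check a TensorRT version string like '10.0.1' against a minimum major."""
    major = int(version_str.split(".")[0])
    return major >= minimum

# Sample version strings, not read from an installed TensorRT.
print(require_trt_major("10.0.1"))  # True
print(require_trt_major("8.6.1"))   # False
```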
