Result of FP16 TensorRT model is NOT correct #3827
Comments
Could you upload the full result image, so we can see the result of |
Here are the error metrics and the relative-diff image. You can see the ONNX and TRT values are totally different from each other. |
Max abs diff is 1.0019. You can run an FP32 comparison first:

polygraphy run model.onnx --trt --onnxrt \
    --trt-min-shapes input_ids:[1,1] attention_mask:[1,1] \
    --trt-opt-shapes input_ids:[16,16] attention_mask:[16,16] \
    --trt-max-shapes input_ids:[32,32] attention_mask:[32,32]

then check FP16:

polygraphy run model.onnx --trt --onnxrt --fp16 \
    --trt-min-shapes input_ids:[1,1] attention_mask:[1,1] \
    --trt-opt-shapes input_ids:[16,16] attention_mask:[16,16] \
    --trt-max-shapes input_ids:[32,32] attention_mask:[32,32] |
I checked the FP32 model with your command and with my script. It passes with your command because the input batch size is 1, but it fails with my script because the batch size is 4. So the cause should not be FP16; it seems some operators malfunction when converting this bge model from ONNX to TRT. |
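The batch-size-dependent mismatch above can be quantified the same way the error metrics were: compare the two runtimes' output tensors element-wise. This is a minimal illustrative sketch (not from the thread); the tolerances are assumptions, and the sample arrays stand in for real ONNX-Runtime and TensorRT outputs.

```python
import numpy as np

def compare_outputs(onnx_out, trt_out, atol=1e-3, rtol=1e-3):
    """Return (max abs diff, max rel diff, pass/fail) for two output tensors."""
    onnx_out = np.asarray(onnx_out, dtype=np.float32)
    trt_out = np.asarray(trt_out, dtype=np.float32)
    abs_diff = np.abs(onnx_out - trt_out)
    # Small epsilon avoids division by zero on exact-zero reference values.
    rel_diff = abs_diff / (np.abs(onnx_out) + 1e-12)
    passed = bool(np.all(abs_diff <= atol + rtol * np.abs(onnx_out)))
    return float(abs_diff.max()), float(rel_diff.max()), passed

# Stand-in outputs; in practice these come from the two runtimes at batch=4.
a = np.array([0.1, 0.2, 0.3])
b = np.array([0.1, 0.2, 0.30005])
max_abs, max_rel, ok = compare_outputs(a, b)
```

A max abs diff near 1.0 on logits, as reported above, fails any reasonable tolerance and points to a real graph problem rather than FP16 rounding.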
Use the following to set the batch size to 4:

polygraphy run model.onnx --trt --onnxrt \
    --trt-min-shapes input_ids:[1,1] attention_mask:[1,1] \
    --trt-opt-shapes input_ids:[16,16] attention_mask:[16,16] \
    --trt-max-shapes input_ids:[32,32] attention_mask:[32,32] \
    --input-shapes input_ids:[4,4] attention_mask:[4,4] |
The result still fails, the same as with my script. @lix19937 |
Is there any solution for this issue? FYI @lix19937 @zerollzeng |
Just fix the shapes to debug:

polygraphy run model.onnx --trt --onnxrt \
    --trt-min-shapes input_ids:[4,4] attention_mask:[4,4] \
    --trt-opt-shapes input_ids:[4,4] attention_mask:[4,4] \
    --trt-max-shapes input_ids:[4,4] attention_mask:[4,4] \
    --input-shapes input_ids:[4,4] attention_mask:[4,4]

If the result is not as expected, compare each layer:

polygraphy run model.onnx --trt --onnxrt --fp16 \
    --trt-outputs mark all \
    --onnx-outputs mark all |
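Once both runs mark all tensors as outputs, the goal of the per-layer comparison is to find the first tensor where the two runtimes diverge. A hypothetical sketch of that search, assuming the per-layer outputs have been collected into name-to-array dicts in graph order (the layer names and values here are made up for illustration):

```python
import numpy as np

def first_divergent_layer(onnx_layers, trt_layers, atol=1e-2):
    """Walk layer outputs in graph order; return the first tensor name
    whose ONNX and TRT values differ by more than atol, else None."""
    for name, ref in onnx_layers.items():
        out = trt_layers.get(name)
        if out is None:
            continue  # tensor may have been fused away in the TRT engine
        if np.abs(np.asarray(ref) - np.asarray(out)).max() > atol:
            return name
    return None

# Made-up example data standing in for the real per-layer dumps.
onnx_layers = {"embedding": np.array([1.0, 2.0]), "attn_0": np.array([0.5, 0.5])}
trt_layers = {"embedding": np.array([1.0, 2.0]), "attn_0": np.array([0.5, 9.9])}
bad = first_divergent_layer(onnx_layers, trt_layers)
```

The first divergent layer is usually where the problematic operator (or fusion) lives, which narrows the bug report considerably.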
Could you please try to convert this bge model and compare the results? I have tried every approach I know but cannot fix this issue. I suspect TensorRT cannot handle this cross-encoder model at the moment. FYI @lix19937 @zerollzeng |
I used TensorRT 10 and the issue has been resolved; thanks for your time. @lix19937 @zerollzeng
Description
I tried to convert the ONNX model to TRT in FP16; the inference results of TRT and ONNX are very different.
Environment
TensorRT Version: 8.6
NVIDIA GPU: RTX 3070
NVIDIA Driver Version: 531.18
CUDA Version: 12.1
CUDNN Version:
Operating System:
Python Version (if applicable): 3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link: https://huggingface.co/BAAI/bge-reranker-large/tree/main
Steps To Reproduce
optimum-cli export onnx -m bge-reranker-large output_bge --task text-classification --opset 17
trtexec --onnx=model.onnx --minShapes="input_ids":1x1,"attention_mask":1x1 --optShapes="input_ids":16x16,"attention_mask":16x16 --maxShapes="input_ids":32x32,"attention_mask":32x32 --fp16 --saveEngine=model.plan
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime:
polygraphy run <model.onnx> --onnxrt
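To feed both runtimes identical inputs for the reproduction above, a dummy-input helper can be used. This is a sketch under assumptions: the vocab size (250002, typical of the XLM-R tokenizer family that bge-reranker-large is based on) and the all-ones attention mask are illustrative choices, not part of the original report.

```python
import numpy as np

def make_dummy_inputs(batch, seq_len, vocab_size=250002, seed=0):
    """Build random token ids and a full attention mask matching the
    dynamic-shape profile (min 1x1, opt 16x16, max 32x32) used above."""
    rng = np.random.default_rng(seed)
    input_ids = rng.integers(0, vocab_size, size=(batch, seq_len), dtype=np.int64)
    attention_mask = np.ones((batch, seq_len), dtype=np.int64)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Batch size 4, the configuration that exposed the mismatch.
feeds = make_dummy_inputs(4, 4)
```

The same `feeds` dict can then be passed to both an ONNX-Runtime session and a TensorRT execution context so any output difference is attributable to the engine, not the inputs.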