I used pytorch-quantization to perform PTQ INT8 quantization on ResNet50, exported the model to ONNX, and then built a TensorRT engine from it. At inference time the speed did not increase; it actually got slower. What went wrong? #4304
Description
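The issue does not include the quantization or export script, so below is a minimal sketch of the kind of flow the title describes, assuming the standard pytorch-quantization PTQ recipe. The calibration DataLoader calib_loader, the pretrained-weight selection, and the output file name are placeholders, not taken from the issue:

```python
import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Replace supported torch.nn layers with quantized counterparts before building the model.
quant_modules.initialize()
model = torchvision.models.resnet50(weights="IMAGENET1K_V1").cuda().eval()

# 1) Calibration: collect activation statistics on a small calibration set.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is not None:
            module.disable_quant()
            module.enable_calib()

with torch.no_grad():
    for images, _ in calib_loader:  # calib_loader is an assumed ImageNet-style DataLoader
        model(images.cuda())

# Load the collected amax values and switch the quantizers back to quantization mode.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is not None:
            module.load_calib_amax()
        module.enable_quant()
        module.disable_calib()

# 2) Export with Q/DQ (QuantizeLinear/DequantizeLinear) nodes so TensorRT can build an INT8 engine.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "resnet50_ptq.onnx", opset_version=13)
```

An INT8 engine would then typically be built from the exported ONNX with something like trtexec --onnx=resnet50_ptq.onnx --int8 --saveEngine=resnet50.engine; the exact build and timing commands used for this report are not given in the issue.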
Environment
TensorRT Version: 10.7
NVIDIA GPU: RTX 3090
NVIDIA Driver Version:
CUDA Version: 11.7
CUDNN Version:
Operating System:
Python Version (if applicable): 3.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):