Unexpected exception _Map_base::at in TensorRT 8.6.3 when running INT8 calibration on GPU RTX 4090 #3837
Comments
Your PTQ takes so little time?
No, the PTQ calibration usually takes around 40 s. Here is an example log: the calibration data consists of 4500 images with a calibration batch size of 500. In this case we simply want to benchmark our model when we quantize only one layer type at a time, in our case convolution and point-wise layers. We got these layer types from profiling the layers with trtexec and then analysing the latency with trt-engine-explorer. I've linked a Google Drive folder https://drive.google.com/drive/folders/1MJAP7NDO7zzRJlUJFexpTcxKVWT9tnuP?usp=drive_link with the relevant files.
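For reference, a minimal sketch of the kind of entropy calibrator the sample script builds on (assuming cuda-python for device copies; all names here are illustrative, not our actual code):

```python
import os
import numpy as np
import tensorrt as trt
from cuda import cudart  # cuda-python, as used by the newer TensorRT samples

class ImageBatchCalibrator(trt.IInt8EntropyCalibrator2):
    """Hypothetical calibrator: `batches` is any iterator yielding numpy
    arrays of shape (batch, C, H, W); all batches must have equal size."""

    def __init__(self, batches, batch_size=500, cache_file="calibration.cache"):
        super().__init__()
        self.batches = batches
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_ptr = None  # allocated lazily from the first batch

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        batch = next(self.batches, None)
        if batch is None:
            return None  # tells TensorRT the calibration data is exhausted
        data = np.ascontiguousarray(batch, dtype=np.float32)
        if self.device_ptr is None:
            self.device_ptr = cudart.cudaMalloc(data.nbytes)[1]
        cudart.cudaMemcpy(self.device_ptr, data.ctypes.data, data.nbytes,
                          cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)
        return [int(self.device_ptr)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```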
I think you can calibrate without setting the layer precision: just generate scales for all layers and produce the calibration cache, then reuse the calibration cache and fall back some layers to FP32/FP16. Or use QAT, where you can control the layer precision explicitly with Q/DQ pairs.
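A rough sketch of that two-step flow (assuming `network` is a populated trt.INetworkDefinition, `config` its trt.IBuilderConfig, and `calibrator` an instance like the one sketched above; the fallback criterion is purely illustrative):

```python
import os
import tensorrt as trt

CACHE = "calibration.cache"

# Pass 1: plain INT8 calibration with no per-layer precision constraints.
# The calibrator's write_calibration_cache hook produces CACHE.
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = calibrator

# Pass 2 (rebuild once CACHE exists): the cached scales are reused via
# read_calibration_cache, and selected layers fall back to FP32.
if os.path.exists(CACHE):
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type != trt.LayerType.CONVOLUTION:  # illustrative criterion
            layer.precision = trt.DataType.FP32
```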
I'm not sure I'm following. Our current approach is a modified version of the sample script https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientdet/build_engine.py. If I want to control the precision of layers based on their type, can I use implicit quantization while setting the layer precision as done in my code?
Description
We are trying to quantize a specific layer type when building an engine:
{trt.LayerType.CONVOLUTION : trt.DataType.INT8}
In log.txt we show that we identify the convolution-type layers and set their precision, roughly as in the sketch below.
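(A minimal reconstruction of that step, not our exact script; `network` and `config` are the usual builder objects, and the map mirrors the one above.)

```python
import tensorrt as trt

# Assumed to exist: network (trt.INetworkDefinition), config (trt.IBuilderConfig).
precision_map = {trt.LayerType.CONVOLUTION: trt.DataType.INT8}

config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)  # honor per-layer settings

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type in precision_map:
        layer.precision = precision_map[layer.type]
        # Constrain the first output only; enough for this sketch.
        layer.set_output_type(0, precision_map[layer.type])
```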
Our previous mixed-precision strategy, which assigned precisions to "blocks" of the network, did not run into this error; it only appears when we apply mixed precision based on layer type.
Environment
Baremetal or Container (if so, version): tensorrt-24.0.3py
Relevant Files
log.txt