YOLOv9-QAT TensorRT Q/DQ: Improved Speed and Zero Loss Accuracy #253
Comments
Added: Two repositories to test YOLOv9 QAT Models
Thanks for sharing. It would be better if there were an ONNX export for independent deployment, not just Triton.
@levipereira It would be interesting to see how the performance on Triton compares with YOLOv7-QAT, since the paper does not talk about it and neither does #143.
@levipereira Thank you for your contribution. I have a question: do I have to train the model in order to get a quantized model?
@demuxin Yes.
@trivedisarthak check OP
The Original Implementation in #327
@levipereira How can I quantize a custom-trained YOLOv9 model?
@levipereira Can I do QAT on a MacBook M3? I am having a lot of trouble with the QAT commands for the YOLO model. When I try to install, I get this error: "note: This error originates from a subprocess, and is likely not a problem with pip. × Getting requirements to build wheel did not run successfully. note: This error originates from a subprocess, and is likely not a problem with pip."
This is outdated
follow this new repo
https://github.com/levipereira/yolov9-qat
Please follow The Original Implementation in #327
@WongKinYiu
I have developed the initial version of YOLOv9-QAT using the Q/DQ method, tailored specifically for YOLOv9 models intended for execution solely on TensorRT.
This implementation currently supports only the Inference Models (Converted and Gelan models).
The source code is available in the yolov9-qat branch.
Challenges
Quantizing all layers can, in some cases, decrease accuracy and increase latency, primarily due to the complexity of the last layer. To mitigate this, use the
qat.py quantize --no-last-layer
flag to exclude the last layer from quantization.
In this version, the unoptimized scaling of the Quantize/Dequantize (Q/DQ) nodes can lead to unnecessary data format conversions. Restricting the Q/DQ scales in models/quantize.py to match the data format is essential to reduce latency.
Contributions from the community are welcome, as their knowledge is essential for the correct implementation of this functionality.
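To illustrate the Q/DQ scale issue described above, here is a minimal pure-Python sketch of symmetric int8 quantize/dequantize. It is not the code in models/quantize.py; the branch values, the helper names, and the idea of sharing one scale across the inputs of a Concat node are assumptions made for illustration only.

```python
def scale_from_amax(amax):
    """Symmetric int8 scale derived from a calibrated absolute maximum."""
    return amax / 127.0

def quantize(x, scale):
    """Q: map floats to int8 values, clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in x]

def dequantize(q, scale):
    """DQ: map int8 values back to approximate floats."""
    return [v * scale for v in q]

# Two hypothetical activation branches that feed the same Concat node.
branch_a = [0.5, -1.0, 2.0]
branch_b = [4.0, -3.0, 0.25]

amax_a = max(abs(v) for v in branch_a)
amax_b = max(abs(v) for v in branch_b)

# Independent scales: the two Concat inputs disagree, so a runtime may
# have to insert an extra conversion between them (assumed behavior).
scale_a = scale_from_amax(amax_a)
scale_b = scale_from_amax(amax_b)

# Shared scale: both branches use the larger amax, so their int8
# values are directly comparable and can be concatenated as-is.
shared_scale = scale_from_amax(max(amax_a, amax_b))
fused = quantize(branch_a, shared_scale) + quantize(branch_b, shared_scale)
restored = dequantize(fused, shared_scale)

# Worst-case round-trip error is half a quantization step.
max_error = max(abs(r - o) for r, o in zip(restored, branch_a + branch_b))
print(f"scale_a={scale_a:.5f} scale_b={scale_b:.5f} shared={shared_scale:.5f}")
print(f"max round-trip error={max_error:.5f}")
```

The trade-off in this sketch is that the branch with the smaller range loses some resolution when it adopts the shared scale, which is one reason such restrictions are usually applied only where the data formats must match.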
Files Added / Modified
qat.py - Main
models/quantize.py - Quantize Module
models/quantize_rules.py - Quantize Rules
export.py - Changed to automatically detect QAT models and export them when using the flag
--include onnx / onnx_end2end
Accuracy Report
Result using TensorRT engine Models on Triton-Server
Tool: https://github.com/levipereira/triton-client-yolo
Latency Report
Table Info:
Origin
Last Layer not Quantized
All Layers Quantized