Support standard ONNX quantized operators? #9

Closed

Nullkooland opened this issue Oct 4, 2021 · 3 comments

Comments

@Nullkooland

Hi, I've been looking for an end-to-end quantization deployment solution. So far I've tried the onnxruntime + TVM stack, but onnxruntime only supports naive PTQ methods. Your work looks really promising; however, I wonder whether MQBench can export the quantized model in the form of standard ONNX quantized ops like QLinearConv, QuantizeLinear, DequantizeLinear, etc.?

See: apache/tvm#8838

@Tracin
Contributor

Tracin commented Oct 8, 2021

We consider deploying quantized models on TVM important; deployment on TVM with QuantDequant nodes will be supported very soon. Actually, we have already done this in a PTQ scheme for experiments.

@Nullkooland
Author

@Tracin
I noticed that v0.0.3 adds ONNX QNN op support; however, I cannot export an ONNX QNN model when running the example code in test_quantize_onnxqnn in test_backend.py:

No Op registered for LearnablePerTensorAffine with domain_version of 11

It seems that ONNX does not support the custom fake-quantize op LearnablePerTensorAffine in the quantized model. I also noticed that there's a function deploy_qparams_tvm which uses the ONNXQNNPass, in which fake-quantize ops are replaced with standard ONNX QNN ops, but this function is not called during convert_deploy. Is there any documentation explaining how to use these utilities to export a standard ONNX QNN model?

@Tracin
Copy link
Contributor

Tracin commented Nov 5, 2021

ONNXQNNPass is registered under the ONNX_QNN scheme; if you prepare and convert_deploy with backend=ONNX_QNN, it will be called.
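
A minimal sketch of that flow, pieced together from this comment and the test in test_backend.py (the calibration helpers and the exact convert_deploy arguments here are my assumptions and may differ across MQBench versions):

```python
import torch
import torchvision.models as models

from mqbench.prepare_by_platform import prepare_by_platform, BackendType
from mqbench.convert_deploy import convert_deploy
from mqbench.utils.state import enable_calibration, enable_quantization

# Trace the FP32 model and insert fake-quantize nodes for the ONNX QNN backend.
model = models.resnet18(pretrained=True).eval()
model = prepare_by_platform(model, BackendType.ONNX_QNN)

# Collect quantization statistics on calibration batches (random data here,
# just for illustration; use real calibration data in practice).
enable_calibration(model)
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
enable_quantization(model)

# With backend=ONNX_QNN, convert_deploy should run ONNXQNNPass, replacing the
# fake-quantize ops with standard QuantizeLinear/DequantizeLinear QNN nodes.
convert_deploy(model, BackendType.ONNX_QNN,
               input_shape_dict={'data': [1, 3, 224, 224]},
               model_name='resnet18_onnx_qnn')
```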
