`QuantizeLinear` and `DequantizeLinear` nodes for Conv2D get folded as constants #1719
Comments
Ah I see, we have two constant fold optimizers (one for pre-conversion TF, the other for post-conversion ONNX). Our ONNX constant folding optimizer exempts Dequantize ops for the reason you describe:
But the TF one does not (tensorflow-onnx/tf2onnx/tf_utils.py, line 219 in 496b65d).
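For illustration, here is a minimal sketch of what such an op-type exemption in a constant-folding pass looks like. The names below (`FOLD_EXEMPT_OP_TYPES`, `fold_constants`, the helper callables) are hypothetical and are not the actual tf2onnx code:

```python
# Hypothetical sketch, not the real tf2onnx implementation: a folding pass that
# skips ops whose type is in an exemption set, so DequantizeLinear nodes survive
# even when all of their inputs are constant.
FOLD_EXEMPT_OP_TYPES = {"DequantizeLinear"}  # assumed exemption set

def fold_constants(nodes, is_constant, evaluate_node):
    """nodes: topologically sorted ops; is_constant/evaluate_node stand in for
    whatever evaluation machinery the converter provides."""
    folded = {}  # tensor name -> computed constant value
    for node in nodes:
        if node.op_type in FOLD_EXEMPT_OP_TYPES:
            continue  # keep the node in the graph instead of folding it
        if all(is_constant(name) or name in folded for name in node.inputs):
            for out, value in evaluate_node(node, folded).items():
                folded[out] = value
    return folded
```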
@TomWildenhain-Microsoft adding the exemption at tensorflow-onnx/tf2onnx/tf_utils.py, line 219 in 496b65d, doesn't seem to help!
Can you try using the
We run all graphs (except large models) through the tf optimizer, which might be having this effect. As a hack for testing, try adding the --large_model flag.
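For reference, a minimal sketch of that test via the tf2onnx Python API, assuming `from_keras` accepts a `large_model` argument and that `model` is the QAT Keras model in question:

```python
# Sketch of the suggested test, assuming the tf2onnx Python API and a Keras QAT
# model named `model`; large_model=True stores weights externally in a zip and,
# per the comment above, keeps the graph out of the TF grappler optimizer.
import tensorflow as tf
import tf2onnx

spec = (tf.TensorSpec((1, 224, 224, 3), tf.float32, name="input"),)  # assumed shape
model_proto, external_storage = tf2onnx.convert.from_keras(
    model,                 # the QAT model under test (assumed to exist)
    input_signature=spec,
    opset=13,
    large_model=True,      # the hack being suggested here
    output_path="model.zip",
)
```

The CLI route would be the same idea: pass `--large_model` to the converter invocation you already use.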
The graph is smaller since the zip contains the weights externally; that's what the --large_model flag does. Did you add the change at tensorflow-onnx/tf2onnx/tf_utils.py, line 219 in 496b65d, for that test?
Thanks, I tried adding the change at tensorflow-onnx/tf2onnx/tf_utils.py, line 219 in 496b65d, along with the --large_model flag. This got it to work. Another way I got it to work was, instead of enabling the --large_model flag, manually disabling tf_optimize_grappler; that also works. But the generated QDQ ONNX model doesn't seem to work with TensorRT (it throws an error).
This should be solvable by adding QuantizeLinear to the const fold optimizer. Then the Dequantize will be pushed through the transpose here.
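The reason pushing the Dequantize through the transpose is safe (for per-tensor quantization, at least) is that dequantization is elementwise, so it commutes with a pure layout change. A quick numpy check of that claim, with made-up shapes and parameters:

```python
# Quick numpy check (per-tensor case): transposing then dequantizing gives the
# same result as dequantizing then transposing, so DQ can be moved past a
# Transpose without changing the graph's output. Shapes/values are illustrative.
import numpy as np

w_q = np.random.randint(-128, 128, size=(3, 3, 16, 32), dtype=np.int8)
scale, zero_point = np.float32(0.02), np.int8(0)

def dequantize(q):
    return (q.astype(np.float32) - np.float32(zero_point)) * scale

perm = (3, 2, 0, 1)  # e.g. HWIO -> OIHW
assert np.array_equal(np.transpose(dequantize(w_q), perm),
                      dequantize(np.transpose(w_q, perm)))
```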
Also, IIUC, ConstDequantizeOptimizer will convert (w => Q => DQ => Transpose) to (w => Q => Transpose => DQ); this pattern is problematic with TensorRT. I was trying to convert
The
To do the above, you will need post-conversion constant folding for the Quantize op. Post-conversion constant folding requires a numpy implementation of the op, so yes, it should be added to that file, but not at that line. Will the
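As a rough sketch of what such a numpy implementation could look like (following the ONNX QuantizeLinear definition; the actual integration point and helper names in tf2onnx may differ):

```python
# Rough numpy sketch of QuantizeLinear for constant folding (per-tensor case):
# y = saturate(round(x / y_scale) + y_zero_point), rounding half to even and
# saturating to the zero point's integer range. Not the actual tf2onnx code.
import numpy as np

def quantize_linear(x, y_scale, y_zero_point):
    qtype = y_zero_point.dtype                      # int8 or uint8
    info = np.iinfo(qtype)
    y = np.rint(x / y_scale) + y_zero_point.astype(np.float32)
    return np.clip(y, info.min, info.max).astype(qtype)

# Example: quantize a weight kernel to int8 with a made-up scale.
w = np.random.randn(3, 3, 16, 32).astype(np.float32)
w_q = quantize_linear(w, np.float32(0.02), np.zeros((), dtype=np.int8))
```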
To clarify here, the fusion pattern as I understand it should match so long as the DQ op on the weight kernel is not folded. Folding the Q should be fine. Note how in your image
It would be possible to do this with conv. I don't know if it would be easier than adding the Quantize op to constant folding.
Setting the grappler config at tensorflow-onnx/tf2onnx/tf_loader.py, lines 680 to 682 in 4245d8d, prevents the Q/DQ nodes from being folded (see the sketch below).
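A sketch of the kind of grappler setting being referred to, under the assumption that the relevant RewriterConfig field is `experimental_disable_folding_quantization_emulation` (field name and availability depend on the TF version):

```python
# Sketch only: build a ConfigProto whose rewriter options ask grappler's
# constant folding to leave quantization-emulation ops (QuantizeAndDequantizeV2)
# alone. The exact field name is assumed; check your TF version's
# rewriter_config.proto before relying on it.
from tensorflow.core.protobuf import config_pb2, rewriter_config_pb2

config = config_pb2.ConfigProto()
rewrite_options = config.graph_options.rewrite_options
rewrite_options.constant_folding = rewriter_config_pb2.RewriterConfig.ON
rewrite_options.experimental_disable_folding_quantization_emulation = True  # assumed field
```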
There are a few other cases where the inputs are not constants, so I was hoping to arrive at a more generalized solution for this. Is there a way to always add tensor shape manipulation ops before the Q/DQ pairs? Since Quantize ops are always immediately followed by DeQuantize ops when building tensorflow graphs with
Nice work! We should update tf2onnx to have this flag. Can you submit a PR?
Are you sure about this? It was my understanding from the image you gave that the fusion doesn't really need the Quantize to be followed by a Dequantize; rather, it needs the op being fused (Conv, MatMul, etc.) to be surrounded by Dequantize/Quantize ops on the inputs/outputs, respectively (outputs are optional in some cases). I don't see any boxes around Q/DQ pairs. If the squeeze was moved up, the GlobalAveragePool op would not be eligible for fusing.
I was under the same impression, but when I ran the TensorRT engine built from the ONNX graph, I could see some dangling Q/DQ nodes. There is yet another change that I had to make in the ONNX optimizers to achieve optimal fusion during TensorRT engine building. TensorRT recommends (see the TensorRT forum comment) giving each consumer its own Q/DQ pair rather than a shared one, so I exempted Q/DQ nodes from merging at tensorflow-onnx/tf2onnx/optimizer/merge_duplicated_nodes_optimizer.py, lines 112 to 115 in 4245d8d.
The graph on the left is what I get after making the change.
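For illustration, a toy sketch of the kind of exemption described (not the actual optimizer in merge_duplicated_nodes_optimizer.py; `rewire_consumers` and the node attributes used in the key are placeholders):

```python
# Toy sketch: when de-duplicating identical nodes, skip Q/DQ ops so that each
# consumer keeps its own Quantize/Dequantize pair. Hypothetical structure; not
# the real tf2onnx optimizer.
MERGE_EXEMPT_OP_TYPES = {"QuantizeLinear", "DequantizeLinear"}

def merge_duplicates(nodes, rewire_consumers):
    """rewire_consumers(duplicate, keeper) is a placeholder for graph surgery."""
    seen = {}
    for node in nodes:
        if node.op_type in MERGE_EXEMPT_OP_TYPES:
            continue  # never merge Q/DQ nodes, even if they are byte-identical
        key = (node.op_type, tuple(node.inputs), tuple(sorted(node.attrs.items())))
        if key in seen:
            rewire_consumers(node, seen[key])
        else:
            seen[key] = node
```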
If merging two identical nodes causes an issue, I'd really consider that a problem with TensorRT. But we could still exempt them. It almost seems like the best way to deal with Q/DQ would be in a post-processing phase after all the optimizers run. During that phase we could detect all the Q/DQ pairs, propagate quantization info according to rules (like the Relu/Add case), and then write Q/DQ pairs back into the graph.
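A rough sketch of the detection step such a post-processing pass would start from, assuming the final graph is available as an onnx ModelProto:

```python
# Sketch of the first step of the proposed post-processing phase: find all
# Quantize->Dequantize pairs in the final ONNX graph so quantization parameters
# can then be propagated and the pairs re-inserted where needed.
import onnx

def find_qdq_pairs(model: onnx.ModelProto):
    consumers = {}
    for node in model.graph.node:
        for name in node.input:
            consumers.setdefault(name, []).append(node)
    pairs = []
    for node in model.graph.node:
        if node.op_type == "QuantizeLinear":
            for consumer in consumers.get(node.output[0], []):
                if consumer.op_type == "DequantizeLinear":
                    pairs.append((node, consumer))
    return pairs
```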
Resolved this issue with PR #1856; closing it.
Describe the bug
I have taken these representations from the TensorRT QAT Presentation.
Figure(1)
As shown in Figure(1) above, I added `QuantizeAndDequantizeV2` nodes before the `conv2d` op and for the `conv2d` kernel in my model. But after converting it with tf2onnx, I don't seem to find the `QuantizeLinear` and `DequantizeLinear` nodes for the `conv2d` kernel. As shown in Figure(2) below, I was expecting tf2onnx to keep them and not fold them into constants.

Figure(2)
Figure(3)
Tensorboard visualization of my model, showing `QuantizeAndDequantizeV2` ops for both input and weights.

Figure(4)
Netron visualization of the ONNX model, which shows `QuantizeLinear` and `DequantizeLinear` nodes only for the `conv2d` input.

It seems logical to fold the `QuantizeLinear` and `DequantizeLinear` nodes for weights into a constant, as described in #1394. But looking at Figure(5) below, it looks like TensorRT requires the `QuantizeLinear` and `DequantizeLinear` nodes for both the `conv2d` input and weights!

Figure(5)
Urgency
Blocked use case: TensorFlow 2.x QAT -> ONNX -> TensorRT
System information
To Reproduce
Adding `QuantizeAndDequantizeV2` nodes into TF 2.x graphs is not trivial, and hence it is difficult for me to add all the code here.
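A minimal sketch of the kind of graph being described, fake-quantizing both the conv2d input and its kernel with QuantizeAndDequantizeV2 (shapes and quantization ranges are made up for illustration):

```python
# Minimal sketch of the pattern from Figure(1): QuantizeAndDequantizeV2 on both
# the conv2d input and the conv2d kernel. Ranges/shapes are illustrative only.
import tensorflow as tf

kernel = tf.Variable(tf.random.normal([3, 3, 3, 16]), name="kernel")

@tf.function
def qat_conv(x):
    x_q = tf.quantization.quantize_and_dequantize_v2(x, -1.0, 1.0)
    w_q = tf.quantization.quantize_and_dequantize_v2(kernel, -1.0, 1.0)
    return tf.nn.conv2d(x_q, w_q, strides=1, padding="SAME")

concrete_fn = qat_conv.get_concrete_function(
    tf.TensorSpec([1, 224, 224, 3], tf.float32, name="input"))
```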