Add a conversion of individual operations in FQ2I pass. #10239
Conversation
Force-pushed from c8a96bf to c595a4d
Can you show some accuracy results on QAT models (BERT etc.)?
Please also show the reference result (accuracy without fq2i).
Overall very nice; my only comments are around improved documentation and an optional flag.
@masahi, @elvin-n, and I talked a couple of months ago and agreed that the QAT version of the pass (which you're adding here) should probably be a separate pass from the explicitly fake-quantized model à la tflite and TensorRT. I go back and forth on what the right final form is (I've found edge cases for both versions), but I'd love some improved comments on why this particular design was chosen.
@@ -270,8 +293,233 @@ class FakeQuantizationRewriter : public MixedModeMutator {
  const bool hard_fail_;
};

bool is_op_enabled_for_optional_fq2i(const CallNode* call_node) {
Could we add some comments and advice about what ops should be included in this list?
I added a comment describing how I selected these operations.
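To make the allow-list idea concrete, here is a minimal hypothetical sketch of such a check. The op names and list contents below are illustrative assumptions, not the list from this PR, and the real TVM implementation works on `CallNode*` rather than plain strings.

```python
# Hypothetical sketch of an op allow-list for the optional FQ2I conversion.
# The set's contents are illustrative only; the actual PR selects its own ops.
ENABLED_OPS = {
    "nn.conv2d",      # compute-heavy ops that benefit most from integer math
    "nn.dense",
    "nn.max_pool2d",  # ops that commute with quantize/dequantize
}

def is_op_enabled_for_optional_fq2i(op_name: str) -> bool:
    """Return True if the op may be converted individually, i.e. even
    without a matching quantize on its output."""
    return op_name in ENABLED_OPS
```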
Expr FakeQuantizationToInteger(const Expr& expr, const IRModule& mod, bool hard_fail) {
-  return FakeQuantizationRewriter(hard_fail).Mutate(expr);
+  auto fq_expr = FakeQuantizationRewriter(hard_fail).Mutate(expr);
+  auto fq_inferred_expr = tvm::relay::InferType(fq_expr);
+  auto ofq_expr = OptionalFakeQuantizationRewriter(hard_fail).Mutate(fq_inferred_expr);
+  return ofq_expr;
}
I'm not sure how problematic this will be in non-QAT models, but would it make sense to add another bool to make the "Optional" part of the pass actually optional?
Let's add an enableQAT parameter to the FakeQuantizationToInteger pass, with a default value that does not call the QAT transformation.
I added the parameter, but called it use_qat. I'm not sure CamelCase fits here, and I've seen qat shortened to lower case in other libraries.
I need to read the pass a little more carefully to understand the difference between this and the existing fq2i passes.
The key is that in the models the existing pass expects, there is always a matching dq and q. This PR handles cases where that assumption doesn't hold.
That is understood; my main concern is that a lot of the code looks similar, and I'm wondering if this "optional" pass can supersede the original pass.
Conceptually the transformations look similar, but the current transformation cannot be extended, precisely because of its expectation of the DQ->subgraph->Q pattern. And there is not a lot of duplication: all op conversions are defined through the FTVMFakeQuantizationToInteger attribute.
Interesting question. It could probably work that way as well, but that should not be so important.
As I see it, we did exactly what we agreed: the transformation itself is independent of the current one (FakeQuantizationRewriter and OptionalFakeQuantizationRewriter), in contrast to my previous PR. Then we agreed, if I am not mistaken, to have only one pass in order to simplify the user's life: the user needs to know about only one function, FakeQuantizeToInteger, which they call from their Python code. Inside the pass, on the other hand, there are two transformations, the current one and the new one. If we want the QAT transformation to be optional, OK, let's add a parameter to FakeQuantizeToInteger with a default value that does not call QAT.
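The agreed design, one user-facing entry point with the QAT rewrite gated behind a default-off flag, can be sketched as follows. This is an illustrative stub, not TVM code: the rewriter bodies are placeholders, and the function and parameter names mirror the discussion above rather than the final API.

```python
# Sketch of the flag-gated two-stage design (placeholder rewriters).
class FakeQuantizationRewriter:
    """Stands in for the existing DQ->subgraph->Q rewrite."""
    def mutate(self, passes):
        return passes + ["fq2i"]

class OptionalFakeQuantizationRewriter:
    """Stands in for the new per-op QAT rewrite."""
    def mutate(self, passes):
        return passes + ["qat"]

def fake_quantization_to_integer(passes, hard_fail=False, use_qat=False):
    # The existing rewrite always runs.
    passes = FakeQuantizationRewriter().mutate(passes)
    if use_qat:
        # In the real pass, type inference runs here before the second rewrite.
        passes = OptionalFakeQuantizationRewriter().mutate(passes)
    return passes
```

With the default `use_qat=False`, existing users see no behavior change; QAT models opt in explicitly.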
Force-pushed from a92696c to 469888c
Force-pushed from 469888c to aa378d2
I guess this is a slightly different approach. Not all operations are suitable for the new pass.
LGTM, one more small question.
* Add a conversion of individual operations in FQ2I pass.
* apply review comments
* apply review comments 2
This conversion happens after the basic part of the FQ2I pass. The main idea of the conversion is to find operations with dequantized inputs and transform them one by one, individually. Only operations from the allowed list are converted.
For example, if in the general pattern above op2 is not registered with the FTVMFakeQuantizationToInteger attribute, the op1 operation can still be converted. The converted pattern is below:
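For readers unfamiliar with the terminology: a "fake quantized" (QAT) model simulates integer quantization in floating point via a quantize-then-dequantize round trip, which is what makes ops with dequantized inputs recognizable targets for this pass. A minimal scalar sketch of those numerics, with illustrative scale/zero-point values:

```python
# Illustrative fake-quantization numerics (scalar, int8 range assumed).
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to a clamped integer: q = clamp(round(x/scale) + zp)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map the integer back to float: x ~= (q - zp) * scale."""
    return (q - zero_point) * scale

def fake_quantize(x, scale, zero_point):
    """Quantize->dequantize round trip: stays in float, but introduces
    the same rounding/clamping error that real integer execution would."""
    return dequantize(quantize(x, scale, zero_point), scale, zero_point)
```

FQ2I replaces these simulated round trips with actual integer computation, which is why accuracy before and after the pass should match closely.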
Measured accuracy on a BERT model bert_large_v1_1_fake_quant.onnx: