
Add a conversion of individual operations in FQ2I pass. #10239

Merged

Conversation

@Icemist (Contributor) commented Feb 14, 2022

This conversion happens after the basic part of the FQ2I pass. The main idea of the conversion is to find and transform operations with dequantized inputs one by one, individually. Only operations from an allow-list are converted.

For example, if in the general pattern below op2 is not registered with the FTVMFakeQuantizationToInteger attribute, the op1 operation can still be converted. The converted pattern is shown on the right:

   x    w       x   w
   |    |       |   |
   dq   dq      \   /
    \   /        op1
     op1          |
      |     =>   dq
     op2          |
      |          op2
      |           |
      q           q
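
As a rough illustration (not from the PR itself), here is a minimal Python sketch of a graph with this shape and of running the pass over it; the concrete ops, scales, and the use_qat flag discussed later in this thread are assumptions made for the example:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 8), dtype="int8")
w = relay.var("w", shape=(8, 8), dtype="int8")

# dq -> op1 -> op2 -> q, where only op1 (dense) has an
# FTVMFakeQuantizationToInteger registration
dq_x = relay.qnn.op.dequantize(x, relay.const(0.1), relay.const(0))
dq_w = relay.qnn.op.dequantize(w, relay.const(0.1), relay.const(0))
op1 = relay.nn.dense(dq_x, dq_w)
op2 = relay.erf(op1)  # assumed here to have no FQ2I registration
q = relay.qnn.op.quantize(op2, relay.const(0.1), relay.const(0), out_dtype="int8")

mod = tvm.IRModule.from_expr(relay.Function([x, w], q))
mod = relay.transform.InferType()(mod)
# with the per-op conversion enabled, op1 can still be rewritten even
# though op2 blocks the classic dq -> subgraph -> q pattern
mod = relay.transform.FakeQuantizationToInteger(hard_fail=False, use_qat=True)(mod)
print(mod)
```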

Measured accuracy on a BERT model bert_large_v1_1_fake_quant.onnx:

| Model type | Accuracy |
| --- | --- |
| without fq2i (reference) | {"exact_match": 82.41248817407758, "f1": 90.06966006174216} |
| with old fq2i | {"exact_match": 82.41248817407758, "f1": 90.06966006174216} |
| with new fq2i | {"exact_match": 82.73415326395458, "f1": 90.17821060674615} |

@masahi (Member) commented Feb 14, 2022

Can you show some accuracy results on QAT models (BERT etc)?

@masahi (Member) commented Feb 14, 2022

Please also show the reference result (accuracy without fq2i).

@Icemist (Contributor, Author) commented Feb 15, 2022

> Please also show the reference result (accuracy without fq2i).

| Model type | Accuracy |
| --- | --- |
| without fq2i (reference) | {"exact_match": 82.41248817407758, "f1": 90.06966006174216} |
| with old fq2i | {"exact_match": 82.41248817407758, "f1": 90.06966006174216} |
| with new fq2i | {"exact_match": 82.73415326395458, "f1": 90.17821060674615} |

@mbrookhart (Contributor) left a comment

Overall very nice, only comments are around improved documentation and an optional flag.

@masahi, @elvin-n, and I talked a couple of months ago and agreed that the QAT version of the pass (which you're adding here) should probably be a separate pass from the one for explicitly fake-quantized models à la tflite and TensorRT. I go back and forth on what the right final form is (I've found edge cases for both versions), but I'd love some improved comments on why this particular design was chosen.

```
@@ -270,8 +293,233 @@ class FakeQuantizationRewriter : public MixedModeMutator {
  const bool hard_fail_;
};

bool is_op_enabled_for_optional_fq2i(const CallNode* call_node) {
```
Contributor:
Could we add some comments and advice about what ops should be included in this list?

Contributor Author (Icemist):
I added a comment explaining how I selected these operations.

Comment on lines 518 to 538:

```diff
 Expr FakeQuantizationToInteger(const Expr& expr, const IRModule& mod, bool hard_fail) {
-  return FakeQuantizationRewriter(hard_fail).Mutate(expr);
+  auto fq_expr = FakeQuantizationRewriter(hard_fail).Mutate(expr);
+  auto fq_inferred_expr = tvm::relay::InferType(fq_expr);
+  auto ofq_expr = OptionalFakeQuantizationRewriter(hard_fail).Mutate(fq_inferred_expr);
+  return ofq_expr;
 }
```
Contributor:

I'm not sure how problematic this will be in non-QAT models, but would it make sense to add another bool to make the "Optional" part of the pass actually optional?

Contributor:
Let's add a parameter enableQAT to the FakeQuantizationToInteger pass, with a default value that does not invoke the QAT transformation.

Contributor Author (Icemist):
I added the parameter, but called it use_qat. I'm not sure CamelCase fits here, and I've seen qat shortened to lowercase in other libraries.
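
For context, a sketch of how the flag would surface on the Python side (assuming the merged signature, with the default leaving the QAT step disabled):

```python
from tvm import relay

# default: only the classic dq -> subgraph -> q rewrite runs
fq2i = relay.transform.FakeQuantizationToInteger()
# opt in to the additional per-operation (QAT) rewrite
fq2i_qat = relay.transform.FakeQuantizationToInteger(hard_fail=False, use_qat=True)
```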

@AndrewZhaoLuo (Contributor) left a comment

I need to read the pass a little more carefully to understand the difference between this and the existing fq2i pass.

Resolved review threads:
- src/relay/transforms/fake_quantization_to_integer.cc (outdated)
- include/tvm/relay/qnn/op/dequantize.h (outdated)
- src/relay/transforms/type_infer.cc
@masahi (Member) commented Feb 15, 2022

> I need to read the pass a little more carefully to understand the difference between this and the existing fq2i pass.

The key is that in the models the existing pass expects, there is always a matching dq and q. This PR handles cases where that assumption doesn't hold.

@AndrewZhaoLuo (Contributor):

That is understood; my main concern is that a lot of the code looks similar, and I'm wondering if this "optional" pass can supersede the original pass.

@elvin-n (Contributor) commented Feb 16, 2022

> That is understood; my main concern is that a lot of the code looks similar

Conceptually the transformations look similar, but the current transformation cannot be extended, precisely because of its expectation of the DQ -> subgraph -> Q pattern. And there is not a lot of duplication: all op conversions defined through the FTVMFakeQuantizationToInteger mechanism are reused, which is great.

> I'm wondering if this "optional" pass can supersede the original pass.

Interesting question. It could probably work that way as well, but that doesn't seem so important.

@elvin-n (Contributor) commented Feb 16, 2022

@mbrookhart

> agreed that the QAT version of the pass (which you're adding here) should probably be a separate pass from the one for explicitly fake-quantized models à la tflite and TensorRT

As I see it, we did exactly what we agreed on: the transformation itself is independent of the current one, in contrast to my previous PR (FakeQuantizationRewriter vs. OptionalFakeQuantizationRewriter). We also agreed, if I am not mistaken, to have a single pass to simplify the user's life: the user only needs to know about one function, FakeQuantizationToInteger, which they call from their Python code. Inside the pass, on the other hand, there are two transformations, the current one and the new one. If we want the QAT transformation to be optional, fine, let's add a parameter to FakeQuantizationToInteger whose default value does not invoke QAT.

@Icemist force-pushed the avoronov/update_fq2i_individual_ops branch 3 times, most recently from a92696c to 469888c on Feb 17, 2022 at 00:45
@Icemist force-pushed the avoronov/update_fq2i_individual_ops branch from 469888c to aa378d2 on Feb 17, 2022 at 00:58
@Icemist (Contributor, Author) commented Feb 17, 2022

> That is understood; my main concern is that a lot of the code looks similar, and I'm wondering if this "optional" pass can supersede the original pass.

I guess this is a slightly different approach, and not all operations are suitable for the new pass.
Some operations, like "add", require an explicit out_scale, which in the existing pass is taken from the quantize operation.
That is possible for the existing pass because it walks the selected subgraph from the end to the beginning.
The new pass goes from the beginning to the end: it starts from the dequantize operations, has no access to the quantize operation's info at the moment of conversion, and may even stop halfway, before reaching that info.
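
To make the out_scale point concrete, here is a simplified, illustrative sketch (not the actual TVM converter) of the shape of a registered binary-op conversion; the body is an assumption, but it shows that the output affine type is only available because the existing pass works backward from the final quantize:

```python
from tvm import relay
from tvm.relay.transform.fake_quantization_to_integer import (
    register_fake_quantization_to_integer,
)

# illustrative converter for "add"; the real one in TVM differs in details
@register_fake_quantization_to_integer("add")
def _add(expr, type_map):
    lhs_t = type_map[expr.args[0]]
    rhs_t = type_map[expr.args[1]]
    # out_t (and thus out_scale) is known only because the existing pass
    # seeds type_map from the trailing quantize op; a forward, per-op
    # traversal has nothing to put here yet
    out_t = type_map[expr]
    out = relay.qnn.op.add(
        expr.args[0], expr.args[1],
        lhs_t.scale, lhs_t.zero_point,
        rhs_t.scale, rhs_t.zero_point,
        out_t.scale, out_t.zero_point,
    )
    return [out, out_t]
```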

@AndrewZhaoLuo (Contributor) left a comment

LGTM, one more small question.

Resolved review thread: src/relay/transforms/fake_quantization_to_integer.cc (outdated)
@AndrewZhaoLuo AndrewZhaoLuo merged commit 7f24954 into apache:main Feb 17, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request on Apr 11, 2022:
* Add a conversion of individual operations in FQ2I pass.

* apply review comments

* apply review comments 2