Add a conversion of individual operations in FQ2I pass. #10239
Conversation
Force-pushed from c8a96bf to c595a4d
Can you show some accuracy results on QAT models (BERT etc.)?
Please also show the reference result (accuracy without fq2i).
Overall very nice; my only comments are around improved documentation and an optional flag.
@masahi, @elvin-n, and I talked a couple of months ago and agreed that the QAT version of the pass (which you're adding here) should probably be a separate pass from the explicitly fake-quantized model à la tflite and TensorRT. I go back and forth on what the right final form is (I've found edge cases for both versions), but I'd love some improved comments on why this particular design was chosen.
@@ -270,8 +293,233 @@ class FakeQuantizationRewriter : public MixedModeMutator {
  const bool hard_fail_;
};

bool is_op_enabled_for_optional_fq2i(const CallNode* call_node) {
Could we add some comments and advice about what ops should be included in this list?
I added a comment describing how I selected these operations.
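To make the allow-list idea concrete, here is a minimal hypothetical sketch of such a check. The op names and list contents below are illustrative assumptions, not the list from this PR, and the real TVM implementation works on `CallNode*` rather than plain strings.

```python
# Hypothetical sketch of an op allow-list for the optional FQ2I conversion.
# The set's contents are illustrative only; the actual PR selects its own ops.
ENABLED_OPS = {
    "nn.conv2d",      # compute-heavy ops that benefit most from integer math
    "nn.dense",
    "nn.max_pool2d",  # ops that commute with quantize/dequantize
}

def is_op_enabled_for_optional_fq2i(op_name: str) -> bool:
    """Return True if the op may be converted individually, i.e. even
    without a matching quantize on its output."""
    return op_name in ENABLED_OPS
```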
Expr FakeQuantizationToInteger(const Expr& expr, const IRModule& mod, bool hard_fail) {
-  return FakeQuantizationRewriter(hard_fail).Mutate(expr);
+  auto fq_expr = FakeQuantizationRewriter(hard_fail).Mutate(expr);
+  auto fq_inferred_expr = tvm::relay::InferType(fq_expr);
+  auto ofq_expr = OptionalFakeQuantizationRewriter(hard_fail).Mutate(fq_inferred_expr);
+  return ofq_expr;
}
I'm not sure how problematic this will be in non-QAT models, but would it make sense to add another bool to make the "Optional" part of the pass actually optional?
Let's add an enableQAT parameter to the FakeQuantizationToInteger pass, with a default value that does not call the QAT transformation.
I added the parameter, but called it use_qat. I'm not sure CamelCase fits here, and I've seen qat shortened to lower case in other libraries.
I need to read the pass a little more carefully to understand the difference between this and the existing fq2i passes.
The key is that in the models the existing pass expects, there is always a matching dq and q. This PR handles cases where that assumption doesn't hold.
That is understood; my main concern is that a lot of the code looks similar, and I'm wondering if this "optional" pass can supersede the original pass.
Conceptually the transformations look similar, but the current transformation cannot be extended, precisely because of its expectation of the DQ->subgraph->Q pattern. And there is not a lot of duplication: all op conversions are defined through the FTVMFakeQuantizationToInteger attribute.
Interesting question. It could probably work that way as well, but that should not be so important.
As I see it, we did exactly what we agreed: the transformation itself is independent of the current one (FakeQuantizationRewriter and OptionalFakeQuantizationRewriter), in contrast to my previous PR. Then we agreed, if I am not mistaken, to have only one pass in order to simplify the user's life: the user needs to know about only one function, FakeQuantizeToInteger, which they call from their Python code. Inside the pass, on the other hand, there are two transformations, the current one and the new one. If we want the QAT transformation to be optional, OK, let's add a parameter to FakeQuantizeToInteger with a default value that does not call QAT.
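The agreed design, one user-facing entry point with the QAT rewrite gated behind a default-off flag, can be sketched as follows. This is an illustrative stub, not TVM code: the rewriter bodies are placeholders, and the function and parameter names mirror the discussion above rather than the final API.

```python
# Sketch of the flag-gated two-stage design (placeholder rewriters).
class FakeQuantizationRewriter:
    """Stands in for the existing DQ->subgraph->Q rewrite."""
    def mutate(self, passes):
        return passes + ["fq2i"]

class OptionalFakeQuantizationRewriter:
    """Stands in for the new per-op QAT rewrite."""
    def mutate(self, passes):
        return passes + ["qat"]

def fake_quantization_to_integer(passes, hard_fail=False, use_qat=False):
    # The existing rewrite always runs.
    passes = FakeQuantizationRewriter().mutate(passes)
    if use_qat:
        # In the real pass, type inference runs here before the second rewrite.
        passes = OptionalFakeQuantizationRewriter().mutate(passes)
    return passes
```

With the default `use_qat=False`, existing users see no behavior change; QAT models opt in explicitly.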
Force-pushed from a92696c to 469888c
Force-pushed from 469888c to aa378d2
I guess this is a slightly different approach. Not all operations are suitable for the new pass.
LGTM, one more small question.
* Add a conversion of individual operations in FQ2I pass.
* apply review comments
* apply review comments 2
This conversion happens after the basic part of the FQ2I pass. The main idea of the conversion is to find operations with dequantized inputs and transform them one by one, individually. Only operations from the allowed list are converted.
For example, if in the general pattern above op2 is not registered with the FTVMFakeQuantizationToInteger attribute, the op1 operation can still be converted. The converted pattern is below:
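For readers unfamiliar with the terminology: a "fake quantized" (QAT) model simulates integer quantization in floating point via a quantize-then-dequantize round trip, which is what makes ops with dequantized inputs recognizable targets for this pass. A minimal scalar sketch of those numerics, with illustrative scale/zero-point values:

```python
# Illustrative fake-quantization numerics (scalar, int8 range assumed).
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to a clamped integer: q = clamp(round(x/scale) + zp)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map the integer back to float: x ~= (q - zp) * scale."""
    return (q - zero_point) * scale

def fake_quantize(x, scale, zero_point):
    """Quantize->dequantize round trip: stays in float, but introduces
    the same rounding/clamping error that real integer execution would."""
    return dequantize(quantize(x, scale, zero_point), scale, zero_point)
```

FQ2I replaces these simulated round trips with actual integer computation, which is why accuracy before and after the pass should match closely.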
Measured accuracy on a BERT model bert_large_v1_1_fake_quant.onnx: