
[TOPI][Hexagon] Implement quantized avgpool #12340

Merged
17 commits merged on Aug 24, 2022

Conversation

jverma-quic
Contributor

@jverma-quic jverma-quic commented Aug 8, 2022

Thanks for contributing to TVM! Please refer to guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.

cc @mehrdadh

@jverma-quic
Contributor Author

jverma-quic commented Aug 8, 2022

@github-actions github-actions bot requested a review from mehrdadh August 8, 2022 20:14
Contributor

@cconvey cconvey left a comment

Nice work! I may have more comments later, but I thought I had enough feedback for this iteration.

raise RuntimeError("Output width is too large")


def saturate(x, dtype):

This comment was marked as resolved.

Comment on lines 55 to 57
return te.max(0, te.min(x, 255))
elif dtype == "int8":
return te.max(-128, te.min(x, 127))

This comment was marked as resolved.

Contributor Author

Sure, will look into it.

raise RuntimeError(f"Unexpected layout '{layout}'")


def get_fixed_point_value(flp, dtype="int16"):
Contributor

Python type annotations (for params and return value) would definitely be helpful here.

E.g., is flp a Python intrinsic, a Numpy numeric value, a TE PrimExpr, or something else?

raise RuntimeError(f"Unexpected layout '{layout}'")


def get_fixed_point_value(flp, dtype="int16"):
Contributor

Could we have a few unit tests for this function? It's sufficiently complicated that the code isn't obviously correct just from reading it.

Contributor Author

I'll look into adding some test cases. Thanks!

raise RuntimeError(f"Unexpected layout '{layout}'")


def get_fixed_point_value(flp, dtype="int16"):
Contributor

I don't notice anything in the function body (or docs) indicating if/how this function handles flp values that are:

  • denormalized
  • positive or negative infinity
  • NaN

Should this function handle those cases gracefully or at least assert if they're encountered?

I think regardless of how they're handled, the docstring should discuss the issue.

Contributor Author

Good point! Thanks! The function handles denormalized values, but doesn't handle NaN or infinity. I'll add asserts for those two cases and will also include additional details on the denormalized case.

PoolArea = kh * kw

scale = input_scale / output_scale
scale_fixed_point, rsh = get_fixed_point_value(scale, "int16")
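For intuition, here is a rough NumPy sketch (not the PR's actual compute; function name and the scale values are hypothetical) of how an integer multiplier plus a right shift replaces the float multiply when requantizing an average pool:

```python
import numpy as np

def avg_pool_requantize(window: np.ndarray, scale_fxp: int, rsh: int, pool_area: int) -> int:
    """Hypothetical sketch: sum the window, rescale with a fixed-point
    multiplier and right shift, average, then saturate to uint8."""
    total = int(window.astype(np.int32).sum())
    # (total * scale_fxp) >> rsh approximates total * scale
    scaled = (total * scale_fxp) >> rsh
    avg = scaled // pool_area
    return int(np.clip(avg, 0, 255))

window = np.array([[10, 20], [30, 40]], dtype=np.uint8)
# scale = 0.5 represented as fxp = 1 << 14 with rsh = 15 (hypothetical values)
out = avg_pool_requantize(window, 1 << 14, 15, 4)  # (100 * 0.5) / 4, floored
```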

This comment was marked as resolved.

Contributor Author

rsh (short for right shift) is log2 of the scale factor. Sorry, I couldn't really think of a better name (maybe log2_scale_factor). If you have any suggestions, please do share.
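To illustrate the relationship between flp, fxp, and rsh, here is a hypothetical sketch (not the PR's implementation of get_fixed_point_value):

```python
import math

def to_fixed_point(flp: float, bits: int = 16):
    """Hypothetical illustration: pick rsh so the scaled value fills most
    of the available bits, then round to an integer."""
    exp = math.frexp(flp)[1]      # flp = m * 2**exp with 0.5 <= |m| < 1
    rsh = (bits - 2) - exp        # rsh is log2 of the scale factor
    fxp = round(flp * (2 ** rsh))
    return fxp, rsh

fxp, rsh = to_fixed_point(0.75)   # fxp / 2**rsh recovers ~0.75
```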

This comment was marked as resolved.



def qnn_avg_pool2d_compute(
data,
Contributor

Would it make sense to document any limitation regarding the applicability / correctness of this function?

I.e., if someone was writing a model and tried to use this TOPI code, how would they discover any limitations (if there are any) on how this could be used?

Contributor Author

Yes, there are some assumptions made which are listed at the top of the file.

Comment on lines 117 to 171
def schedule_nhwc_8h8w32c(outs, ins, output_layout: str, input_layout: str):
"""Schedule for input and output layout nhwc-8h8w32c"""
func = te.create_prim_func([ins, outs])
s = tir.Schedule(func)
Sum = s.get_block("sum")
Avg = s.get_block("avg")

input_transform_fn = get_layout_transform_fn(input_layout)
output_transform_fn = get_layout_transform_fn(output_layout)
s.transform_layout(Sum, ("read", 0), input_transform_fn)
s.transform_layout(Avg, ("write", 0), output_transform_fn)

# Schedule 'Avg'
n, h, w, c = s.get_loops(Avg)
ho, hi = s.split(h, [None, 8])
wo, wi = s.split(w, [None, 8])
wio, wii = s.split(wi, [None, 4])
co, ci = s.split(c, [None, 32])
s.reorder(n, ho, wo, co, hi, wio, wii, ci)
wii_ci = s.fuse(wii, ci)
s.vectorize(wii_ci)

# Schedule 'Sum'
s.compute_at(Sum, wio)
Sum_axis = s.get_loops(Sum)
s.reorder(Sum_axis[-2], Sum_axis[-1], Sum_axis[-4], Sum_axis[-3])
ci_wii = s.fuse(Sum_axis[-4], Sum_axis[-3])
# s.vectorize(ci_wii) # Doesn't work
return s


def schedule_n11c_2048c(outs, ins, output_layout: str, input_layout: str):
"""Schedule for output layout: n11c-2048c, input layout: nhwc-8h8w32c"""
func = te.create_prim_func([ins, outs])
s = tir.Schedule(func)
Sum = s.get_block("sum")
Avg = s.get_block("avg")

input_transform_fn = get_layout_transform_fn(input_layout)
output_transform_fn = get_layout_transform_fn(output_layout)
s.transform_layout(Sum, ("read", 0), input_transform_fn)
s.transform_layout(Avg, ("write", 0), output_transform_fn)

# Schedule 'Avg'
n, h, w, c = s.get_loops(Avg)
co, ci = s.split(c, [None, 2048])
cio, cii = s.split(ci, [None, 128])
s.vectorize(cii)

# Schedule 'Sum'
s.compute_at(Sum, cio)
Sum_axis = s.get_loops(Sum)
s.reorder(Sum_axis[-2], Sum_axis[-1], Sum_axis[-3])
# s.vectorize(Sum_axis[-3]) # Doesn't work
return s

This comment was marked as resolved.

Contributor Author

Sure! I'll add some comments.

@jverma-quic
Contributor Author

Please let me know if there are any additional comments. If not, can someone approve and merge it for me please?

@TejashShah

cc @cconvey @mehrdadh, please take a relook of the updated files.

@cconvey
Contributor

cconvey commented Aug 16, 2022

@jverma-quic : Could you please click the "Ready for re-review" link next to my name? I think that might be necessary to separate my old vs. new review comments.

Comment on lines 41 to 42
fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))
Contributor

These numbers seem pretty specific. It would be nice to have a comment indicating what (if anything) they correspond to.

Contributor Author

The numbers don't really mean anything; I just wanted to test with some very large and very small floating-point values to make sure that the conversion function handles them properly, i.e., doesn't introduce a large error.

python/tvm/topi/hexagon/utils.py (outdated; resolved)
Comment on lines +45 to +53
fxp, rsh = utils.get_fixed_point_value(flp, "int16")
# Compute scale_factor using rsh (rsh is log2 of the scale_factor). While doing this,
# we use IEEE-754 floating-point representation since rsh can be negative or positive.

scale = ((rsh + 127) & 0xFF) << 23 # Add bias (127) and position it into exponent bits
scale_i = struct.pack("I", scale) # Pack it as integer
scale_f = struct.unpack("f", scale_i) # Unpack as float

converted_flp = fxp / scale_f[0]
Contributor

Would it make sense to move this logic into new function in utils, e.g. get_floating_point_value(fxp:int, rsh:int, dtype="float16") -> float ?

I'm just thinking that the two conversion functions probably belong in the same place, even if one is currently used only for testing.
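Under the suggested signature, such a helper might look like the following sketch (it simply mirrors the test's inline bit manipulation; it is not part of the PR):

```python
import struct

def get_floating_point_value(fxp: int, rsh: int) -> float:
    """Hypothetical helper: recover the float that a (fixed-point value,
    right-shift) pair represents. Builds 2**rsh from its IEEE-754 float32
    bit pattern so that negative rsh values also work."""
    scale_bits = ((rsh + 127) & 0xFF) << 23          # biased exponent field
    (scale,) = struct.unpack("f", struct.pack("I", scale_bits))
    return fxp / scale

value = get_floating_point_value(12288, 14)          # 12288 / 2**14
```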

Contributor Author

That's what I had earlier, but I decided against it mainly because it's needed just for testing and doesn't provide any additional value. I would prefer to keep it this way unless this is a major concern.

Contributor

Thanks, that makes sense. I think it's just a matter of personal preference, so no objection to keeping it as is.

Comment on lines 36 to 42
fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
fp2 = np.random.uniform(0.001, 0.02, size=(10))
fp3 = np.random.uniform(1, 20, size=(10))
fp4 = np.random.uniform(900, 1000, size=(10))
fp5 = np.random.uniform(1e9, 1e10, size=(10))
fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))
Contributor

I'm wondering if random draws are worth the effort / complexity here...

  • If the goal is just simple, sanity-checking unit tests, then I'm not sure we really need randomness. Especially if they lead to test-failures that can't be reproduced for the sake of debugging, due to the randomization.

  • If the goal is to check corner cases, I would think that's better done using specifically chosen values, e.g.

    • extreme value / special values for floating-point numbers
    • floating point values that, by inspection of the conversion algorithm, are likely to be critical

Contributor Author

I didn't really think about it since this is just a small unit test and we're constructing at most 10-element arrays. If you're really concerned about the complexity aspect, I don't mind doing what you're suggesting, but otherwise I would prefer leaving it as is.

Contributor

No objection to leaving the code as it is. Would you consider just adding a comment about the semi-arbitrary nature of those numbers? Usually when I see something as precise as 2.44885652993e38, I assume the number is chosen for a particular reason. It might save other readers some time to know that there's no deeper meaning here.

Contributor Author

I agree. I'll add some comments to make it explicit.


# Make sure that 'flp' isn't NaN or infinity
if math.isnan(flp) or math.isinf(flp):
raise RuntimeError("Can not handle NaN or INF")
Contributor

@cconvey cconvey Aug 17, 2022

Nitpick: Sometimes comments like this indicate a temporary limitation of the function that could be addressed in a later version. But IIUC, the fixed-point format we're dealing with here is simply incapable of expressing those two concepts.

It might be helpful to use an error message that's clearer about this.

Comment on lines 249 to 250
if exp_stored_value == 0:
raise RuntimeError("Can not handle denormalized values")
Contributor

(This is somewhat redundant to a comment I left regarding the function's docstring, above.)

It would be nice to have a comment regarding why denormalized values aren't handled. E.g.:

  • they're always indistinguishable from 0 in the resulting fixed-point representation, or
  • we don't need to support them yet, so we're just not dealing with them for now, or
  • (something else)

Contributor Author

Sure, I'll elaborate on this. Thanks!


Additional notes on various floating-point values:
------------------------------------------------
1) Denormalized values: Can't be represented as fixed-point - causes assertion failure
Contributor

@cconvey cconvey Aug 17, 2022

I'm confused by the claim that denormal values can't be expressed as fixed-point.

My understanding is that the IEEE-754 denormalized format is simply a special way of encoding numbers that are much closer to 0 than normalized float16 values can express. I don't understand why that's fundamentally inexpressible as fixed-point.

Are we assuming some additional unstated limitations on our fixed-point representation? E.g., the range of values that we're willing to let rsh take on?

Contributor Author

To convert denormalized values into fixed point values, we'll require a very large scale factor which can't be represented using the available bits.
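A quick sanity check of that claim (illustrative only): the smallest positive float32 denormal is 2**-149, and scaling it into an int16 fixed-point value would need a right-shift amount far larger than a 16-bit scheme would typically encode:

```python
import math

denorm = 2.0 ** -149                 # smallest positive float32 denormal
exp = math.frexp(denorm)[1]          # denorm = 0.5 * 2**exp, so exp = -148
rsh = 14 - exp                       # shift needed to fill an int16: 162
# 162 is far outside any practical right-shift range for this scheme
```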


def saturate(x: te.Tensor, dtype: str):
"""Saturate value for the specified data type"""
return te.max(te.min_value(dtype), te.min(x, te.max_value(dtype)))
Contributor

When I looked at several of the Hexagon .so files produced by this PR's unit tests, I didn't see any indication that Hexagon's saturate or :sat instructions were being used.

This isn't a critique of the PR; I'm just mentioning it as a point of interest for future work.

Contributor

@cconvey cconvey Aug 18, 2022

Actually, I'm wondering if the saturate can sometimes be elided entirely when dtype=float16.

Here's my (maybe flawed) reasoning:

  • Hexagon has two units that might do this math: the ARM core, and the HVX units.
  • If this code runs on the ARM core, then it's treated as an IEEE-754 single-precision float.
  • If this code runs on an HVX core, then it's going to be processed using qfloat16 semantics, which automatically uses saturate behavior.

So any dataflow path that definitely involves qfloat16 representation could (perhaps) entirely avoid explicit saturation logic.

I'm starting to wonder if TIR should eventually distinguish saturated vs. unsaturated ops.

Contributor Author

Thanks for the comment, @cconvey! You're correct about saturate not being needed for the float16 dtype. Please note that the functions in this file (qnn/avg_pool2d.py) are meant to be used only for quantized models and therefore should have uint8 or int8 dtypes.
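For reference, the clamp that saturate() expands to is just a min/max pair over the dtype's representable range; a NumPy sketch of the int8 case (illustrative, not the PR's code):

```python
import numpy as np

x = np.array([-300, -128, 0, 127, 300], dtype=np.int32)
# te.max(min_value, te.min(x, max_value)) corresponds to this clamp:
sat = np.maximum(-128, np.minimum(x, 127)).astype(np.int8)
# which is identical to clipping into the int8 range
assert (sat == np.clip(x, -128, 127).astype(np.int8)).all()
```

This min/max sequence is the pattern that would have to be recognized to emit Hexagon's saturating instructions, as discussed below.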

Contributor Author

When I looked at several of the Hexagon .so files produced by this PR's unit tests, I didn't see any indication that Hexagon's saturate or :sat instructions were being used.

This isn't a critique of the PR; I'm just mentioning it as a point of interest for future work.

That's very likely. Thanks for looking into it!

Unless we generate saturating llvm instructions through TVM, we will have to add additional code in LLVM to recognize the sequence of min, max as saturate.

Contributor

@cconvey cconvey Aug 19, 2022

Unless we generate saturating llvm instructions through TVM, we will have to add additional code in LLVM to recognize the sequence of min, max as saturate.

I wonder if there's a good way to coordinate on where these replacement-patterns get implemented.

I imagine it makes sense to eventually put an optimization like this into one of TVM or LLVM, but it's potentially a waste of effort to put it into both.

Contributor Author

I agree. I think it will be better to do it in TVM, as we can generate the appropriate LLVM saturating instruction during TVM codegen, which can then be lowered into target-specific instructions in the LLVM backend.

@cconvey
Contributor

cconvey commented Aug 18, 2022

This PR introduces several TOPI-related functions (qnn_avg_pool2d_compute and qnn_avg_pool2d_schedule). Does this PR make these functions available for compile-time consideration by TOPI?

I'm not very familiar with the mechanisms TVM uses for this, so apologies if I'm just missing how it happens.

Co-authored-by: Christian Convey <christian.convey@gmail.com>
@jverma-quic
Contributor Author

This PR introduces several TOPI-related functions (qnn_avg_pool2d_compute and qnn_avg_pool2d_schedule). Does this PR make these functions available for compile-time consideration by TOPI?

I'm not very familiar with the mechanisms TVM uses for this, so apologies if I'm just missing how it happens.

That's correct. The PR does introduce several TOPI-related functions. However, since they require inputs and outputs to be in 2-d discontiguous buffers, they aren't yet available for use by OpStrategy. One of the issues here is that Relay is unable to handle the complex layouts needed for these discontiguous buffers; that requires some additional work.

@jverma-quic jverma-quic requested review from cconvey and removed request for mehrdadh August 19, 2022 18:28
Contributor

@cconvey cconvey left a comment

Thanks @jverma-quic, the PR is looking really good. I left one small suggestion; feel free to ignore it.

Once you're satisfied, I'm happy with getting this merged. You may want to ping @mehrdadh for that official review; I think he was waiting for my review to finish.

Comment on lines 36 to 42
fp1 = np.random.uniform(0.00001, 0.0002, size=(10))
fp2 = np.random.uniform(0.001, 0.02, size=(10))
fp3 = np.random.uniform(1, 20, size=(10))
fp4 = np.random.uniform(900, 1000, size=(10))
fp5 = np.random.uniform(1e9, 1e10, size=(10))
fp6 = np.random.uniform(2.44885652993e38, 2.54885652993e38, size=(1))
fp7 = np.random.uniform(1.46711479073e-34, 1.76098837843e-34, size=(1))
Contributor

No objection to leaving the code as it is. Would you consider just adding a comment about the semi-arbitrary nature of those numbers? Usually when I see something as precise as 2.44885652993e38, I assume the number is chosen for a particular reason. It might save other readers some time to know that there's no deeper meaning here.

@jverma-quic
Contributor Author

Thanks @jverma-quic, the PR is looking really good. I left one small suggestion; feel free to ignore it.

Once you're satisfied, I'm happy with getting this merged. You may want to ping @mehrdadh for that official review; I think he was waiting for my review to finish.

Thanks @cconvey! I really appreciate you taking time to review this PR and your detailed comments.

@github-actions github-actions bot removed the request for review from cconvey August 19, 2022 20:00
@github-actions github-actions bot requested a review from mehrdadh August 19, 2022 20:00
@jverma-quic
Contributor Author

@cconvey, @mehrdadh, @kparzysz-quic, @TejashShah : I'm waiting to merge this PR. Unless there are additional comments, can someone approve and merge it for me please? Thanks!

@mehrdadh
Member

@jverma-quic sorry for the delay, looking at the PR now

@mehrdadh mehrdadh merged commit 1afd059 into apache:main Aug 24, 2022
@mehrdadh
Member

@jverma-quic PR is merged! Thanks for your contribution!
Moving forward, please use a meaningful PR description.

@cconvey thanks for the review!

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [TOPI][Hexagon] Implement quantized avgpool

* Fix pylint errors

* Needed to adjust input padding for int8 buffer layout

* Fix formatting issue

* Add unit test for fixed-point conversion utility function

Also, address review comments.

* Remove pytest.skip for test_avg_pool2d_slice.py to enable on-target testing

* Fix formatting issue

* Update python/tvm/topi/hexagon/utils.py

Co-authored-by: Christian Convey <christian.convey@gmail.com>

* Update comments and error messages

* Address review comments

* Import Tuple from typing

* Address pylint error

Co-authored-by: Christian Convey <christian.convey@gmail.com>