Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed point multiplication improvements for AArch64 #5980

Merged
merged 11 commits into from
Jul 17, 2020

Conversation

giuseros
Copy link
Contributor

@giuseros giuseros commented Jul 2, 2020

RFC

This PR is based on the following RFC: https://discuss.tvm.ai/t/rfc-using-arm-intrinsics-to-implement-fixed-point-multiplication-in-tvm

High level description of the submission

The idea is to create a TIR intrinsic fixed_point_multiply that can be overloaded (by the means of tvm.target.intrin.register_intrin_rule) and use AArch64 intrinsics (other vendors can provide their own implementation)

The TIR default intrinsic is registered in: tvm/src/target/intrin_rule.cc
The overload for AArch64 lives in tvm/topi/python/topi/arm_cpu/tensor_intrin.py

@giuseros
Copy link
Contributor Author

giuseros commented Jul 2, 2020

@giuseros
Copy link
Contributor Author

giuseros commented Jul 8, 2020

Following the discussion in the RFC I redesigned the submission by introducing a general qmuls intrinsic:

  • Relay uses qmuls to implement fixed_point_multiply
  • The new intrinsic is more general, as it accepts the Q-ness of the input values to be multiplied

The structure of the submission remains the same.

@giuseros giuseros force-pushed the fixed_point_multiply branch 3 times, most recently from bb787cd to 3b238a6 Compare July 9, 2020 16:27
@giuseros
Copy link
Contributor Author

@anijain2305 @kparzysz-quic @tqchen Any update on this?

include/tvm/tir/op.h Outdated Show resolved Hide resolved
python/tvm/tir/op.py Outdated Show resolved Hide resolved
src/target/intrin_rule.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@tmoreau89 tmoreau89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution @giuseros, overall the changes look good. It would be great to add one or two unit tests on the tensor intrinsic matching to ensure this fixed-point multiplication support doesn't break/bitrot.

Copy link
Contributor

@anijain2305 anijain2305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this work. Overall looks ok. There are few follow-ups.

We also need a test for the new Relay op as @tmoreau89 suggested

include/tvm/relay/attrs/transform.h Outdated Show resolved Hide resolved
include/tvm/tir/op.h Outdated Show resolved Hide resolved
include/tvm/tir/builtin.h Outdated Show resolved Hide resolved
include/tvm/tir/op.h Outdated Show resolved Hide resolved
src/relay/op/tensor/unary.cc Show resolved Hide resolved
src/relay/qnn/op/requantize.cc Outdated Show resolved Hide resolved
python/tvm/tir/op.py Show resolved Hide resolved
topi/python/topi/arm_cpu/injective.py Outdated Show resolved Hide resolved
topi/python/topi/arm_cpu/tensor_intrin.py Outdated Show resolved Hide resolved
topi/python/topi/math.py Outdated Show resolved Hide resolved
@giuseros
Copy link
Contributor Author

@anijain2305 , @tmoreau89 , @kparzysz-quic I tried to address/reply to your comments. I added a test_fixed_point_multiply() in test_op_level3.py to test the intrinsic (which receives a lot of additional testing since it will be used in every qnn::requantize operation).

@giuseros
Copy link
Contributor Author

Quick update on this.

  • I had to remove checks for input data type to be int32 from the intrinsic (and TOPI). Indeed, this can be called from relay/quantize/realize.cc with int64 and int8 input datatypes. After all, we convert upfront to int64 so the input data type shouldn't matter.
  • I also applied some changes to the intrinsic so that now the casts are more explicit.

Giuseppe Rossini added 10 commits July 15, 2020 16:35
Change-Id: Ib3c10348d4c0eac11fa92b39cc6e792560e9eba4
Change-Id: I4cf5ac18aa24b39374b83805dcc8e1663e173909
Change-Id: Ie3c861f8ead3f1ea5b30d5e9d7d94e222299d407
Change-Id: I6ad9da61b61e6bd737627f26fba59767418c07cd
Change-Id: Ic864a235aa5da5786393cbf6146dd815c121df5e
Change-Id: If9ca1cc3d947b1656c836c7f88de90470d92f979
Change-Id: I1966fef9aee32eab50e4b984bbe81018488c8c02
Change-Id: Ib87a19a8ee2d532954a7db1eb5793666e7aef366
Change-Id: Ie82e75204e5a421d17660f381f3e31fc325cd26c
Change-Id: I74cc675764cf8d260fe68a41e770b1ec7e84729a
Copy link
Contributor

@anijain2305 anijain2305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions. Close to be merged

include/tvm/tir/builtin.h Outdated Show resolved Hide resolved
topi/python/topi/math.py Outdated Show resolved Hide resolved
@anijain2305
Copy link
Contributor

@tmoreau89 @kparzysz-quic Please review again and approve explicitly.

@tmoreau89
Copy link
Contributor

@giuseros thanks for adding the test case, and thank you @anijain2305 for the thorough review. I suggest to follow through with the naming tweak and this PR should be good to go.

Change-Id: I5a8ed60ba855208040304fcdf6e1ea28061f06ad
@giuseros
Copy link
Contributor Author

Hi @anijain2305 , @tmoreau89 , @kparzysz-quic
I addressed the name change, reformatted the help text in math.py and addressed an integration change in the arm intrinsic (q.val -> q.value).

Thank you all for your comments and your time!

Copy link
Contributor

@kparzysz-quic kparzysz-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Contributor

@tmoreau89 tmoreau89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @giuseros the changes LGTM

@tmoreau89
Copy link
Contributor

@anijain2305 please let us know if your requested changes have been addressed! thanks!

Copy link
Contributor

@anijain2305 anijain2305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anijain2305 anijain2305 merged commit ccacb1e into apache:master Jul 17, 2020
@anijain2305
Copy link
Contributor

Thanks @giuseros @tmoreau89 @kparzysz-quic This is merged!

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* Fixed point multiplication improvements for AArch64

Change-Id: Ib3c10348d4c0eac11fa92b39cc6e792560e9eba4

* Fix python linting errors

Change-Id: I4cf5ac18aa24b39374b83805dcc8e1663e173909

* Fix doxygen errors

Change-Id: Ie3c861f8ead3f1ea5b30d5e9d7d94e222299d407

* Fix arm_cpu injective tests

Change-Id: I6ad9da61b61e6bd737627f26fba59767418c07cd

* Fix python linting errors - 2

Change-Id: Ic864a235aa5da5786393cbf6146dd815c121df5e

* Fix arm_cpu injective tests - 2

Change-Id: If9ca1cc3d947b1656c836c7f88de90470d92f979

* Redesign: introduce a qmuls (q-multiply and shift) general intrinsic

Change-Id: I1966fef9aee32eab50e4b984bbe81018488c8c02

* Fix python linting errors - 3

Change-Id: Ib87a19a8ee2d532954a7db1eb5793666e7aef366

* Addressing review comments

Change-Id: Ie82e75204e5a421d17660f381f3e31fc325cd26c

* Fixing test failures

Change-Id: I74cc675764cf8d260fe68a41e770b1ec7e84729a

* Renaming qmuls to q_multiply_shift

Change-Id: I5a8ed60ba855208040304fcdf6e1ea28061f06ad
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* Fixed point multiplication improvements for AArch64

Change-Id: Ib3c10348d4c0eac11fa92b39cc6e792560e9eba4

* Fix python linting errors

Change-Id: I4cf5ac18aa24b39374b83805dcc8e1663e173909

* Fix doxygen errors

Change-Id: Ie3c861f8ead3f1ea5b30d5e9d7d94e222299d407

* Fix arm_cpu injective tests

Change-Id: I6ad9da61b61e6bd737627f26fba59767418c07cd

* Fix python linting errors - 2

Change-Id: Ic864a235aa5da5786393cbf6146dd815c121df5e

* Fix arm_cpu injective tests - 2

Change-Id: If9ca1cc3d947b1656c836c7f88de90470d92f979

* Redesign: introduce a qmuls (q-multiply and shift) general intrinsic

Change-Id: I1966fef9aee32eab50e4b984bbe81018488c8c02

* Fix python linting errors - 3

Change-Id: Ib87a19a8ee2d532954a7db1eb5793666e7aef366

* Addressing review comments

Change-Id: Ie82e75204e5a421d17660f381f3e31fc325cd26c

* Fixing test failures

Change-Id: I74cc675764cf8d260fe68a41e770b1ec7e84729a

* Renaming qmuls to q_multiply_shift

Change-Id: I5a8ed60ba855208040304fcdf6e1ea28061f06ad
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020
* Fixed point multiplication improvements for AArch64

Change-Id: Ib3c10348d4c0eac11fa92b39cc6e792560e9eba4

* Fix python linting errors

Change-Id: I4cf5ac18aa24b39374b83805dcc8e1663e173909

* Fix doxygen errors

Change-Id: Ie3c861f8ead3f1ea5b30d5e9d7d94e222299d407

* Fix arm_cpu injective tests

Change-Id: I6ad9da61b61e6bd737627f26fba59767418c07cd

* Fix python linting errors - 2

Change-Id: Ic864a235aa5da5786393cbf6146dd815c121df5e

* Fix arm_cpu injective tests - 2

Change-Id: If9ca1cc3d947b1656c836c7f88de90470d92f979

* Redesign: introduce a qmuls (q-multiply and shift) general intrinsic

Change-Id: I1966fef9aee32eab50e4b984bbe81018488c8c02

* Fix python linting errors - 3

Change-Id: Ib87a19a8ee2d532954a7db1eb5793666e7aef366

* Addressing review comments

Change-Id: Ie82e75204e5a421d17660f381f3e31fc325cd26c

* Fixing test failures

Change-Id: I74cc675764cf8d260fe68a41e770b1ec7e84729a

* Renaming qmuls to q_multiply_shift

Change-Id: I5a8ed60ba855208040304fcdf6e1ea28061f06ad
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020
* Fixed point multiplication improvements for AArch64

Change-Id: Ib3c10348d4c0eac11fa92b39cc6e792560e9eba4

* Fix python linting errors

Change-Id: I4cf5ac18aa24b39374b83805dcc8e1663e173909

* Fix doxygen errors

Change-Id: Ie3c861f8ead3f1ea5b30d5e9d7d94e222299d407

* Fix arm_cpu injective tests

Change-Id: I6ad9da61b61e6bd737627f26fba59767418c07cd

* Fix python linting errors - 2

Change-Id: Ic864a235aa5da5786393cbf6146dd815c121df5e

* Fix arm_cpu injective tests - 2

Change-Id: If9ca1cc3d947b1656c836c7f88de90470d92f979

* Redesign: introduce a qmuls (q-multiply and shift) general intrinsic

Change-Id: I1966fef9aee32eab50e4b984bbe81018488c8c02

* Fix python linting errors - 3

Change-Id: Ib87a19a8ee2d532954a7db1eb5793666e7aef366

* Addressing review comments

Change-Id: Ie82e75204e5a421d17660f381f3e31fc325cd26c

* Fixing test failures

Change-Id: I74cc675764cf8d260fe68a41e770b1ec7e84729a

* Renaming qmuls to q_multiply_shift

Change-Id: I5a8ed60ba855208040304fcdf6e1ea28061f06ad
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants