[Adreno] Change compute/schedule for ToMixedPrecision pass #12537

Merged
merged 4 commits into apache:main on Sep 1, 2022

Conversation

elvin-n
Contributor

@elvin-n elvin-n commented Aug 22, 2022

The current support for mixed precision in the Adreno schedules was implemented as standalone schedules carrying an "acc32" suffix. Such kernels could be selected during compilation for two reasons:

  1. The fp16_acc32 schedules had higher priority than the pure fp16 schedules.
  2. The AutoTVM tuning statistics contained an entry with the matching schedule name.

The tuning flow, in turn, could not be restricted to only fp16 or only fp16_acc32: both schedules were tuned, and at compilation time the one with the best measured time was selected. In other words, without an artificial workaround we cannot tune and compile for pure fp16 or fp16_acc32 specifically; currently, only manual filtering of the tuning statistics forces one of these modes.

In addition, the fp16 conversion function was custom-made in the user's script and is not available to the public TVM user.

To address the above issues, we propose using the ToMixedPrecision() pass, which also supports mixed precision (fp16 compute with fp32 accumulation).
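For reference, here is a minimal sketch of how a user would apply the pass to a Relay module (the helper name is ours and `mod` stands for any Relay IRModule; this is an illustration, not code from this PR):

```python
# Minimal sketch: convert a Relay module to fp16 compute with the
# ToMixedPrecision pass. The accumulation dtype (e.g. fp32 for conv2d)
# is governed by the per-op mixed-precision attributes registered in Relay.
import tvm
from tvm import relay

def to_mixed_precision(mod: tvm.IRModule) -> tvm.IRModule:
    mod = relay.transform.InferType()(mod)  # run type inference first
    mod = relay.transform.ToMixedPrecision("float16")(mod)
    return mod
```

This replaces the custom per-user conversion scripts mentioned above with a pass that ships in mainline TVM.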

Current PR changes:

  1. Adreno strategy: remove the extra fp16_acc32 schedules.
  2. topi/adreno/*: leave a single schedule per convolution instead of three.
  3. topi/adreno/conv2d_alter_op.py: address a performance issue that appears in the new flow because the different order of casts can, in some cases, cause more data to be passed between OpenCL kernels. We addressed the case where the number of input channels is not divisible by 4 while the number of output channels is divisible by 4. Previously we generated four kernels to repack data at runtime; now only two kernels are needed for this case, weights are no longer repacked at runtime, and the output is no longer repacked back to NCHW (see the packing sketch after this list).
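To make the repacking in item 3 concrete, here is a standalone illustration (our own sketch, not code from this PR) of packing an NCHW tensor into channel blocks of 4, padding the channel axis when it is not divisible by 4:

```python
# Illustrative repacking of NCHW fp16 data into an NCHW4c layout
# ("channel blocks of 4"), zero-padding the channel axis when C % 4 != 0.
# The function name and shapes are our own.
import numpy as np

def pack_nchw_to_nchw4c(x):
    n, c, h, w = x.shape
    c_pad = (4 - c % 4) % 4  # zero-pad C up to a multiple of 4
    x = np.pad(x, ((0, 0), (0, c_pad), (0, 0), (0, 0)))
    # (N, C', H, W) -> (N, C'/4, 4, H, W) -> (N, C'/4, H, W, 4)
    return x.reshape(n, (c + c_pad) // 4, 4, h, w).transpose(0, 1, 3, 4, 2)

x = np.random.rand(1, 3, 8, 8).astype("float16")  # 3 input channels, not divisible by 4
print(pack_nchw_to_nchw4c(x).shape)               # (1, 1, 8, 8, 4)
```

Each such conversion costs an extra kernel launch and a data copy at runtime, which is why cutting four repacking kernels down to two matters for this case.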

Contributor

@csullivan csullivan left a comment


Just a few scheduling-related questions. I loved seeing how the mixed precision pass has helped reduce the schedules.

Review threads:
python/tvm/topi/adreno/conv2d_nchw.py (outdated, resolved)
python/tvm/topi/adreno/conv2d_nhwc.py (outdated, resolved)
python/tvm/topi/adreno/conv2d_nhwc.py (resolved)
@@ -437,7 +437,7 @@ def test_conv2d_vgg16_winograd_4d():
     stat_file = temp.relpath("stat.log")
     with open(stat_file, "w") as f:
         f.write(
-            '{"input": ["opencl -keys=adreno,opencl,gpu -device=adreno -max_num_threads=256", "conv2d_nchw_winograd_acc32.image2d", [["TENSOR", [1, 512, 28, 28], "float16"], ["TENSOR", [512, 512, 3, 3], "float16"], [1, 1], [1, 1, 1, 1], [1, 1], "float16"], {}], "config": {"index": 1591, "code_hash": null, "entity": [["auto_unroll_max_step", "ot", 4], ["tile_y", "sp", [-1, 1, 32]], ["tile_x", "sp", [-1, 4, 2]], ["tile_rc", "sp", [-1, 8]]]}, "result": [[0.0037244], 0, 7.06374192237854, 1653898629.7427933], "version": 0.2, "tvm_version": "0.8.dev0"}\n'
+            '{"input": ["opencl -keys=adreno,opencl,gpu -device=adreno -max_num_threads=256", "conv2d_nchw_winograd.image2d", [["TENSOR", [1, 512, 28, 28], "float16"], ["TENSOR", [512, 512, 3, 3], "float16"], [1, 1], [1, 1, 1, 1], [1, 1], "float16"], {}], "config": {"index": 1591, "code_hash": null, "entity": [["auto_unroll_max_step", "ot", 4], ["tile_y", "sp", [-1, 1, 32]], ["tile_x", "sp", [-1, 4, 2]], ["tile_rc", "sp", [-1, 8]]]}, "result": [[0.0037244], 0, 7.06374192237854, 1653898629.7427933], "version": 0.2, "tvm_version": "0.8.dev0"}\n'
Contributor


🎉

@TejashShah

cc @AndrewZhaoLuo @mbrookhart

@AndrewZhaoLuo
Contributor

Sorry, I will take a look tomorrow

Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment


Looks fine to me, but I don't know this part of the codebase well. I will defer to others.

@AndrewZhaoLuo AndrewZhaoLuo merged commit e814f79 into apache:main Sep 1, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022

* [Adreno] Change compute/schedule for ToMixedPrecision pass

* Address CI fails

* address PR comments

* Fix AutoTVM flow