[Adreno] Change compute/schedule for ToMixedPrecision pass #12537

Merged
merged 4 commits into apache:main on Sep 1, 2022

Conversation

elvin-n
Contributor

@elvin-n elvin-n commented Aug 22, 2022

The current support for mixed precision in the Adreno schedules was implemented as standalone schedules carrying an "acc32" suffix. Such kernels could be selected during compilation for two reasons:

  1. The fp16_acc32 schedules had higher priority than the pure fp16 schedules.
  2. The AutoTVM tuning statistics contained an entry with the matching schedule name.

The tuning flow, in turn, could not be restricted to only fp16 or only fp16_acc32: both schedules were tuned, and at compilation time the one with the best measured time was selected. In other words, without an artificial workaround we cannot tune and compile for pure fp16 or fp16_acc32 specifically; currently, only manual filtering of the tuning statistics forces one of these modes.

In addition, the fp16 conversion function was custom-made in the user's script and is not available to the public TVM user.

To address the above issues, we propose using the ToMixedPrecision() pass, which also supports mixed precision (fp16 compute with fp32 accumulation).
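For reference, here is a minimal sketch of how a user would apply the pass to a Relay module (the helper name is ours and `mod` stands for any Relay IRModule; this is an illustration, not code from this PR):

```python
# Minimal sketch: convert a Relay module to fp16 compute with the
# ToMixedPrecision pass. The accumulation dtype (e.g. fp32 for conv2d)
# is governed by the per-op mixed-precision attributes registered in Relay.
import tvm
from tvm import relay

def to_mixed_precision(mod: tvm.IRModule) -> tvm.IRModule:
    mod = relay.transform.InferType()(mod)  # run type inference first
    mod = relay.transform.ToMixedPrecision("float16")(mod)
    return mod
```

This replaces the custom per-user conversion scripts mentioned above with a pass that ships in mainline TVM.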

Current PR changes:

  1. Adreno strategy: remove the extra fp16_acc32 schedules.
  2. topi/adreno/*: leave a single schedule per convolution instead of three.
  3. topi/adreno/conv2d_alter_op.py: address a performance issue that appears in the new flow because the different order of casts can, in some cases, cause more data to be passed between OpenCL kernels. We addressed the case where the number of input channels is not divisible by 4 while the number of output channels is divisible by 4. Previously we generated four kernels to repack data at runtime; now only two kernels are needed for this case, weights are no longer repacked at runtime, and the output is no longer repacked back to NCHW (see the packing sketch after this list).
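To make the repacking in item 3 concrete, here is a standalone illustration (our own sketch, not code from this PR) of packing an NCHW tensor into channel blocks of 4, padding the channel axis when it is not divisible by 4:

```python
# Illustrative repacking of NCHW fp16 data into an NCHW4c layout
# ("channel blocks of 4"), zero-padding the channel axis when C % 4 != 0.
# The function name and shapes are our own.
import numpy as np

def pack_nchw_to_nchw4c(x):
    n, c, h, w = x.shape
    c_pad = (4 - c % 4) % 4  # zero-pad C up to a multiple of 4
    x = np.pad(x, ((0, 0), (0, c_pad), (0, 0), (0, 0)))
    # (N, C', H, W) -> (N, C'/4, 4, H, W) -> (N, C'/4, H, W, 4)
    return x.reshape(n, (c + c_pad) // 4, 4, h, w).transpose(0, 1, 3, 4, 2)

x = np.random.rand(1, 3, 8, 8).astype("float16")  # 3 input channels, not divisible by 4
print(pack_nchw_to_nchw4c(x).shape)               # (1, 1, 8, 8, 4)
```

Each such conversion costs an extra kernel launch and a data copy at runtime, which is why cutting four repacking kernels down to two matters for this case.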

Contributor

@csullivan csullivan left a comment


Just a few scheduling-related questions. I loved seeing how the mixed precision pass has helped reduce the schedules.

Review threads:
python/tvm/topi/adreno/conv2d_nchw.py (outdated, resolved)
python/tvm/topi/adreno/conv2d_nhwc.py (outdated, resolved)
python/tvm/topi/adreno/conv2d_nhwc.py (resolved)
@@ -437,7 +437,7 @@ def test_conv2d_vgg16_winograd_4d():
     stat_file = temp.relpath("stat.log")
     with open(stat_file, "w") as f:
         f.write(
-            '{"input": ["opencl -keys=adreno,opencl,gpu -device=adreno -max_num_threads=256", "conv2d_nchw_winograd_acc32.image2d", [["TENSOR", [1, 512, 28, 28], "float16"], ["TENSOR", [512, 512, 3, 3], "float16"], [1, 1], [1, 1, 1, 1], [1, 1], "float16"], {}], "config": {"index": 1591, "code_hash": null, "entity": [["auto_unroll_max_step", "ot", 4], ["tile_y", "sp", [-1, 1, 32]], ["tile_x", "sp", [-1, 4, 2]], ["tile_rc", "sp", [-1, 8]]]}, "result": [[0.0037244], 0, 7.06374192237854, 1653898629.7427933], "version": 0.2, "tvm_version": "0.8.dev0"}\n'
+            '{"input": ["opencl -keys=adreno,opencl,gpu -device=adreno -max_num_threads=256", "conv2d_nchw_winograd.image2d", [["TENSOR", [1, 512, 28, 28], "float16"], ["TENSOR", [512, 512, 3, 3], "float16"], [1, 1], [1, 1, 1, 1], [1, 1], "float16"], {}], "config": {"index": 1591, "code_hash": null, "entity": [["auto_unroll_max_step", "ot", 4], ["tile_y", "sp", [-1, 1, 32]], ["tile_x", "sp", [-1, 4, 2]], ["tile_rc", "sp", [-1, 8]]]}, "result": [[0.0037244], 0, 7.06374192237854, 1653898629.7427933], "version": 0.2, "tvm_version": "0.8.dev0"}\n'
Contributor


🎉

@TejashShah

cc @AndrewZhaoLuo @mbrookhart

@AndrewZhaoLuo
Contributor

Sorry, I will take a look tomorrow

Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment


Looks fine to me, but I don't know this part of the codebase well. I will defer to others.

@AndrewZhaoLuo AndrewZhaoLuo merged commit e814f79 into apache:main Sep 1, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022

* [Adreno] Change compute/schedule for ToMixedPrecision pass

* Address CI fails

* address PR comments

* Fix AutoTVM flow