
[TIR] add loop partition hint pragma #9121

Merged 4 commits into apache:main from support_loop_partition_hint on Sep 29, 2021

Conversation

wrongtest-intellif (Contributor) commented on Sep 26, 2021

Currently the LoopPartition pass only tries to partition loops associated with conditions carrying the likely tag. It would be great if developers could take control of which loops to partition, no matter whether the condition to eliminate is tagged likely or not.

This PR adds a pragma attr key loop_partition_hint, which can be tagged explicitly in the schedule phase. The loop partition pass will then consider all arithmetic conditions on the hinted loop var.
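
For context, here is a minimal sketch of the rewrite that LoopPartition performs (plain Python with made-up names and bounds, not actual TIR); the hint simply forces this split for the annotated loop even when the condition carries no likely tag:

    # A self-contained illustration of the transformation; A, B and the
    # bounds are invented for the example.
    A = list(range(70))
    B = [0] * 70

    # Before partitioning: every iteration evaluates the boundary condition.
    for i in range(70):
        if i < 64:  # condition on the (hinted) loop var
            B[i] = A[i] + 1

    # After partitioning: the pass proves the condition per region and drops it.
    for i in range(64):  # main loop: the condition is always true, no branch
        B[i] = A[i] + 1
    # Tail region [64, 70): the condition is always false, so the body is gone.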

Below are two examples of how explicitly controlled loop partitioning helps. The target is an i7-7700 running Ubuntu 20.04, with LLVM version 11.0:

  • For max pooling with padding inlined, which creates conditional buffer accesses:

    import numpy as np
    import tvm
    from tvm import te, topi, testing

    data = te.placeholder([1, 128, 56, 56], name="x")
    out = topi.nn.pool2d(data, kernel=[5, 5], stride=[1, 1], padding=[2, 2, 2, 2], pool_type="max", dilation=[1, 1], layout="NCHW")
    pad = out.op.input_tensors[0]
    x = tvm.nd.array(np.random.randint(0, 64, [1, 128, 56, 56]).astype("float32"))
    
    def test(do_partition):
        s = te.create_schedule([out.op])
        s[pad].compute_inline()
        n, c, h, w = s[out].op.axis
        if do_partition:
            s[out].pragma(h, "loop_partition_hint")
            s[out].pragma(w, "loop_partition_hint")
    
        with tvm.ir.transform.PassContext(config={"tir.LoopPartition": {"partition_const_loop": True}}):
            f = tvm.build(s, [data, out], "llvm")
        y = tvm.nd.array(np.zeros([1, 128, 56, 56]).astype("float32"))
        f(x, y)
        result = y.asnumpy()
        print(f.get_source("asm"))
        evaluator = f.time_evaluator(f.entry_name, tvm.cpu(), number=1000)
        print("partition=%s: %.3f millisecs" % (do_partition, evaluator(x, y).mean * 1000))
        return result
    
    r1 = test(do_partition=False)
    r2 = test(do_partition=True)
    testing.assert_allclose(r1, r2, rtol=1e-5)

    The performance I get:

    • no loop partition: 3.708 millisecs
    • with loop partition: 0.975 millisecs
  • For a tiled matmul following the TVM tensor expression tutorial, but with a shape not divisible by the tiling factor. The TIR split does not create a likely condition for this case, so the hint is attached via a small helper (see also the note after the examples):

    import numpy as np
    import tvm
    from tvm import te

    M, N, K = 1025, 1025, 1025
    dtype = "float32"
    dev = tvm.cpu()
    a = tvm.nd.array(np.random.rand(M, K).astype(dtype), dev)
    b = tvm.nd.array(np.random.rand(K, N).astype(dtype), dev)
    k = te.reduce_axis((0, K), "k")
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    C = te.compute((M, N), lambda x, y: te.sum(A[x, k] * B[k, y], axis=k), name="C")
    f = te.create_prim_func([A, B, C])
    s = tvm.tir.Schedule(f)
    
    def evaluate_operation(s, target, optimization):
        with tvm.ir.transform.PassContext(config={"tir.LoopPartition": {"partition_const_loop": True}}):
            print(tvm.lower(s.mod["main"], [], simple_mode=True))
            func = tvm.build(s.mod["main"], [], target=target, name="mmult")
            assert func

        c = tvm.nd.array(np.zeros((M, N), dtype=dtype), dev)
        func(a, b, c)
        evaluator = func.time_evaluator(func.entry_name, dev, number=10)
        mean_time = evaluator(a, b, c).mean
        print("%s: %f" % (optimization, mean_time))
    
    # no opt
    evaluate_operation(s, target="llvm", optimization="none")
    
    # tiling and vectorize
    x, y, k = s.get_loops(s.get_block("C"))
    xo, xi = s.split(x, factors=[None, 32])
    yo, yi = s.split(y, factors=[None, 32])
    ko, ki = s.split(k, factors=[None, 4])
    s.reorder(xo, yo, ko, ki, xi, yi)
    s.vectorize(yi)
    evaluate_operation(s, target="llvm", optimization="blocking")
    
    # loop partition
    # Manually attach a loop annotation by rebuilding the For node and
    # replacing it in the schedule state (see the note after the examples).
    def pragma(s, rv, key):
        sref = s.get_sref(rv)
        loop = sref.stmt
        new_loop = tvm.tir.For(loop.loop_var, loop.min, loop.extent, loop.kind, loop.body, annotations={key: 1})
        s.state.replace(sref, new_loop)
    pragma(s, xo, "pragma_loop_partition_hint")
    pragma(s, yo, "pragma_loop_partition_hint")
    evaluate_operation(s, target="llvm", optimization="loop_partition")

    The performance I get (in seconds):

    • no opt: 1.374402
    • with tiling + vectorize: 0.843930
    • with tiling + vectorize + loop partition: 0.272183
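
Note: at the time of this PR, tir.Schedule apparently has no primitive for attaching loop annotations, hence the manual pragma helper above. On newer TVM versions that provide tir.Schedule.annotate (an assumption about the installed version), the same hint can likely be attached directly:

    # Hypothetical equivalent on a later TVM with Schedule.annotate:
    s.annotate(xo, "pragma_loop_partition_hint", True)
    s.annotate(yo, "pragma_loop_partition_hint", True)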

junrushao (Member) commented:

CC @areusch @Hzfengsy @vinx13 @zxybazh @ZihengJiang, would you guys review this PR? It looks like it's relevant to some of our previous discussions.

Review threads on include/tvm/tir/stmt.h and tests/python/unittest/test_tir_transform_loop_partition.py were resolved.
wrongtest-intellif (Contributor, Author) commented:

@Hzfengsy hi~ I resolved the comment issues; can you kindly take another round of review? :)

Hzfengsy merged commit 198a8ab into apache:main on Sep 29, 2021
Hzfengsy (Member) commented:

Thanks, @wrongtest

wrongtest-intellif deleted the support_loop_partition_hint branch on September 29, 2021 at 09:45.
AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Sep 29, 2021
* main:
  Fix flaky NMS test by making sure scores are unique (apache#9140)
  [Relay] Merge analysis/context_analysis.cc and transforms/device_annotation.cc (apache#9038)
  [LLVM] Make changes needed for opaque pointers (apache#9138)
  Arm(R) Ethos(TM)-U NPU codegen integration (apache#8849)
  [CI] Split Integration tests out of first phase of pipeline (apache#9128)
  [Meta Schedule][M3b] Runner (apache#9111)
  Fix Google Mock differences between Ubuntu 18.04 and 16.04 (apache#9141)
  [TIR] add loop partition hint pragma (apache#9121)
  fix things (apache#9146)
  [Meta Schedule][M3a] SearchStrategy (apache#9132)
  [Frontend][PyTorch] support for quantized conv_transpose2d op (apache#9133)
  [UnitTest] Parametrized test_conv2d_int8_intrinsics (apache#9143)
  [OpenCL] Remove redundant visit statement in CodeGen. (apache#9144)
  [BYOC] support arbitrary input dims for add/mul/relu of dnnl c_src codegen (apache#9127)
  [Relay][ConvertLayout] Support for qnn.conv2d_transpose (apache#9139)
  add nn.global_avgpool to fq2i (apache#9137)
  [UnitTests] Enable minimum testing on Vulkan target in CI (apache#9093)
  [Torch] Support returning quantized weights and bias for BYOC use cases (apache#9135)
  [Relay] Prepare for new plan_devices.cc (part II) (apache#9130)
  [microTVM][Zephyr] Add MIMXRT1050 board support (apache#9068)
AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Sep 30, 2021
* main: (80 commits)
  Introduce centralised name transformation functions (apache#9088)
  [OpenCL] Add vectorization to cuda conv2d_nhwc schedule (apache#8636)
  [6/6] Arm(R) Ethos(TM)-U NPU codegen integration with `tvmc` (apache#8854)
  [microTVM] Add wrapper for creating project using a MLF (apache#9090)
  Fix typo (apache#9156)
  [Hotfix][Testing] Wait for RPCServer to be established (apache#9150)
  Update find cublas so it search default path if needed. (apache#9149)
  [TIR][LowerMatchBuffer] Fix lowering strides when source region has higher dimension than the buffer (apache#9145)
  Fix flaky NMS test by making sure scores are unique (apache#9140)
  [Relay] Merge analysis/context_analysis.cc and transforms/device_annotation.cc (apache#9038)
  [LLVM] Make changes needed for opaque pointers (apache#9138)
  Arm(R) Ethos(TM)-U NPU codegen integration (apache#8849)
  [CI] Split Integration tests out of first phase of pipeline (apache#9128)
  [Meta Schedule][M3b] Runner (apache#9111)
  Fix Google Mock differences between Ubuntu 18.04 and 16.04 (apache#9141)
  [TIR] add loop partition hint pragma (apache#9121)
  fix things (apache#9146)
  [Meta Schedule][M3a] SearchStrategy (apache#9132)
  [Frontend][PyTorch] support for quantized conv_transpose2d op (apache#9133)
  [UnitTest] Parametrized test_conv2d_int8_intrinsics (apache#9143)
  ...
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* add loop partition hint pragma

* fix unintialized var

* fix to remove hint at last

* use tir compare for loop partition testcase
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022 (same four commits as above)