
[RELAY][OP] roi_align operator alter layout #6443

Status: Merged (7 commits, merged on Sep 18, 2020)

Conversation

@Beya2019 (Contributor):

RFC: #4335
https://discuss.tvm.ai/t/layout-conversion-pass/4009

This PR adds convert_op_layout support for the roi_align operator (used in Mask R-CNN) and a corresponding test case in test_pass_convert_op_layout.py.

Would you please take a look, @yzhliu @vinx13 @anijain2305 @tqchen?

@anijain2305 (Contributor):

Is there an NHWC topi implementation for roi_align? If not, this will cause a compilation failure for traditional TVM targets.

@kevinthesun (Contributor):

Yeah. As @anijain2305 mentioned, we only have roi_align_nchw in topi.

@@ -24,7 +24,7 @@ def roi_align(data, rois, pooled_size, spatial_scale, sample_ratio=-1, layout='N
     Parameters
     ----------
     data : relay.Expr
-        4-D tensor with shape [batch, channel, height, width]
+        4-D tensor with shape [batch, channel, height, width] or [batch, height, width, channel]
@kevinthesun (Contributor) commented on Sep 10, 2020:

Do we really need another layout for this op, since layout won't affect performance much for it?

@Beya2019 (Contributor, Author):

Hi @kevinthesun and @anijain2305,

An NHWC topi implementation for roi_align is indeed not currently available for traditional TVM targets. However, in relay/frontend/mxnet.py the _mx_roi_align converter fixes the layout to NCHW (new_attrs["layout"] = "NCHW"), so only the NCHW topi implementation is ever invoked. For a special target that only supports the NHWC layout, we can add extra code in mxnet.py (or another frontend) as follows (RFC: https://tvm.apache.org/docs/dev/convert_layout.html#usage):

desired_layouts = {'vision.roi_align': ['NHWC', 'default']}

# Convert the layout of vision.roi_align to NHWC.
# RemoveUnusedFunctions is used to clean up the graph.
seq = tvm.transform.Sequential([relay.transform.RemoveUnusedFunctions(),
                                relay.transform.ConvertLayout(desired_layouts)])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)

So this change does not cause a compilation failure for traditional TVM targets, because the NHWC topi path is never invoked for them; at the same time, it enables the NHWC Relay IR path for special targets that only support NHWC and lower it with their own backend code.

Certainly, an NHWC topi implementation will have to be written sooner or later; I think it can be added gradually in the future.
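
For reference, here is a sketch of the frontend pinning described above (paraphrased; the exact code in relay/frontend/mxnet.py may differ slightly, and the attribute-helper names are assumptions):

def _mx_roi_align(inputs, attrs):
    # Sketch: the MXNet frontend hard-codes NCHW, so only the NCHW topi
    # kernel is ever reached from this path.
    new_attrs = {}
    new_attrs["pooled_size"] = attrs.get_int_tuple("pooled_size")
    new_attrs["spatial_scale"] = attrs.get_float("spatial_scale")
    new_attrs["sample_ratio"] = attrs.get_int("sample_ratio", -1)
    new_attrs["layout"] = "NCHW"
    return _op.vision.roi_align(inputs[0], inputs[1], **new_attrs)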

@Beya2019 (Contributor, Author):

@kevinthesun and @anijain2305, I would like to add one more point.

On our own target, this approach really does affect performance because it adds many layout_transform operators: our convolutions only support NHWC, so if roi_align only supports NCHW, many additional layout_transform operators have to be inserted into the network (see the sketch below). So I do think an NHWC implementation of this op is needed for performance. In any case, this PR is needed for layout-conversion support whether or not an NHWC topi kernel exists yet; the two pieces of work are not conflicting but complementary, and I also hope the NHWC topi implementation lands as soon as possible.
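
A minimal sketch of that effect (hypothetical shapes; it shows the transpose that an NCHW-only roi_align forces into an otherwise NHWC network):

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 56, 56, 64))  # NHWC feature map
rois = relay.var("rois", shape=(32, 5))
# roi_align only accepts NCHW here, so a layout_transform must be inserted
# before it (and conceptually after it, for any NHWC consumer):
feat_nchw = relay.layout_transform(data, "NHWC", "NCHW")
pooled = relay.vision.roi_align(feat_nchw, rois, pooled_size=(7, 7),
                                spatial_scale=1.0 / 16, sample_ratio=2,
                                layout="NCHW")
mod = tvm.IRModule.from_expr(relay.Function([data, rois], pooled))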

@kevinthesun (Contributor) commented on Sep 11, 2020:

OK. One thing to note is that at the Relay level we do check for the NCHW layout for roi_align: https://github.com/apache/incubator-tvm/blob/master/src/relay/op/vision/rcnn_op.cc#L46. If we would like to allow NHWC roi_align, we might want to remove this constraint from Relay type inference and move it into the op strategy for the corresponding targets, since we don't have an NHWC implementation and the layout of roi_align is usually not an issue for end-to-end inference on these general-purpose targets.

@Beya2019 (Contributor, Author) commented on Sep 11, 2020:

@kevinthesun

> "we need remove this constraint for relay type inference and move it to op strategy for corresponding targets"

Regarding this, target-specific registration already exists in the op strategy:

python/tvm/relay/op/strategy/cuda.py

@roi_align_strategy.register(["cuda", "gpu"])
def roi_align_strategy_cuda(attrs, inputs, out_type, target):
    """roi_align cuda strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(wrap_compute_roi_align(topi.vision.rcnn.roi_align_nchw),
                                wrap_topi_schedule(topi.cuda.schedule_roi_align),
                                name="roi_align_nchw.cuda")
    return strategy

and python/tvm/relay/op/strategy/x86.py

@roi_align_strategy.register("cpu")
def roi_align_strategy_cpu(attrs, inputs, out_type, target):
    """roi_align x86 strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(wrap_compute_roi_align(topi.x86.roi_align_nchw),
                                wrap_topi_schedule(topi.generic.schedule_roi_align),
                                name="roi_align.x86")
    return strategy

In the future we just need to implement topi.vision.rcnn.roi_align_nhwc and register it (calling topi.vision.rcnn.roi_align_nhwc) in the strategy for the special target. I will get this done as soon as possible.

@kevinthesun (Contributor):

Today we don't check the roi_align layout in the op strategy, since we do the check at the Relay type-inference level. If we want to allow NHWC at the Relay level, we need to remove that constraint from ROIAlignRel and move it here. Silently selecting the NCHW implementation and failing at a later stage (or even at runtime) is not great.
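
A minimal sketch of what such a strategy-level check could look like, mirroring the existing x86 registration (illustrative only, not code from this PR):

@roi_align_strategy.register("cpu")
def roi_align_strategy_cpu(attrs, inputs, out_type, target):
    """roi_align x86 strategy with an explicit layout check (sketch)."""
    strategy = _op.OpStrategy()
    if attrs.layout == "NCHW":
        strategy.add_implementation(wrap_compute_roi_align(topi.x86.roi_align_nchw),
                                    wrap_topi_schedule(topi.generic.schedule_roi_align),
                                    name="roi_align.x86")
    else:
        # Fail loudly at strategy selection instead of silently picking the
        # NCHW kernel and failing at a later stage (or at runtime).
        raise RuntimeError("layout %s is not supported by roi_align on cpu" % attrs.layout)
    return strategy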

@anijain2305 (Contributor):

Hi @Beya2019,

We understand your use case with the special target. But when we add changes to TVM, we want to ensure we don't break things for x86, ARM, and GPUs. What @kevinthesun and I are saying is that if you try compiling a graph for x86 (or GPU) with this PR, you should see a compilation failure. Our reasoning is that ConvertLayout will introduce roi_align with the NHWC layout, and then InferType will be called, which will cause the failure. Can you please try this and confirm?

@xqdan (Contributor) left a comment:

Hi @kevinthesun @anijain2305 @yzhliu @tqchen, IMO the layouts an op supports at the graph level should be orthogonal to the op's implementations in the various backends. If we have to implement an op for each backend whenever we add another layout, that's a big burden for developers.

tests/python/relay/test_op_level5.py — review thread resolved (outdated).
@Beya2019 (Contributor, Author) commented on Sep 14, 2020:

Hi, @kevinthesun @anijain2305

Following your suggestion, I have added the topi compute and schedule for the NHWC layout on the x86 and ARM targets. Meanwhile, I added test cases in tests/python/relay/test_op_level5.py, tests/python/topi/python/test_topi_vision.py, and tests/python/relay/test_pass_convert_op_layout.py to confirm the correctness of the computation and to ensure there is no compilation failure when ConvertLayout introduces an NHWC roi_align and InferType is called. Could you review again and merge?

@kevinthesun (Contributor):

@xqdan @Beya2019 I'm not quite sure whether we really need an NHWC topi implementation of roi_align on general-purpose hardware, since IMO it should not be a major performance concern there. What @anijain2305 and I are suggesting is that we move the NCHW check from Relay type inference into the op strategy for the targets that don't have an NHWC implementation.

I agree with @xqdan that the topi implementation doesn't need to be bound to the Relay layout. That said, could we have some data showing the performance difference between NCHW and NHWC roi_align on CPU, to establish whether a new topi implementation is necessary?

@xqdan (Contributor) commented on Sep 15, 2020:

> (quoting @kevinthesun's comment above)

Maybe we should just remove the NHWC roi_align CPU op and make sure the Relay type-inference check is done per target, to make things easy for @Beya2019. What do you think, @kevinthesun?

@Beya2019 (Contributor, Author):

Hi @kevinthesun,

I have moved the check for NCHW from Relay type inference into the op strategy for the corresponding targets, e.g.:

    layout = attrs.layout
    if layout == "NCHW":
        strategy.add_implementation(
            wrap_compute_roi_align(topi.vision.rcnn.roi_align_nchw),
            wrap_topi_schedule(topi.generic.schedule_roi_align),
            name="roi_align_nchw.generic",
        )
    elif layout == "NHWC":
        strategy.add_implementation(
            wrap_compute_roi_align(topi.vision.rcnn.roi_align_nhwc),
            wrap_topi_schedule(topi.generic.schedule_roi_align),
            name="roi_align_nhwc.generic",
        )

We rely on attrs.layout to dispatch to the right compute and schedule, and the basic NHWC compute is implemented by reference to the NCHW compute, to make sure the basic path works.
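
For intuition, an NHWC roi_align is semantically equivalent to transposing to NCHW, running the existing kernel, and transposing the pooled output back; this is only an illustration of the semantics, not the native NHWC compute added in this PR:

from tvm import topi

def roi_align_nhwc_reference(data, rois, pooled_size, spatial_scale, sample_ratio=-1):
    # Reference-only fallback: NHWC -> NCHW, reuse the existing NCHW kernel,
    # then transpose the pooled output back to channel-last.
    data_nchw = topi.transpose(data, (0, 3, 1, 2))
    out_nchw = topi.vision.rcnn.roi_align_nchw(
        data_nchw, rois, pooled_size, spatial_scale, sample_ratio)
    return topi.transpose(out_nchw, (0, 2, 3, 1))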

In addition, my understanding is: different targets support different layouts, and the user picks the high-performance layout supported by their target. The choice of layout is left to the user, who can use transform.ConvertLayout(desired_layouts) to convert between NHWC and NCHW; the NHWC compute we added ensures that this conversion does not cause compilation errors. In the future, if users need more performance on GPU or other targets, they can add the corresponding schedule; the current implementation only provides a generic method.

@kevinthesun (Contributor):

@xqdan That's what I meant.

@kevinthesun (Contributor):

@Beya2019 Thanks for adding the NHWC implementation. However, this also adds a maintenance burden if there is no obvious performance benefit. We are currently adding dynamic-shape support for some ops; if we keep only the necessary implementations, it will be easier to maintain and extend.

@Beya2019 (Contributor, Author):

@kevinthesun: Got it. I have modified the code following your suggestion. Please review again and let me know whether it meets the requirements; any further feedback would be appreciated. Thanks very much.

@kevinthesun (Contributor) left a review:

LGTM. @anijain2305, any more feedback?

Review thread on the new ROIAlignInferCorrectLayout code:

reporter->Assign(types[2], TensorType(oshape, data->dtype));
return true;
}

template <typename T>
Array<Array<Layout> > ROIAlignInferCorrectLayout(const Attrs& attrs,
@kevinthesun (Contributor) commented on Sep 16, 2020:

There are still issues with ROIAlignInferCorrectLayout. I applied this patch, and compilation of a PyTorch object-detection model fails with a segfault.

@kevinthesun (Contributor):

The issue can be reproduced by applying this patch on top of #6449 and running test_detection_models() in tests/python/frontend/pytorch/test_object_detection.py.
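
For reference, that manual run boils down to the following pytest invocation (assuming a checkout with the patch applied):

pytest tests/python/frontend/pytorch/test_object_detection.py -k test_detection_models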

@Beya2019 (Contributor, Author) commented on Sep 17, 2020:

Hi @kevinthesun,

I have fixed it. Please review. In addition, I see that #6449 has been merged, but test_detection_models() in tests/python/frontend/pytorch/test_object_detection.py is not enabled and must be run manually, so the TVM CI system cannot test this case automatically. Please note this.

@kevinthesun (Contributor):

> (quoting @Beya2019's comment above)

It is tested by pytest.

@kevinthesun (Contributor) left a review:

LGTM. I'll leave it open for a while to see if there is any other feedback coming.

@kevinthesun merged commit 71b6a35 into apache:master on Sep 18, 2020.
@kevinthesun (Contributor):

Thanks @Beya2019 @anijain2305 @xqdan

kevinthesun pushed a commit to kevinthesun/tvm referencing this pull request on Sep 18, 2020:

* [RELAY][OP] roi_align operator alter layout (same title on four commits)

Co-authored-by: honghua.cao <honghua.cao@streamcomputing.com>
trevor-m pushed a commit to neo-ai/tvm referencing this pull request on Sep 18, 2020:

* [RELAY][OP] roi_align operator alter layout (same title on four commits)

Co-authored-by: honghua.cao <honghua.cao@streamcomputing.com>