Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hexagon] Enable depthwise conv2d NHWC with an HWIO kernel layout #13414

Merged
merged 11 commits into from
Dec 13, 2022
3 changes: 1 addition & 2 deletions python/tvm/relay/op/strategy/hexagon.py
Original file line number Diff line number Diff line change
Expand Up @@ -86,9 +86,8 @@ def conv2d_strategy_hexagon(attrs, inputs, out_type, target):
name="depthwise_conv2d_nchw.hexagon",
)
elif layout == "NHWC":
assert kernel_layout == "HWOI"
strategy.add_implementation(
wrap_compute_conv2d(topi.nn.depthwise_conv2d_nhwc),
wrap_compute_conv2d(topi.nn.depthwise_conv2d_nhwc, need_kernel_layout=True),
wrap_topi_schedule(topi.hexagon.schedule_depthwise_conv2d_nhwc),
name="depthwise_conv2d_nhwc.hexagon",
)
Expand Down
3 changes: 1 addition & 2 deletions python/tvm/relay/op/strategy/x86.py
Original file line number Diff line number Diff line change
Expand Up @@ -228,13 +228,12 @@ def conv2d_strategy_cpu(attrs, inputs, out_type, target):
assert _OIHWio_matcher.match(kernel_layout) # check if kernel is OIHWio
return depthwise_conv2d_NCHWc_strategy_cpu(attrs, inputs, out_type, target)
elif layout == "NHWC":
assert kernel_layout == "HWOI"
if (not need_auto_scheduler_layout) and (not need_meta_schedule_layout):
logger.warning(
"depthwise_conv2d NHWC layout is not optimized for x86 with autotvm."
)
strategy.add_implementation(
wrap_compute_conv2d(topi.nn.depthwise_conv2d_nhwc),
wrap_compute_conv2d(topi.nn.depthwise_conv2d_nhwc, need_kernel_layout=True),
wrap_topi_schedule(topi.generic.schedule_depthwise_conv2d_nhwc),
name="depthwise_conv2d_nhwc.generic",
)
Expand Down
25 changes: 20 additions & 5 deletions python/tvm/topi/nn/depthwise_conv2d.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
from __future__ import absolute_import as _abs
from collections import namedtuple
import tvm
import numpy as np
from tvm import te

from .dilate import dilate
Expand Down Expand Up @@ -211,7 +212,7 @@ def depthwise_conv2d_nchw(Input, Filter, stride, padding, dilation, out_dtype=No
return Output


def depthwise_conv2d_nhwc(Input, Filter, stride, padding, dilation, out_dtype=None):
def depthwise_conv2d_nhwc(Input, Filter, stride, padding, dilation, kernel_layout, out_dtype=None):
"""Depthwise convolution nhwc forward operator.

Parameters
Expand Down Expand Up @@ -252,8 +253,18 @@ def depthwise_conv2d_nhwc(Input, Filter, stride, padding, dilation, out_dtype=No
dilation_h, dilation_w = dilation

batch, in_height, in_width, in_channel = Input.shape

dim = len(Input.shape) - 2

# shape of dilated kernel
filter_height, filter_width, filter_channel, channel_multiplier = Filter.shape
if kernel_layout == "HWOI":
filter_height, filter_width, filter_channel, channel_multiplier = Filter.shape
kernel_permutation_to = [0, 1] + list(range(2, dim + 2))
elif kernel_layout == "HWIO":
filter_height, filter_width, channel_multiplier, filter_channel = Filter.shape
kernel_permutation_to = [dim + 1, dim] + list(range(dim))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing how this permutation generates HWIO. This defines kernel_permutation_to as [3, 2, 0, 1], so kernel_permutation_from is [2, 3, 1, 0]. With the usage below, that would permute from [di, dj, c//channel_multiplier, c%channel_multiplier] to [c//channel_multiplier, c%channel_multiplier, dj, di], which would be OIWH.

Should this be list(range(dim)) + [dim + 1, dim] instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lunderberg You are right, my bad. Not sure why I thought this would result in HWIO. Thanks so much for catching this bug.


kernel_permutation_from = np.argsort(kernel_permutation_to)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit to defining in terms of kernel_permutation_to instead of directly defining kernel_permutation_from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to follow this as much as possible so in the future we can add more features if needed. If you think this is not the best way I could change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. It looks like that implementation is trying to be more clever, and to identify the permutation by inspecting the string. In this case, since we're only supporting two explicitly enabled shapes, I'd lean for defining a kernel_permutation directly for each one.

if kernel_layout == 'HWOI':
    kernel_permutaiton = [0,1,2,3]
elif kernel_layout == 'HWIO':
    kernel_permutaiton = [0,1,3,2]
else:
    raise ValueError(f'Unsupported kernel layout: {kernel_layout}')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, if there are many locations where the same kernel permutation definitions are used, it may be useful to pull it out into a common utility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the way of permutation from your recommendation. @Lunderberg


dilated_kernel_h = (filter_height - 1) * dilation_h + 1
dilated_kernel_w = (filter_width - 1) * dilation_w + 1
Expand Down Expand Up @@ -284,9 +295,13 @@ def depthwise_conv2d_nhwc(Input, Filter, stride, padding, dilation, out_dtype=No
j * stride_w + dj * dilation_w,
idxdiv(c, channel_multiplier),
].astype(out_dtype)
* Filter[
di, dj, idxdiv(c, channel_multiplier), idxmod(c, channel_multiplier)
].astype(out_dtype)
* Filter.__getitem__(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explicit __getitem__ isn't required. Filter[x,y,z] is syntactic sugar for Filter[(x,y,z)]. If you already have a tuple, then Filter[tuple( ... )] can be called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, I have changed this as your recommendation.

tuple(
np.array(
[di, dj, idxdiv(c, channel_multiplier), idxmod(c, channel_multiplier)]
)[kernel_permutation_from]
)
).astype(out_dtype)
),
axis=[di, dj],
),
Expand Down