[Matmul] Add matmul op #8234

Merged · 29 commits · Jun 30, 2021
Conversation

@jcf94 (Contributor) commented Jun 10, 2021

The nn.dense and nn.batch_matmul ops may perform poorly in models whose weights are not constant parameters. We had some discussion at https://discuss.tvm.apache.org/t/discussion-about-the-weight-layout-of-dense-batchmatmul/10171

This PR:

  • Adds a new nn.matmul op that allows the data tensor and the weight tensor to each be in transposed or non-transposed format
  • Makes nn.dense an alias of nn.matmul for the case of non-transposed data and transposed weight
  • Adds an option to the TensorFlow frontend to choose between nn.dense and nn.matmul

Since we currently have complete schedule support only for nn.dense, the frontends still use nn.dense by default.

Full transpose support for nn.batch_matmul will be added in another PR.
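As a rough sketch of the Relay-level API described above (a minimal example; attribute names follow the data_transposed / weight_transposed spelling quoted later in this review, and the merged signatures may differ):

import tvm
from tvm import relay

x = relay.var("x", shape=(8, 16), dtype="float32")

# General matmul: neither operand stored transposed (NN layout).
w = relay.var("w", shape=(16, 32), dtype="float32")
y = relay.nn.matmul(x, w)  # (8, 16) x (16, 32) -> (8, 32)

# nn.dense remains the (non-transposed data, transposed weight) special case:
wt = relay.var("wt", shape=(32, 16), dtype="float32")
y2 = relay.nn.dense(x, wt)  # same result as matmul with weight_transposed=True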

cc @comaniac @tqchen @FrozenGene @junrushao1994 @altanh

@jcf94 changed the title from "[Matmul] Add an matmul op for Relay matmul op" to "[Matmul] Add matmul op" on Jun 10, 2021
@junrushao (Member):

CC @altanh @tkonolige

@tkonolige (Contributor) commented Jun 10, 2021

To clarify: you are replacing the TOPI dense implementations with the new matmul one, but you are leaving nn.dense as a Relay op and adding another Relay op called nn.matmul. Why not remove the nn.dense Relay op?

@comaniac (Contributor) left a review:

In general, I would suggest the following:

  1. In TOPI generic, we have a generic matmul compute/schedule for all platforms.
  2. In TOPI x86/ARM/CUDA with cblas/cublas enabled, we use the libraries for all matmuls.
  3. In TOPI x86/ARM/CUDA without cblas/cublas, we use the current dense schedule for matmul(False, True) and emit a warning for the other cases.

For aliasing, to maintain compatibility I agree that we should keep aliases for both Relay and TE, but it would be cleaner to keep only the nn.matmul ops and make nn.dense syntactic sugar. That way, users can still use nn.dense in Relay (equivalent to nn.matmul(False, True)) and topi.nn.dense in TE (an alias of topi.nn.matmul).
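A minimal sketch of that aliasing idea; the function and parameter names here are hypothetical illustrations, not the merged API:

def matmul(data, weight, units=None, out_dtype="",
           data_transposed=False, weight_transposed=False):
    """General matrix multiply with a transpose flag per operand."""
    # Dispatch: a BLAS library if enabled, the dense schedule for the
    # (False, True) layout, or a generic schedule (with a warning) otherwise.
    ...

def dense(data, weight, units=None, out_dtype=""):
    # dense(A, B) == A @ B.T, i.e. matmul in the (False, True) layout.
    return matmul(data, weight, units, out_dtype,
                  data_transposed=False, weight_transposed=True)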

(Resolved review threads on python/tvm/relay/op/nn/nn.py, python/tvm/relay/op/strategy/cuda.py, and python/tvm/topi/nn/dense.py.)
@altanh (Contributor) left a review:

Thanks for the PR! This is a tough one due to backwards compatibility with nn.dense; I had some nits and questions here and there. After seeing the changes needed, I wonder whether it would be better for now to keep matmul and dense separate (but call into the dense compute implementations for NT ops), since changing DenseAttrs might cause compatibility problems. But if we ultimately want to eliminate dense, maybe this is the right path. Curious to see others' thoughts on this.

@@ -1230,6 +1239,9 @@ def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
     params : dict of str to tvm.nd.NDArray
         Dict of converted parameters stored in tvm.nd.NDArray format
     """
+    global _USE_DENSE_INSTEAD_OF_MATMUL
Contributor:

Is it possible to avoid using this global variable? I'm not familiar with the importer, but it would be nice if we could use an importer config dict or something.

@jcf94 (author):

Yeah, I've tried several ways, but I don't see a better solution. A Python module can be treated as a constant singleton, so this should be safe as long as from_tensorflow is the only entry point.

Contributor:

I feel this is confusing too. If _USE_DENSE_INSTEAD_OF_MATMUL is not supposed to be changed by users directly, we should improve the comments on this global variable; please see my comment at its definition.

By the way, in this case we can simply assign _USE_DENSE_INSTEAD_OF_MATMUL = use_dense_op without checking whether the values already match.
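A minimal sketch of the resulting pattern, based on the hunks quoted in this thread (the importer body is elided and illustrative):

# Module-level flag, effectively a process-wide singleton. Not meant to be
# set by users directly; it is configured via the use_dense_op argument.
_USE_DENSE_INSTEAD_OF_MATMUL = True

def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):
    global _USE_DENSE_INSTEAD_OF_MATMUL
    # Assign unconditionally, as suggested above: no need to compare first.
    _USE_DENSE_INSTEAD_OF_MATMUL = use_dense_op
    ...  # run the importer; MatMul converters read the flag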

(Resolved review threads on include/tvm/relay/attrs/nn.h, python/tvm/relay/op/nn/_nn.py, and python/tvm/relay/op/strategy/cuda.py.)
compute_lambda = lambda i, j: te.sum(
    data[i, k].astype(out_dtype) * weight[j, k].astype(out_dtype), axis=k
)
compute_name = "T_dense"
Contributor:

Do we need to keep this as dense, or can we unify it to be T_matmul_NT?

Contributor:

@jcf94 please discuss, as I have the same issue above.

@jcf94 (author):

I think it's fine since it's just an op name. 😄 But the dense tag is used in some schedule checks, so I think we'd better keep that.

Here are some options I can come up with:

  • A: Use T_dense as the name and dense as the tag for the NT format; use T_matmul as the name and matmul as the tag for the other 3 formats.
  • B: Use T_matmul_NN, T_matmul_NT, T_matmul_TN, T_matmul_TT as the names for the four formats; use dense as the tag for the NT format and matmul as the tag for the others.

What do you think?

Contributor:

I personally vote for B.

(Resolved review threads on python/tvm/topi/nn/dense.py, src/relay/op/nn/nn.h, and tests/python/relay/test_op_level1.py.)
@jcf94 (author) commented Jun 11, 2021

@tkonolige @comaniac @altanh Thanks!

Apart from these nit comments about naming and coding style, I think the main problem is still how to keep the existing nn.dense schedules for the different devices working while adding a new nn.matmul op.

@comaniac's solution is exactly what I wanted. But in the current op strategy, since all of the existing schedules are registered against nn.dense, we would still need a real nn.dense op.

@jcf94 changed the title from "[Matmul] Add matmul op" to "[WIP][Matmul] Add matmul op" on Jun 21, 2021
@jcf94 (author) commented Jun 26, 2021

Hi @junrushao1994 @tkonolige @comaniac @altanh, I've tried several ways to remove nn.dense, but it results in modifications across nearly 70 files and some strange CI errors.

For now I think it's still better to keep both nn.dense and nn.matmul. We can gradually remove uses of nn.dense and eventually deprecate it.

Also cc @tqchen @FrozenGene

@jcf94 requested a review from @comaniac on Jun 26, 2021
@jcf94 changed the title from "[WIP][Matmul] Add matmul op" to "[Matmul] Add matmul op" on Jun 26, 2021
(Resolved review thread on python/tvm/relay/frontend/tensorflow.py.)
(Resolved review threads on python/tvm/relay/op/nn/_nn.py, python/tvm/relay/op/nn/nn.py, python/tvm/relay/op/strategy/cuda.py, python/tvm/topi/x86/dense.py, and include/tvm/relay/attrs/nn.h.)
@@ -1204,7 +1208,7 @@ def from_tensorflow(self, graph, layout="NHWC", shape=None, outputs=None):
return func, self._params


-def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None):
+def from_tensorflow(graph, layout="NHWC", shape=None, outputs=None, use_dense_op=True):
Contributor:

I don't think we should have a flag here. We should just commit to one codepath.

@jcf94 (author):

The problem is that we're not able to remove all uses of nn.dense at the moment, and there aren't enough AutoTVM templates for nn.matmul.

So nn.matmul can only be seen as an experimental feature for now. We should not change the default behavior, in case that affects those who are using nn.dense.

Contributor:

Can't we use the dense schedules when A_transpose=false and B_transpose=true? Then we can convert all nn.dense to nn.matmul.

Contributor:

This PR already uses the dense schedule for matmul_nt when lowering to TOPI. On the other hand, as @jcf94 mentioned in the PR comment, converting everything would affect many more places in the codebase, and we'd better convert them gradually instead of in a single PR. That sounds reasonable to me.
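A minimal sketch of that fallback routing; dense_compute and generic_matmul_compute are hypothetical helper names, not the merged API:

import logging

logger = logging.getLogger("strategy")

def matmul_compute(data, weight, data_transposed, weight_transposed, out_dtype):
    # The (False, True) layout is exactly nn.dense, so reuse its tuned schedule.
    if not data_transposed and weight_transposed:
        return dense_compute(data, weight, out_dtype)  # hypothetical helper
    # Other layouts fall back to a generic compute, with a warning as
    # suggested earlier in this review.
    logger.warning(
        "matmul layout (data_transposed=%s, weight_transposed=%s) has no tuned "
        "schedule; falling back to a generic implementation",
        data_transposed, weight_transposed,
    )
    return generic_matmul_compute(  # hypothetical helper
        data, weight, data_transposed, weight_transposed, out_dtype
    )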

def matmul_grad(orig, grad):
    """Returns [grad' @ weight, data @ grad']"""
    data, weight = orig.args
    if (orig.attrs["data_transposed"], orig.attrs["weight_transposed"]) == (True, True):
Contributor:

Please refactor this so it doesn't if/else on every possible combination of transposes.
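A minimal sketch of one such refactor, written as a self-contained NumPy model rather than the Relay code, assuming matmul(a, b, data_transposed, weight_transposed) computes op(a) @ op(b) where op transposes its argument when the flag is set:

import numpy as np

def matmul(a, b, data_transposed=False, weight_transposed=False):
    # op(a) @ op(b): each operand is transposed when its flag is set.
    a = a.T if data_transposed else a
    b = b.T if weight_transposed else b
    return a @ b

def matmul_grad(data, weight, grad, ta, tb):
    """Gradients of matmul(data, weight, ta, tb) w.r.t. data and weight,
    derived from the flags instead of enumerating all four combinations."""
    # d(data): grad combined with weight, layout adjusted by the flags.
    if ta:
        grad_data = matmul(weight, grad, data_transposed=tb, weight_transposed=True)
    else:
        grad_data = matmul(grad, weight, weight_transposed=not tb)
    # d(weight): data combined with grad, symmetric to the above.
    if tb:
        grad_weight = matmul(grad, data, data_transposed=True, weight_transposed=ta)
    else:
        grad_weight = matmul(data, grad, data_transposed=not ta)
    return grad_data, grad_weight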

Comment on lines 1492 to 1493:

    units : int, optional
        Number of hidden units of the matmul transformation.
Contributor:

What is a unit?

@jcf94 (author):

I think the doc explains it enough: "The hidden units." This is copied from the original nn.dense.

Contributor:

I don't think this is clear at all. Are the "hidden units" the inner dimension of the matmul?

----------
data : tvm.relay.Expr
    The input data to the operator,
    of shape `(d_1, d_2, ..., d_n, units_in)` or `(d_1, d_2, ..., units_in, d_n)`.
Contributor:

Shouldn't both input shapes be dimension 2?

@jcf94 (author):

No, the input of matmul is supposed to be a multi-dimensional tensor (not limited to 2-D). This is copied from the original nn.dense. Other frameworks like PyTorch have a similar definition.
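A minimal NumPy analogy of that multi-dimensional behavior (illustrative, not TVM code):

import numpy as np

data = np.ones((4, 8, 16))   # (d_1, d_2, units_in): batch dims plus reduction dim
weight = np.ones((32, 16))   # (units, units_in): stored transposed, as in nn.dense
out = np.einsum("abk,nk->abn", data, weight)
print(out.shape)             # (4, 8, 32): leading dims kept, last dim becomes units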

Contributor:

Can you update the definition of the computation above to reflect these shapes, then?

Comment on lines 1193 to 1195:

    if data_transposed:
        out[out.shape[0] - 2] = out[out.shape[0] - 1]
    out[out.shape[0] - 1] = weight_shape[0] if weight_transposed else weight_shape[1]
Contributor:

This seems really complicated. Shouldn't it just be some part of data_shape and weight_shape, depending on the transposes?

@jcf94 (author):

Since the data tensor can have more than two dimensions, this is the simplest way to implement it.
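A standalone sketch of the shape rule in the quoted lines, using plain Python lists instead of TVM shape objects and assuming the final-dimension update also runs in the non-transposed case:

def matmul_out_shape(data_shape, weight_shape, data_transposed, weight_transposed):
    # Start from the data shape: all leading (batch) dims carry through unchanged.
    out = list(data_shape)
    if data_transposed:
        # Data is (..., K, M): the kept dim M moves into the second-to-last slot.
        out[-2] = out[-1]
    # The last dim becomes the weight's output feature dim ("units").
    out[-1] = weight_shape[0] if weight_transposed else weight_shape[1]
    return out

# e.g. the NT (dense) case: (4, 8, 16) x (32, 16) -> [4, 8, 32]
assert matmul_out_shape([4, 8, 16], [32, 16], False, True) == [4, 8, 32]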

(Resolved review thread on python/tvm/topi/nn/dense.py.)
@jcf94 (author) commented Jun 29, 2021

Thanks! @comaniac @tkonolige Most comments have been addressed.

@comaniac (Contributor) left a review:

LGTM. I don't have other comments, just one nit.

(Review thread on include/tvm/relay/attrs/nn.h.)
(Review thread on python/tvm/relay/op/strategy/x86.py.)
@jcf94 (author) commented Jun 30, 2021

Thanks! @comaniac @tkonolige
I'll add TOPI compute support for multi-dim tensor input in another PR.

@jcf94 merged commit 6d1ced0 into apache:main on Jun 30, 2021
lygztq pushed a commit to lygztq/tvm that referenced this pull request on Jul 1, 2021:

  • Add Matmul Op
  • Recover DenseAttrs
  • Add grad for matmul & some update
  • Update matmul cuda default schedule
  • Add blas support for matmul
  • Lint fix add update doc strings
@tqchen (Member) commented Jul 3, 2021

@jcf94 There are a few regressions in the PyTorch frontend and other places that might be related; can you take a look? dense should map to matmul(A, B.T).

ylc pushed a commit to ylc/tvm that referenced this pull request on Sep 29, 2021 (same commit list as above).
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request on Mar 4, 2022 (same commit list as above).