Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Codegen][Metal] Disable cross-function call in Metal codegen #16033

Merged
merged 2 commits into from
Nov 3, 2023

Conversation

MasterJH5574
Copy link
Contributor

This PR restores the Metal codegen to the one before #15835. Due to there will likely be no internal function call in Metal, we think it is safe to do so.

Verified that with this PR, the metal codegen and iPhone codegen will not fail and will work properly.

The reason of the iPhone codegen failure is because the multiple declarations of a same function will lead to multiple emissions of a same structs, which is not recognizable by the metal compiler.

@MasterJH5574
Copy link
Contributor Author

cc @Lunderberg @tqchen

@junrushao
Copy link
Member

CC @Lunderberg would you mind taking a look? BTW, is there a way to unit-test it as you enabled a while ago?

@tqchen
Copy link
Member

tqchen commented Nov 1, 2023

Thanks @MasterJH5574 , can you also document the rationale of that we specialize this part. aka we do not have multiple cross function calls, so we can keep the delcaration once to keep things simple

@MasterJH5574
Copy link
Contributor Author

BTW, is there a way to unit-test it as you enabled a while ago?

Reading the failure reason elaborated here #15835 (comment), I feel the way to reproduce or test the issue is to

  • construct an IRModule where one PrimFunc containing trailing int/float parameters is called at least twice,
  • build the IRModule to iphone target.

@Lunderberg
Copy link
Contributor

If possible, I'd like to fix the functionality and not just revert it. #14985 and #14862 have been on the back burner for quite a while, mainly due to breakage they cause in tvm.contrib.ethosu, but are still changes I'd like to get around to landing at some point.

@MasterJH5574 We should be able to test it in CI, since I added compile-only testing of the metal backend in #15756. Functionality that isn't tested in CI is inherently unstable, and so CI testing is preferable in addition to local testing.

@Lunderberg
Copy link
Contributor

BTW, is there a way to unit-test it as you enabled a while ago?

The two steps to add a Metal specific unit test are to (1) decorate the test with @tvm.testing.requires_metal(support_required="compile-only") and (2) add the test in the Minimal Metal Compile-Only section of .github/workflows/main.yml. This only validates the compiler support, as we don't have a CI environment that has Metal GPUs, but that should catch the majority of codegen failure modes.

@MasterJH5574
Copy link
Contributor Author

@Lunderberg I agree that it would be better if the issue can be fixed! For this PR, the overall codegen still bases on the updated codegen flow (we did not revert the previous PR). There is no inter function calls between kernels in Metal, so it is okay to keep the former flow in Metal codegen for now as a swift fix to the codegen issue.

And thanks for the pointer on testing. I will add a test soon.

@Lunderberg
Copy link
Contributor

Lunderberg commented Nov 1, 2023

There aren't currently any inter-function calls in Metal, but it is part of my planned implementation for #14985. Breaking a change apart into the underlying infrastructure changes (e.g. codegen support) and the usage of that infrastructure allows the changes to be tested and integrated separately.

@Lunderberg
Copy link
Contributor

That said, I have an idea for how to work around the lack of inter-function calls in codegen. Since we can represent inter-function calls in TIR, we could add a InlineLocalFunctions pass that removes any internal private functions that result from the single-module lowering flow.

@tqchen
Copy link
Member

tqchen commented Nov 1, 2023

I like the idea of keeping things simple on specific shader backend like metal, inlining sounds like a good approach

@MasterJH5574 MasterJH5574 force-pushed the tvm-dev/2023-11-01-metal-codegen branch from 5b4b175 to d374a05 Compare November 1, 2023 21:23
@MasterJH5574
Copy link
Contributor Author

Finished adding a test and included it in “minimal Metal compile-only.” Also added a note in AddFunction in codegen_metal.cc

Copy link
Contributor

@Lunderberg Lunderberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the special packing used for the scalar buffers, and that there's a possibility of a workaround for the limitation in Metal, I think these sections makes sense to revert.

In the void CodeGenMetal::VisitExpr_(const CallNode* op) override, can we add an explicit check for the inter-function call? Otherwise, this falls back to the CodegenC default that support it.

CHECK(!op->op.as<GlobalVarNode>())
    << "CodegenMetal does not support inter-function calls, "
    << "but expression " << GetRef<Expr>(op)
    << " calls PrimFunc " << op->op;

This PR restores the Metal codegen to the one before apache#15835.
Due to there will likely be no internal function call in Metal,
we think it is safe to do so.

Verified that with this PR, the metal codegen and iPhone codegen
will not fail and will work properly.

The reason of the iPhone codegen failure is because the multiple
declarations of a same function will lead to multiple emissions
of a same structs, which is not recognizable by the metal compiler.
@MasterJH5574 MasterJH5574 force-pushed the tvm-dev/2023-11-01-metal-codegen branch from d374a05 to b95db83 Compare November 1, 2023 22:51
@MasterJH5574
Copy link
Contributor Author

Thanks @Lunderberg for the suggestions! Added the check in CallNode visit function.

Copy link
Contributor

@Lunderberg Lunderberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, and looks good to me! The unit test looks really good; I especially like the explanation of the known portions of the failure mode.

@tqchen tqchen merged commit 043f147 into apache:main Nov 3, 2023
17 checks passed
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Nov 29, 2023
The functionality to express a call from one `PrimFunc` to another was
introduced in apache#14889.  While this
was initially planned to be supported at codegen for all targets (see
apache#15835), this resulted in breakage
on some backends (see apache#16033).
After discussion, the plan was changed to support TIR inlining, which
would enable the same high-level functionality in TIR without
requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform
`InlinePrivateFunctions`, which can be used as part of lowering in a
follow-up commit.

Because this is initially implemented for use quite late in the
lowering flow, many constructs are not currently supported.  The
current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function.  Because a
  subroutine may be called multiple times, inlining of a subroutine
  that contains `tir::Block` would result in non-unique names.
  Support of subroutines with `tir::Block` instances will require
  de-duplication of block names.

* The subroutine's callsite must occur within a `tir::Evaluate` block.
  Because inlining a subroutine inserts the `tir::Stmt` body at the
  point of use, replacement must occur in a context where a
  `tir::Stmt` can be returned.  Support of subroutines that are called
  within an expression (e.g. Replacing `func` in `Buf[0] = func(1) +
  func(2)`) would require hoisting preprocessing done in the
  subroutine to the parent `tir::Stmt`.

* The subroutine may only accept primitive arguments, and must have an
  empty `buffer_map`.  Support of subroutines that are called with
  `tir::Buffer` or `tir::BufferRegion` arguments would require a way
  to represent these arguments at the callsite, and substitution of
  the buffer into the callee.

If these unsupported constructs are used, then the inlining does is
skipped.  This commit includes unit tests for these unsupported
constructs, to validate that `InlinePrivateFunctions` produces
well-formed output even when they are present.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Nov 29, 2023
The functionality to express a call from one `PrimFunc` to another was
introduced in apache#14889.  While this
was initially planned to be supported at codegen for all targets (see
apache#15835), this resulted in breakage
on some backends (see apache#16033).
After discussion, the plan was changed to support TIR inlining, which
would enable the same high-level functionality in TIR without
requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform
`InlinePrivateFunctions`, which can be used as part of lowering in a
follow-up commit.

Because this is initially implemented for use quite late in the
lowering flow, many constructs are not currently supported.  The
current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function.  Because a
  subroutine may be called multiple times, inlining of a subroutine
  that contains `tir::Block` would result in non-unique names.
  Support of subroutines with `tir::Block` instances will require
  de-duplication of block names.

* The subroutine's callsite must occur within a `tir::Evaluate` block.
  Because inlining a subroutine inserts the `tir::Stmt` body at the
  point of use, replacement must occur in a context where a
  `tir::Stmt` can be returned.  Support of subroutines that are called
  within an expression (e.g. Replacing `func` in `Buf[0] = func(1) +
  func(2)`) would require hoisting preprocessing done in the
  subroutine to the parent `tir::Stmt`.

* The subroutine may only accept primitive arguments, and must have an
  empty `buffer_map`.  Support of subroutines that are called with
  `tir::Buffer` or `tir::BufferRegion` arguments would require a way
  to represent these arguments at the callsite, and substitution of
  the buffer into the callee.

If these unsupported constructs are used, then the inlining of those
functions is skipped.  This commit includes unit tests for these
unsupported constructs, to validate that `InlinePrivateFunctions`
produces well-formed output even when they are present.
@Lunderberg
Copy link
Contributor

Circling back to this, I have PR #16184 submitted, which adds a TIR transform that can inline private function calls. That way, the user-facing functionality of TIR subroutines can be used, without requiring every codegen to natively support subroutines.

Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Dec 19, 2023
The functionality to express a call from one `PrimFunc` to another was
introduced in apache#14889.  While this
was initially planned to be supported at codegen for all targets (see
apache#15835), this resulted in breakage
on some backends (see apache#16033).
After discussion, the plan was changed to support TIR inlining, which
would enable the same high-level functionality in TIR without
requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform
`InlinePrivateFunctions`, which can be used as part of lowering in a
follow-up commit.

Because this is initially implemented for use quite late in the
lowering flow, many constructs are not currently supported.  The
current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function.  Because a
  subroutine may be called multiple times, inlining of a subroutine
  that contains `tir::Block` would result in non-unique names.
  Support of subroutines with `tir::Block` instances will require
  de-duplication of block names.

* The subroutine's callsite must occur within a `tir::Evaluate` block.
  Because inlining a subroutine inserts the `tir::Stmt` body at the
  point of use, replacement must occur in a context where a
  `tir::Stmt` can be returned.  Support of subroutines that are called
  within an expression (e.g. Replacing `func` in `Buf[0] = func(1) +
  func(2)`) would require hoisting preprocessing done in the
  subroutine to the parent `tir::Stmt`.

* The subroutine may only accept primitive arguments, and must have an
  empty `buffer_map`.  Support of subroutines that are called with
  `tir::Buffer` or `tir::BufferRegion` arguments would require a way
  to represent these arguments at the callsite, and substitution of
  the buffer into the callee.

If these unsupported constructs are used, then the inlining of those
functions is skipped.  This commit includes unit tests for these
unsupported constructs, to validate that `InlinePrivateFunctions`
produces well-formed output even when they are present.
Lunderberg added a commit that referenced this pull request Jan 3, 2024
* [TIR] Update DeclBuffer nodes when specializing PrimFunc

Prior to this commit, a buffer whose parameters (e.g. shape/stride)
contained a specialized parameter would not be updated when appearing
in a `DeclBuffer` node.  This commit updates the `Specialize` function
to update buffers that occur in `DeclBuffer` nodes.

* [TIR] Handle specialization that remaps a buffer var

* [TIR] Handle specialization of buffer variable to PrimExpr

* [TIR][Transform] Implement InlinePrivateFunctions

The functionality to express a call from one `PrimFunc` to another was
introduced in #14889.  While this
was initially planned to be supported at codegen for all targets (see
#15835), this resulted in breakage
on some backends (see #16033).
After discussion, the plan was changed to support TIR inlining, which
would enable the same high-level functionality in TIR without
requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform
`InlinePrivateFunctions`, which can be used as part of lowering in a
follow-up commit.

Because this is initially implemented for use quite late in the
lowering flow, many constructs are not currently supported.  The
current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function.  Because a
  subroutine may be called multiple times, inlining of a subroutine
  that contains `tir::Block` would result in non-unique names.
  Support of subroutines with `tir::Block` instances will require
  de-duplication of block names.

* The subroutine's callsite must occur within a `tir::Evaluate` block.
  Because inlining a subroutine inserts the `tir::Stmt` body at the
  point of use, replacement must occur in a context where a
  `tir::Stmt` can be returned.  Support of subroutines that are called
  within an expression (e.g. Replacing `func` in `Buf[0] = func(1) +
  func(2)`) would require hoisting preprocessing done in the
  subroutine to the parent `tir::Stmt`.

* The subroutine may only accept primitive arguments, and must have an
  empty `buffer_map`.  Support of subroutines that are called with
  `tir::Buffer` or `tir::BufferRegion` arguments would require a way
  to represent these arguments at the callsite, and substitution of
  the buffer into the callee.

If these unsupported constructs are used, then the inlining of those
functions is skipped.  This commit includes unit tests for these
unsupported constructs, to validate that `InlinePrivateFunctions`
produces well-formed output even when they are present.

* Updates based on review comments

* ci bump

* CI bump
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants