[TIR][Transform] Implement InlinePrivateFunctions #16184

Merged
merged 7 commits into from
Jan 3, 2024

Lunderberg
Contributor

The functionality to express a call from one PrimFunc to another was introduced in #14889. While this was initially planned to be supported at codegen for all targets (see #15835), this resulted in breakage on some backends (see #16033). After discussion, the plan was changed to support TIR inlining, which would enable the same high-level functionality in TIR without requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform InlinePrivateFunctions, which can be used as part of lowering in a follow-up commit.

Because this is initially implemented for use quite late in the lowering flow, many constructs are not currently supported. The current implementation has the following restrictions.

  • tir::Block nodes may not occur in the inlined function. Because a subroutine may be called multiple times, inlining of a subroutine that contains tir::Block would result in non-unique names. Support of subroutines with tir::Block instances will require de-duplication of block names.

  • The subroutine's callsite must occur within a tir::Evaluate node. Because inlining a subroutine inserts the tir::Stmt body at the point of use, replacement must occur in a context where a tir::Stmt can be returned. Support of subroutines that are called within an expression (e.g. replacing func in Buf[0] = func(1) + func(2)) would require hoisting any preprocessing done in the subroutine to the parent tir::Stmt.

  • The subroutine may only accept primitive arguments, and must have an empty buffer_map. Support of subroutines that are called with tir::Buffer or tir::BufferRegion arguments would require a way to represent these arguments at the callsite, and substitution of the buffer into the callee.

If these unsupported constructs are used, then the inlining of those functions is skipped. This commit includes unit tests for these unsupported constructs, to validate that InlinePrivateFunctions produces well-formed output even when they are present.
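
The skip-when-unsupported behaviour described above can be illustrated with a toy model. This is only a sketch and does not use TVM: `Stmt`, `Func`, `Module`, `CanInline`, `InlineCalls`, and `InlinePrivate` are hypothetical names, the single flag `has_unsupported_construct` stands in for all three restrictions, argument substitution is omitted, and dropping the inlined functions from the output is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy statement: a primitive operation, or a call in statement position
// (the analogue of a call wrapped in tir::Evaluate).
struct Stmt {
  bool is_call;
  std::string text;  // operation text, or the callee name when is_call is true
};

// Toy function record. `has_unsupported_construct` stands in for the
// restrictions listed above (blocks, buffer arguments).
struct Func {
  bool is_private;
  bool has_unsupported_construct;
  std::vector<Stmt> body;
};

using Module = std::map<std::string, Func>;

bool CanInline(const Func& func) {
  return func.is_private && !func.has_unsupported_construct;
}

// Replace each call to an inlinable subroutine with a copy of the callee body.
// Assumes the call graph has no recursion; argument substitution is omitted.
std::vector<Stmt> InlineCalls(const Module& mod, const std::vector<Stmt>& body) {
  std::vector<Stmt> out;
  for (const Stmt& stmt : body) {
    auto it = stmt.is_call ? mod.find(stmt.text) : mod.end();
    if (it != mod.end() && CanInline(it->second)) {
      std::vector<Stmt> inlined = InlineCalls(mod, it->second.body);
      out.insert(out.end(), inlined.begin(), inlined.end());
    } else {
      out.push_back(stmt);  // unsupported or unknown callee: keep the call
    }
  }
  return out;
}

// Inline into every function that is kept. Functions that were inlined
// everywhere are dropped from the output (an assumption of this sketch).
Module InlinePrivate(const Module& mod) {
  Module result;
  for (const auto& [name, func] : mod) {
    if (!CanInline(func)) {
      result[name] = Func{func.is_private, func.has_unsupported_construct,
                          InlineCalls(mod, func.body)};
    }
  }
  return result;
}

void Example() {
  Module mod;
  mod["main"] = Func{false, false, {Stmt{true, "helper"}, Stmt{true, "blocky"}}};
  mod["helper"] = Func{true, false, {Stmt{false, "x = x + 1"}}};
  mod["blocky"] = Func{true, true, {Stmt{false, "y = y * 2"}}};

  Module out = InlinePrivate(mod);
  assert(out.count("helper") == 0);        // inlined everywhere, then dropped
  assert(out.count("blocky") == 1);        // skipped, so it must remain
  assert(out.at("main").body[0].text == "x = x + 1");
  assert(out.at("main").body[1].is_call);  // call to "blocky" left intact
}
```

The point of the sketch is the last two assertions: a skipped callee stays in the module and its callsite is untouched, so the output remains well-formed.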

Prior to this commit, a buffer whose parameters (e.g. shape/stride)
contained a specialized parameter would not be updated when appearing
in a `DeclBuffer` node.  This commit updates the `Specialize` function
to update buffers that occur in `DeclBuffer` nodes.
@Lunderberg
Contributor Author

Rebased onto main as the CI results were a bit stale.

Contributor

@slyubomirsky slyubomirsky left a comment

I am not especially familiar with the TIR code base, but the logic here seems reasonable. Is there a reason the issue of unique names for blocks is a blocker?

- Array<Buffer> alloc_buffers = op->alloc_buffers.Map(
-     std::bind(&PrimFuncSpecializer::MutateAllocBuffer, this, std::placeholders::_1));
+ Array<Buffer> alloc_buffers =
+     op->alloc_buffers.Map([this](const auto& buf) { return MutateAllocBuffer(buf); });
Contributor

Much cleaner this way :)

Contributor Author

Agreed. I always need to pause when encountering std::placeholders, and try to replace it when reasonable to do so.
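
For readers less familiar with the two spellings, here is a self-contained sketch showing that a `std::bind` expression with a placeholder and a `[this]` lambda are interchangeable when mapping a member function over a container. `NameMutator` and its `Map` helper are hypothetical stand-ins, not TVM's `PrimFuncSpecializer` or `Array::Map`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mutator class.
class NameMutator {
 public:
  explicit NameMutator(std::string suffix) : suffix_(std::move(suffix)) {}

  std::string MutateName(const std::string& name) const { return name + suffix_; }

  // Older style: bind the member function to `this`, leaving one placeholder.
  std::vector<std::string> MapWithBind(const std::vector<std::string>& names) const {
    return Map(names, std::bind(&NameMutator::MutateName, this, std::placeholders::_1));
  }

  // Lambda style: the captured `this` and the forwarded argument are explicit.
  std::vector<std::string> MapWithLambda(const std::vector<std::string>& names) const {
    return Map(names, [this](const auto& name) { return MutateName(name); });
  }

 private:
  // Minimal analogue of a container Map helper: apply `fn` to each element.
  template <typename F>
  static std::vector<std::string> Map(const std::vector<std::string>& input, F fn) {
    std::vector<std::string> output;
    output.reserve(input.size());
    for (const auto& item : input) output.push_back(fn(item));
    return output;
  }

  std::string suffix_;
};

void Example() {
  NameMutator mutator("_v2");
  std::vector<std::string> names = {"A", "B"};
  assert(mutator.MapWithBind(names) == mutator.MapWithLambda(names));
}
```

Both calls produce the same result; the lambda simply makes it visible which object is captured and which argument is forwarded.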

node.CopyOnWrite()->buffer = new_buf;
}

// If the buffer variable is begin remapped to an expression, we
Contributor

Suggested change
- // If the buffer variable is begin remapped to an expression, we
+ // If the buffer variable is being remapped to an expression, we

Contributor Author

Thank you, and fixed.


Map<GlobalVar, PrimFunc> output;
for (const auto& [gvar, base_func] : mod->functions) {
if (auto opt = base_func.as<PrimFunc>()) {
Contributor

Not sure it's entirely necessary, but you can reduce nesting with a construction like

if (!base_func.as<PrimFunc>()) { continue; }
auto prim_func = Downcast<PrimFunc>(base_func);
// ...

I'm a fan of reducing nesting when possible, but that's up to you.

Contributor Author

I try to avoid using `continue`, as the lack of nesting makes it harder to track where the flow control changes. If the nesting gets to be too much, I tend to switch to a subroutine with early return. The early return can mimic any flow control that `continue` could have, but the restricted context available in the subroutine keeps it manageable. Looking at this case again, I think it would be better to pull these checks out into a `bool IsInlinablePrimFunc(const PrimFunc& func, PSet<GlobalVar>& recursive_functions)` subroutine, and will update to do so.
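
The pattern described in this reply, a predicate with early returns in place of `continue` in the loop body, might look like the sketch below. The types and checks are hypothetical simplifications: `FuncInfo`, `IsInlinable`, and `CollectInlinable` are not the PR's code, and treating recursive functions as non-inlinable is an assumption suggested only by the `recursive_functions` parameter name.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified function record; a real PrimFunc carries far more.
struct FuncInfo {
  bool is_prim_func;
  bool is_private;
  bool has_buffer_args;
};

// Each early return names one reason a function cannot be inlined.
bool IsInlinable(const std::string& name, const FuncInfo& func,
                 const std::set<std::string>& recursive_functions) {
  if (!func.is_prim_func) return false;
  if (!func.is_private) return false;
  if (func.has_buffer_args) return false;
  if (recursive_functions.count(name) > 0) return false;  // assumed restriction
  return true;
}

// The loop body stays flat: no `continue`, and a single level of nesting.
std::set<std::string> CollectInlinable(const std::map<std::string, FuncInfo>& functions,
                                       const std::set<std::string>& recursive_functions) {
  std::set<std::string> output;
  for (const auto& [name, func] : functions) {
    if (IsInlinable(name, func, recursive_functions)) {
      output.insert(name);
    }
  }
  return output;
}

void Example() {
  std::map<std::string, FuncInfo> functions;
  functions["main"] = FuncInfo{true, false, false};      // public entry point
  functions["helper"] = FuncInfo{true, true, false};     // inlinable
  functions["with_buf"] = FuncInfo{true, true, true};    // takes a buffer
  functions["recursive"] = FuncInfo{true, true, false};  // calls itself

  std::set<std::string> recursive_functions = {"recursive"};
  std::set<std::string> result = CollectInlinable(functions, recursive_functions);
  assert(result.size() == 1);
  assert(result.count("helper") == 1);
}
```

The early returns keep all rejection reasons in one place, while the caller only sees a yes/no answer.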

Stmt VisitStmt_(const EvaluateNode* eval) override {
if (auto call = eval->value.as<CallNode>()) {
if (auto gvar = call->op.as<GlobalVar>()) {
if (auto opt_callee = inlinable_funcs_.Get(gvar.value())) {
Contributor

This is perhaps a place where reducing nesting might improve readability.

Contributor Author

Yeah, the nesting is a bit deep here. Reordered to instead call a GetInlinedSubroutine method, and let me know what you think on it.
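
One way such a refactor can flatten the triple-nested `if (auto ...)` chain is a helper that returns `std::optional` and bails out at the first failed lookup. The sketch below only illustrates that shape: the toy `Call` and `Evaluate` structs, the string-valued bodies, and the body of `GetInlinedSubroutine` are assumptions, not the merged implementation.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy types; the real CallNode, GlobalVar, and PrimFunc are richer.
struct Call {
  std::optional<std::string> global_callee;  // set only when the op is a global
};
struct Evaluate {
  std::optional<Call> call;  // set only when the evaluated value is a call
};

using InlinableMap = std::map<std::string, std::string>;  // callee name -> body

// Returns the body to inline, or std::nullopt at the first failed lookup.
std::optional<std::string> GetInlinedSubroutine(const Evaluate& eval,
                                                const InlinableMap& inlinable) {
  if (!eval.call) return std::nullopt;
  if (!eval.call->global_callee) return std::nullopt;
  auto it = inlinable.find(*eval.call->global_callee);
  if (it == inlinable.end()) return std::nullopt;
  return it->second;
}

// The visitor itself is left with a single branch.
std::string VisitEvaluate(const Evaluate& eval, const InlinableMap& inlinable) {
  if (auto body = GetInlinedSubroutine(eval, inlinable)) {
    return *body;
  }
  return "<unchanged>";
}

void Example() {
  InlinableMap inlinable;
  inlinable["helper"] = "helper body";

  Evaluate calls_helper{Call{std::string("helper")}};
  Evaluate calls_other{Call{std::string("other")}};
  Evaluate not_a_call{std::nullopt};

  assert(VisitEvaluate(calls_helper, inlinable) == "helper body");
  assert(VisitEvaluate(calls_other, inlinable) == "<unchanged>");
  assert(VisitEvaluate(not_a_call, inlinable) == "<unchanged>");
}
```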

PSet<GlobalVar> GetRemovableFunctions() const { return removable_funcs_; }

private:
Stmt VisitStmt_(const EvaluateNode* eval) override {
Contributor

It might be good to mention the details from the PR description as for why cases other than EvaluateNode are not handled.

Contributor Author

Good call, and I've added the details in a comment, along with pointing to the xfail test case.

@Lunderberg
Contributor Author

> I am not especially familiar with the TIR code base, but the logic here seems reasonable. Is there a reason the issue of unique names for blocks is a blocker?

Not a strong blocker, just an unsupported case at the moment. Something that can definitely be extended in the future, but not something required for use late in the lowering pipeline.

Contributor

@slyubomirsky slyubomirsky left a comment

Thanks for addressing my suggestions.

@Lunderberg
Contributor Author

Doing one last CI re-run before merging. I don't expect there to be breaking changes introduced over Christmas, but I try to avoid stale CI results either way.

@Lunderberg
Contributor Author

CI is passing, except for a flaky unit test. I've submitted #16337 to disable the flaky unit test. Re-running the CI to see if I can hit the 2/3 chance of passing the flaky test while I wait on it.

@Lunderberg Lunderberg merged commit 8eec0bf into apache:main Jan 3, 2024
17 checks passed
@Lunderberg Lunderberg deleted the tir_inline_private_functions branch January 3, 2024 14:00
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jan 3, 2024
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Feb 8, 2024