[TIR][Transform] Implement InlinePrivateFunctions #16184

Lunderberg · 2023-11-29T19:54:41Z

The functionality to express a call from one PrimFunc to another was introduced in #14889. While this was initially planned to be supported at codegen for all targets (see #15835), this resulted in breakage on some backends (see #16033). After discussion, the plan was changed to support TIR inlining, which would enable the same high-level functionality in TIR without requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform InlinePrivateFunctions, which can be used as part of lowering in a follow-up commit.

Because this is initially implemented for use quite late in the lowering flow, many constructs are not currently supported. The current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function. Because a subroutine may be called multiple times, inlining of a subroutine that contains `tir::Block` would result in non-unique names. Support of subroutines with `tir::Block` instances will require de-duplication of block names.
* The subroutine's callsite must occur within a `tir::Evaluate` node. Because inlining a subroutine inserts the `tir::Stmt` body at the point of use, replacement must occur in a context where a `tir::Stmt` can be returned. Support of subroutines that are called within an expression (e.g. replacing `func` in `Buf[0] = func(1) + func(2)`) would require hoisting preprocessing done in the subroutine to the parent `tir::Stmt`.
* The subroutine may only accept primitive arguments, and must have an empty `buffer_map`. Support of subroutines that are called with `tir::Buffer` or `tir::BufferRegion` arguments would require a way to represent these arguments at the callsite, and substitution of the buffer into the callee.

If these unsupported constructs are used, then the inlining of those functions is skipped. This commit includes unit tests for these unsupported constructs, to validate that InlinePrivateFunctions produces well-formed output even when they are present.
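The skip-on-unsupported behaviour described above can be pictured with a toy model. This is only a sketch under stated assumptions, not the TVM implementation: `ToyStmt`, `ToyFunc`, `MeetsRestrictions`, and `InlineCalls` are hypothetical names, statements are reduced to an opcode plus string arguments, and a statement-level call stands in for a call wrapped in `tir::Evaluate`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; none of these are TVM types.
struct ToyStmt {
  bool is_call;                   // true: statement-level call to another function
  std::string name;               // callee name if is_call, otherwise an opcode
  std::vector<std::string> args;  // argument values or parameter names
};

struct ToyFunc {
  std::vector<std::string> params;  // primitive parameters only
  std::vector<ToyStmt> body;
  bool contains_block = false;      // models a body that contains a block
  bool has_buffer_map = false;      // models a non-empty buffer_map
};

// A callee is inlined only if it avoids the unsupported constructs.
bool MeetsRestrictions(const ToyFunc& func) {
  return !func.contains_block && !func.has_buffer_map;
}

// Replace each statement-level call to a supported callee with the callee's
// body, substituting call arguments for parameters. Calls to unknown or
// unsupported callees are kept unchanged rather than reported as errors.
// Calls inside an inlined body are not expanded again in this sketch.
std::vector<ToyStmt> InlineCalls(const std::vector<ToyStmt>& caller,
                                 const std::map<std::string, ToyFunc>& callees) {
  std::vector<ToyStmt> output;
  for (const ToyStmt& stmt : caller) {
    auto it = stmt.is_call ? callees.find(stmt.name) : callees.end();
    bool can_inline = it != callees.end() && MeetsRestrictions(it->second);
    if (!can_inline) {
      output.push_back(stmt);
    } else {
      const ToyFunc& callee = it->second;
      // In this toy, an arity mismatch is treated as a programming error.
      assert(callee.params.size() == stmt.args.size());
      for (ToyStmt inlined : callee.body) {
        for (std::string& arg : inlined.args) {
          for (std::size_t i = 0; i < callee.params.size(); ++i) {
            if (arg == callee.params[i]) {
              arg = stmt.args[i];
              break;  // substitute once, never re-substitute the new value
            }
          }
        }
        output.push_back(inlined);
      }
    }
  }
  return output;
}
```

In this model, a caller that invokes one supported and one unsupported callee ends up with the first call replaced by the callee's body and the second call left in place, so the result stays well-formed either way.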

Prior to this commit, a buffer whose parameters (e.g. shape/stride) contained a specialized parameter would not be updated when appearing in a `DeclBuffer` node. This commit updates the `Specialize` function to update buffers that occur in `DeclBuffer` nodes.

Lunderberg · 2023-12-19T18:46:41Z

Rebased onto main as the CI results were a bit stale.

slyubomirsky

I am not especially familiar with the TIR code base, but the logic here seems reasonable. Is there a reason the issue of unique names for blocks is a blocker?

slyubomirsky · 2023-12-19T20:21:39Z

src/tir/ir/specialize.cc

-    Array<Buffer> alloc_buffers = op->alloc_buffers.Map(
-        std::bind(&PrimFuncSpecializer::MutateAllocBuffer, this, std::placeholders::_1));
+    Array<Buffer> alloc_buffers =
+        op->alloc_buffers.Map([this](const auto& buf) { return MutateAllocBuffer(buf); });


Much cleaner this way :)

Agreed. I always need to pause when encountering std::placeholders, and try to replace it when reasonable to do so.
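For anyone comparing the two spellings outside of TVM, here is a minimal standalone sketch. `Scaler`, `MapWithBind`, `MapWithLambda`, and `CheckBindMatchesLambda` are hypothetical names chosen for illustration, and `std::transform` over a `std::vector` stands in for `Array::Map`. The diff uses `const auto&`, which needs C++14; the sketch spells out the element type so it also compiles as C++11.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical class with a per-element member function, standing in for a
// mutator that maps a member function over a container.
class Scaler {
 public:
  explicit Scaler(int factor) : factor_(factor) {}

  // Old style: bind the member function to `this` with a placeholder.
  std::vector<int> MapWithBind(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(),
                   std::bind(&Scaler::Scale, this, std::placeholders::_1));
    return output;
  }

  // New style: a lambda that captures `this` and forwards the element.
  std::vector<int> MapWithLambda(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(),
                   [this](const int& value) { return Scale(value); });
    return output;
  }

 private:
  int Scale(int value) const { return value * factor_; }
  int factor_;
};

// Usage check: both spellings produce the same result.
void CheckBindMatchesLambda() {
  Scaler scaler(3);
  std::vector<int> input{1, 2, 4};
  assert(scaler.MapWithBind(input) == scaler.MapWithLambda(input));
}
```

The lambda version names its argument at the point of use, which is the readability gain noted in the review.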

slyubomirsky · 2023-12-19T20:22:06Z

src/tir/ir/specialize.cc

+      node.CopyOnWrite()->buffer = new_buf;
+    }
+
+    // If the buffer variable is begin remapped to an expression, we


Suggested change

-    // If the buffer variable is begin remapped to an expression, we
+    // If the buffer variable is being remapped to an expression, we

Thank you, and fixed.

slyubomirsky · 2023-12-19T20:33:31Z

src/tir/transforms/inline_private_functions.cc

+
+  Map<GlobalVar, PrimFunc> output;
+  for (const auto& [gvar, base_func] : mod->functions) {
+    if (auto opt = base_func.as<PrimFunc>()) {


Not sure it's entirely necessary, but you can reduce nesting with a construction like

    if (!base_func.as<PrimFunc>()) {
      continue;
    }
    auto prim_func = Downcast<PrimFunc>(base_func);
    // ...

I'm a fan of reducing nesting when possible, but that's up to you.

I try to avoid using `continue`, as the lack of nesting makes it harder to track when the flow control changes. If the nesting gets to be too much, I tend to switch to a subroutine with early return. The early return can mimic any flow control that `continue` could have, but the restricted context available in the subroutine keeps it manageable. Looking at this case again, I think it would be better to pull these out into a `bool IsInlinablePrimFunc(const PrimFunc& func, PSet<GlobalVar>& recursive_functions)` subroutine, and will update to do so.
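A small sketch of the pattern described in this reply, using toy types rather than TVM's. `ToyFunction`, `IsInlinableToy`, `CollectInlinable`, and `CheckCollectInlinable` are hypothetical, and the three rejection reasons are illustrative assumptions rather than the exact checks in the PR. The loop body stays flat because every rejection is an early return inside the helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of a module-level function; not a TVM type.
struct ToyFunction {
  std::string name;
  bool is_prim_func;
  bool is_recursive;
  bool has_buffer_args;
};

// Each early return names one reason the function is rejected. The helper
// only sees `func`, so the flow control cannot affect the caller's loop state.
bool IsInlinableToy(const ToyFunction& func) {
  if (!func.is_prim_func) return false;
  if (func.is_recursive) return false;
  if (func.has_buffer_args) return false;
  return true;
}

// The loop needs neither `continue` nor nested conditionals.
std::vector<std::string> CollectInlinable(const std::vector<ToyFunction>& functions) {
  std::vector<std::string> names;
  for (const ToyFunction& func : functions) {
    if (IsInlinableToy(func)) {
      names.push_back(func.name);
    }
  }
  return names;
}

// Usage check: only the non-recursive PrimFunc-like entry is collected.
void CheckCollectInlinable() {
  std::vector<ToyFunction> functions = {
      {"helper", true, false, false},
      {"extern_op", false, false, false},
      {"fib", true, true, false},
  };
  std::vector<std::string> names = CollectInlinable(functions);
  assert(names.size() == 1 && names[0] == "helper");
}
```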

slyubomirsky · 2023-12-19T20:35:12Z

src/tir/transforms/inline_private_functions.cc

+  Stmt VisitStmt_(const EvaluateNode* eval) override {
+    if (auto call = eval->value.as<CallNode>()) {
+      if (auto gvar = call->op.as<GlobalVar>()) {
+        if (auto opt_callee = inlinable_funcs_.Get(gvar.value())) {


This is perhaps a place where reducing nesting might improve readability.

Yeah, the nesting is a bit deep here. Reordered to instead call a GetInlinedSubroutine method, and let me know what you think on it.
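The reordering can be sketched with toy types as follows. This is an illustration only and not the code in the PR: `ToyEvaluate`, `InlinableBodies`, `FindInlinedBody`, `VisitEvaluate`, and `CheckVisitEvaluate` are hypothetical, statements are plain strings, and a nullable pointer stands in for an optional return value. The three nested conditions become three early returns in the lookup helper, leaving a single `if` in the visitor.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an evaluate statement; not a TVM type.
struct ToyEvaluate {
  bool is_call;           // is the evaluated value a call?
  bool callee_is_global;  // does the call target a module-level function?
  std::string callee;     // callee name, meaningful only for calls
  std::string original;   // text of the statement if left unchanged
};

// Maps an inlinable callee name to the text of its body.
using InlinableBodies = std::map<std::string, std::string>;

// Returns the inlined body, or nullptr if this statement is not an
// inlinable call. Each condition that used to add a nesting level is now
// one early return.
const std::string* FindInlinedBody(const ToyEvaluate& eval,
                                   const InlinableBodies& inlinable) {
  if (!eval.is_call) return nullptr;
  if (!eval.callee_is_global) return nullptr;
  auto it = inlinable.find(eval.callee);
  if (it == inlinable.end()) return nullptr;
  return &it->second;
}

// The visitor keeps a single level of nesting.
std::string VisitEvaluate(const ToyEvaluate& eval, const InlinableBodies& inlinable) {
  if (const std::string* body = FindInlinedBody(eval, inlinable)) {
    return *body;
  }
  return eval.original;
}

// Usage check: an inlinable call is replaced, anything else is kept.
void CheckVisitEvaluate() {
  InlinableBodies inlinable;
  inlinable["f"] = "body_of_f";
  ToyEvaluate call_f = {true, true, "f", "f()"};
  ToyEvaluate call_g = {true, true, "g", "g()"};
  assert(VisitEvaluate(call_f, inlinable) == "body_of_f");
  assert(VisitEvaluate(call_g, inlinable) == "g()");
}
```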

slyubomirsky · 2023-12-19T20:46:52Z

src/tir/transforms/inline_private_functions.cc

+  PSet<GlobalVar> GetRemovableFunctions() const { return removable_funcs_; }
+
+ private:
+  Stmt VisitStmt_(const EvaluateNode* eval) override {


It might be good to mention the details from the PR description explaining why cases other than EvaluateNode are not handled.

Good call, and I've added the details in a comment, along with pointing to the xfail test case.

Lunderberg · 2023-12-20T02:10:18Z

I am not especially familiar with the TIR code base, but the logic here seems reasonable. Is there a reason the issue of unique names for blocks is a blocker?

Not a strong blocker, just an unsupported case at the moment. Something that can definitely be extended in the future, but not something required for use late in the lowering pipeline.

slyubomirsky

Thanks for addressing my suggestions.

Lunderberg · 2023-12-29T16:11:33Z

Doing one last CI re-run before merging. I don't expect there to be breaking changes introduced over Christmas, but I try to avoid stale CI results either way.

Lunderberg · 2024-01-02T20:25:55Z

CI is passing, except for a flaky unit test. I've submitted #16337 to disable the flaky unit test. Re-running the CI to see if I can hit the 2/3 chance of passing the flaky test while I wait on it.

Lunderberg mentioned this pull request Nov 29, 2023

[Codegen][Metal] Disable cross-function call in Metal codegen #16033

Merged

Lunderberg added 4 commits December 19, 2023 12:44

[TIR] Handle specialization that remaps a buffer var

eb2a853

[TIR] Handle specialization of buffer variable to PrimExpr

b484142

Lunderberg force-pushed the tir_inline_private_functions branch from 0c8a81d to f49b1f8 Compare December 19, 2023 18:45

slyubomirsky reviewed Dec 19, 2023

Updates based on review comments

4d74f52

slyubomirsky approved these changes Dec 20, 2023

ci bump

4ca4c11

CI bump

efe7bbd

Lunderberg merged commit 8eec0bf into apache:main Jan 3, 2024
17 checks passed

Lunderberg deleted the tir_inline_private_functions branch January 3, 2024 14:00

Lunderberg mentioned this pull request Jan 3, 2024

[Driver] Single-module lowering flow in driver_api.cc #14985

Open

Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jan 3, 2024

Update to compile correctly with changes made in apache#16184

8c0a92d

Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Feb 8, 2024

Update to compile correctly with changes made in apache#16184

5a9b6ab

ysh329 mentioned this pull request Apr 21, 2024

[Release] v0.16.0 Release Candidate Notes #16911

Closed
