[TIR][Transform] Implement InlinePrivateFunctions #16184
Prior to this commit, a buffer whose parameters (e.g. shape/stride) contained a specialized parameter would not be updated when appearing in a `DeclBuffer` node. This commit updates the `Specialize` function to update buffers that occur in `DeclBuffer` nodes.
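To make the `DeclBuffer` fix concrete, here is a toy model in which a buffer's shape can reference a named parameter. It is not TVM's `Specialize` implementation: `ToyBuffer`, `ToyDeclBuffer`, and `SpecializeDeclBuffer` are hypothetical names, and a shape dimension is simplified to either a constant or a parameter name. The point it illustrates is that specializing a parameter must also rewrite the buffer held by a `DeclBuffer` node. Otherwise that buffer's shape keeps referring to a parameter that no longer exists.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical toy IR: a dimension is either a constant or a named parameter.
using Dim = std::variant<int64_t, std::string>;

struct ToyBuffer {
  std::string name;
  std::vector<Dim> shape;
};

// Hypothetical stand-in for a DeclBuffer statement that carries a buffer.
struct ToyDeclBuffer {
  ToyBuffer buffer;
};

// Maps a parameter name to the constant it is specialized to.
using SpecializeMap = std::map<std::string, int64_t>;

// Replace a parameter dimension with its specialized value, if it has one.
Dim SpecializeDim(const Dim& dim, const SpecializeMap& values) {
  if (const auto* param = std::get_if<std::string>(&dim)) {
    auto it = values.find(*param);
    if (it != values.end()) return it->second;
  }
  return dim;
}

ToyBuffer SpecializeBuffer(const ToyBuffer& buf, const SpecializeMap& values) {
  ToyBuffer out{buf.name, {}};
  for (const Dim& dim : buf.shape) out.shape.push_back(SpecializeDim(dim, values));
  return out;
}

// The buffer declared by a DeclBuffer node must be rewritten as well,
// so that its shape no longer mentions the specialized parameter.
ToyDeclBuffer SpecializeDeclBuffer(const ToyDeclBuffer& node, const SpecializeMap& values) {
  return ToyDeclBuffer{SpecializeBuffer(node.buffer, values)};
}
```

Parameters absent from the map are left untouched, so specializing an unrelated parameter is a no-op for this buffer.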
The functionality to express a call from one `PrimFunc` to another was introduced in apache#14889. While this was initially planned to be supported at codegen for all targets (see apache#15835), this resulted in breakage on some backends (see apache#16033). After discussion, the plan was changed to support TIR inlining, which would enable the same high-level functionality in TIR without requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform, `InlinePrivateFunctions`, which can be used as part of lowering in a follow-up commit.

Because this is initially implemented for use quite late in the lowering flow, many constructs are not currently supported. The current implementation has the following restrictions:

* `tir::Block` nodes may not occur in the inlined function. Because a subroutine may be called multiple times, inlining a subroutine that contains a `tir::Block` would result in non-unique block names. Supporting subroutines with `tir::Block` instances will require de-duplication of block names.
* The subroutine's callsite must occur within a `tir::Evaluate` statement. Because inlining a subroutine inserts its `tir::Stmt` body at the point of use, replacement must occur in a context where a `tir::Stmt` can be returned. Supporting subroutines that are called within an expression (e.g. replacing `func` in `Buf[0] = func(1) + func(2)`) would require hoisting any preprocessing done in the subroutine into the parent `tir::Stmt`.
* The subroutine may only accept primitive arguments, and must have an empty `buffer_map`. Supporting subroutines that are called with `tir::Buffer` or `tir::BufferRegion` arguments would require a way to represent these arguments at the callsite, and substitution of the buffer into the callee.

If these unsupported constructs are used, inlining of those functions is skipped. This commit includes unit tests for these unsupported constructs, to validate that `InlinePrivateFunctions` produces well-formed output even when they are present.
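The restrictions above can be illustrated with a toy model of statement-level inlining. This is a minimal sketch under stated assumptions, not TVM's implementation: `Stmt`, `Func`, `IsInlinable`, and `InlineCalls` are hypothetical names; arguments are plain strings substituted by parameter name; only calls that appear as whole statements are replaced; and the substituted body is not inlined again (so a self-recursive callee cannot loop forever). Callees that contain a block or have a non-empty buffer map are skipped, leaving the original call in place.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR, loosely mirroring the TIR constructs named above.
struct Stmt {
  enum class Kind { kStore, kCall, kSeq, kBlock };
  Kind kind;
  std::string target;             // kStore: buffer written; kCall: callee name
  std::vector<std::string> args;  // kStore: {value}; kCall: call arguments
  std::vector<Stmt> body;         // kSeq: child statements
};

struct Func {
  std::vector<std::string> params;  // primitive (scalar) parameters only
  bool has_buffer_map = false;      // stand-in for a non-empty buffer_map
  Stmt body;
};

bool ContainsBlock(const Stmt& stmt) {
  if (stmt.kind == Stmt::Kind::kBlock) return true;
  for (const Stmt& child : stmt.body) {
    if (ContainsBlock(child)) return true;
  }
  return false;
}

// Mirrors two of the restrictions: no blocks, no buffer arguments.
bool IsInlinable(const Func& func) {
  return !func.has_buffer_map && !ContainsBlock(func.body);
}

// Replace each use of a parameter with the argument bound at the call site.
Stmt Substitute(Stmt stmt, const std::map<std::string, std::string>& bindings) {
  for (std::string& arg : stmt.args) {
    auto it = bindings.find(arg);
    if (it != bindings.end()) arg = it->second;
  }
  for (Stmt& child : stmt.body) child = Substitute(child, bindings);
  return stmt;
}

// Replace statement-level calls to inlinable functions with the callee body.
// Calls that fail the checks are left untouched, so the output stays well-formed.
Stmt InlineCalls(const Stmt& stmt, const std::map<std::string, Func>& funcs) {
  if (stmt.kind == Stmt::Kind::kCall) {
    auto it = funcs.find(stmt.target);
    if (it == funcs.end()) return stmt;
    const Func& callee = it->second;
    if (!IsInlinable(callee) || callee.params.size() != stmt.args.size()) return stmt;
    std::map<std::string, std::string> bindings;
    for (std::size_t i = 0; i < callee.params.size(); ++i) {
      bindings[callee.params[i]] = stmt.args[i];
    }
    return Substitute(callee.body, bindings);
  }
  Stmt out = stmt;
  for (Stmt& child : out.body) child = InlineCalls(child, funcs);
  return out;
}
```

Because the replacement for a call is a whole `Stmt`, this toy only works where a statement can be returned, which is the same reason the transform described above requires the callsite to be a `tir::Evaluate`.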
Updated from `0c8a81d` to `f49b1f8`.
Rebased onto main, as the CI results were a bit stale.
I am not especially familiar with the TIR code base, but the logic here seems reasonable. Is there a reason the issue of unique names for blocks is a blocker?
```diff
-  Array<Buffer> alloc_buffers = op->alloc_buffers.Map(
-      std::bind(&PrimFuncSpecializer::MutateAllocBuffer, this, std::placeholders::_1));
+  Array<Buffer> alloc_buffers =
+      op->alloc_buffers.Map([this](const auto& buf) { return MutateAllocBuffer(buf); });
```
Much cleaner this way :)
Agreed. I always need to pause when encountering `std::placeholders`, and try to replace it when reasonable to do so.
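The two styles in the diff above can be compared side by side in plain standard C++. This sketch uses `std::transform` as a stand-in for TVM's `Array::Map`, and `Renamer` / `Mutate` are hypothetical names standing in for `PrimFuncSpecializer` / `MutateAllocBuffer`. Both versions produce the same result; the lambda simply names its argument instead of relying on `std::placeholders::_1`.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class standing in for PrimFuncSpecializer.
class Renamer {
 public:
  explicit Renamer(std::string suffix) : suffix_(std::move(suffix)) {}

  std::string Mutate(const std::string& name) const { return name + suffix_; }

  // Original style: bind the member function, `this`, and a placeholder.
  std::vector<std::string> MapWithBind(const std::vector<std::string>& names) const {
    std::vector<std::string> out;
    std::transform(names.begin(), names.end(), std::back_inserter(out),
                   std::bind(&Renamer::Mutate, this, std::placeholders::_1));
    return out;
  }

  // Suggested style: a lambda that captures `this` and names its argument.
  std::vector<std::string> MapWithLambda(const std::vector<std::string>& names) const {
    std::vector<std::string> out;
    std::transform(names.begin(), names.end(), std::back_inserter(out),
                   [this](const auto& name) { return Mutate(name); });
    return out;
  }

 private:
  std::string suffix_;
};
```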
`src/tir/ir/specialize.cc`

```cpp
    node.CopyOnWrite()->buffer = new_buf;
  }

  // If the buffer variable is begin remapped to an expression, we
```
```diff
-  // If the buffer variable is begin remapped to an expression, we
+  // If the buffer variable is being remapped to an expression, we
```
Thank you, and fixed.
```cpp
Map<GlobalVar, PrimFunc> output;
for (const auto& [gvar, base_func] : mod->functions) {
  if (auto opt = base_func.as<PrimFunc>()) {
```
Not sure it's entirely necessary, but you can reduce nesting with a construction like:

```cpp
if (!base_func.as<PrimFunc>()) { continue; }
auto prim_func = Downcast<PrimFunc>(base_func);
// ...
```

I'm a fan of reducing nesting when possible, but that's up to you.
I try to avoid using `continue`, as the lack of nesting makes it harder to track when the flow control changes. If the nesting gets to be too much, I tend to switch to a subroutine with an early return. The early return can mimic any flow control that `continue` could have, but the restricted context available in the subroutine keeps it manageable. Looking at this case again, I think it would be better to pull these checks out into a `bool IsInlinablePrimFunc(const PrimFunc& func, PSet<GlobalVar>& recursive_functions)` subroutine, and I will update the PR to do so.
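The early-return approach described above can be sketched with toy types. `ToyFunc`, `IsInlinableToyFunc`, and `CollectInlinable` are hypothetical names, and the boolean fields stand in for the real checks (`base_func.as<PrimFunc>()`, the `buffer_map`, the presence of `tir::Block`). The `recursive_functions` parameter in the proposed signature suggests that recursive callees are excluded; this sketch assumes so. Each disqualifying condition returns early inside the helper, so the loop in the caller keeps a single shallow `if` and needs no `continue`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an entry in the module's function table.
struct ToyFunc {
  bool is_prim_func;    // stand-in for base_func.as<PrimFunc>() succeeding
  bool has_buffer_map;  // stand-in for a non-empty buffer_map
  bool contains_block;  // stand-in for a tir::Block inside the body
};

// Every disqualifying check exits early, replacing a `continue` in the caller.
bool IsInlinableToyFunc(const std::string& name, const ToyFunc& func,
                        const std::set<std::string>& recursive_functions) {
  if (!func.is_prim_func) return false;
  if (recursive_functions.count(name)) return false;
  if (func.has_buffer_map) return false;
  if (func.contains_block) return false;
  return true;
}

// The loop body stays a single, shallow condition.
std::vector<std::string> CollectInlinable(const std::map<std::string, ToyFunc>& module,
                                          const std::set<std::string>& recursive_functions) {
  std::vector<std::string> output;
  for (const auto& [name, func] : module) {
    if (IsInlinableToyFunc(name, func, recursive_functions)) {
      output.push_back(name);
    }
  }
  return output;
}
```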
```cpp
Stmt VisitStmt_(const EvaluateNode* eval) override {
  if (auto call = eval->value.as<CallNode>()) {
    if (auto gvar = call->op.as<GlobalVar>()) {
      if (auto opt_callee = inlinable_funcs_.Get(gvar.value())) {
```
This is perhaps a place where reducing nesting might improve readability.
Yeah, the nesting is a bit deep here. I reordered it to call a `GetInlinedSubroutine` method instead; let me know what you think of it.
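One way such a refactor can look is sketched below with toy types. `ToyCall`, `ToyEvaluate`, `GetInlinedSubroutine`, and `VisitEvaluate` here are hypothetical stand-ins (the callee body is just a string), not the code from this PR. The nested `if` chain from the visitor becomes a sequence of early returns in a helper that returns `std::optional`, and the visitor keeps a single `if`.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy nodes standing in for CallNode and EvaluateNode.
struct ToyCall {
  bool op_is_global_var;  // stand-in for call->op.as<GlobalVar>() succeeding
  std::string callee;
};

struct ToyEvaluate {
  std::optional<ToyCall> call;  // empty when the evaluated value is not a call
};

// Each former nesting level becomes an early return.
std::optional<std::string> GetInlinedSubroutine(
    const ToyEvaluate& eval, const std::map<std::string, std::string>& inlinable_funcs) {
  if (!eval.call) return std::nullopt;
  if (!eval.call->op_is_global_var) return std::nullopt;
  auto it = inlinable_funcs.find(eval.call->callee);
  if (it == inlinable_funcs.end()) return std::nullopt;
  return it->second;
}

// The visitor only has to decide between "replace" and "keep as-is".
std::string VisitEvaluate(const ToyEvaluate& eval,
                          const std::map<std::string, std::string>& inlinable_funcs) {
  if (auto body = GetInlinedSubroutine(eval, inlinable_funcs)) {
    return *body;
  }
  return "evaluate(unchanged)";
}
```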
```cpp
  PSet<GlobalVar> GetRemovableFunctions() const { return removable_funcs_; }

 private:
  Stmt VisitStmt_(const EvaluateNode* eval) override {
```
It might be good to mention the details from the PR description about why cases other than `EvaluateNode` are not handled.
Good call. I've added the details in a comment, along with a pointer to the `xfail` test case.
It's not a strong blocker, just an unsupported case at the moment. It can definitely be extended in the future, but it isn't required for use late in the lowering pipeline.
Thanks for addressing my suggestions.
Doing one last CI re-run before merging. I don't expect any breaking changes to have been introduced over Christmas, but I try to avoid stale CI results either way.
CI is passing, except for a flaky unit test. I've submitted #16337 to disable the flaky unit test. Re-running the CI to see if I can hit the 2/3 chance of passing the flaky test while I wait on it.