-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements #142944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements #142944
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-mlir-spirv @llvm/pr-subscribers-mlir Author: Yang Bai (yangtetris) ChangesDescriptionThis change introduces a new canonicalization pattern for the MLIR Vector dialect that optimizes chains of constant insertions into vectors initialized with Please be aware that the new pattern doesn't work for poison vectors where only some elements are set, as MLIR doesn't support partial poison vectors for now. New Pattern: InsertConstantToPoison
Refactored Helper Function
Example
It also works for multidimensional vectors.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments. Otherwise, it LGTM!
return failure(); | ||
|
||
auto newAttr = DenseElementsAttr::get(destTy, initValues); | ||
rewriter.replaceOpWithNewOp<arith::ConstantOp>(op, destTy, newAttr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use vector.from_elements here and let it canonicalize to arith.constant if all inputs are constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea! That would indeed remove the constant-only restriction. Let me try implementing it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just learned that currently vector.from_elements does not have a folder to fold it into an arith.constant op. Do you happen to know if there is any ongoing PR implementing this? If not, I would be interested in creating one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to create one, we should have one like that if it doesnt exist already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I think that's is a great idea and something that we would need anyways. Let's implement that first!
// Calculate the linearized position for inserting elements and extract values | ||
// from the source attribute. Returns the starting position in the destination | ||
// vector where elements should be inserted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calculat the linearized position based on what? I cannot understand what this is trying to say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion. I'll updated the function’s documentation to clarify this. Since we adopted a new implementation using from_elements, I also refactored this function. It now focuses solely on calculating the linearized position of the continuous chunk of elements to insert within the destination vector.
auto convertIntegerAttr = [](Attribute attr, Type expectedType) -> Attribute { | ||
if (auto intAttr = mlir::dyn_cast<IntegerAttr>(attr)) { | ||
if (intAttr.getType() != expectedType) | ||
return IntegerAttr::get(expectedType, intAttr.getInt()); | ||
} | ||
return attr; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this code originates from #88314 which addressed issues related to mismatches between index types and i64. I think this background is enough to reassures us about the safety. But, it also reminded me that we might need a similar attribute conversion in the from_elements to constant PR. I'll verify later whether that pattern encounters similar issues.
/// This pattern identifies chains of vector.insert operations that: | ||
/// 1. Start from an ub.poison operation. | ||
/// 2. Insert only constant values at static positions. | ||
/// 3. Completely initialize all elements in the resulting vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all elements are initialized, why does it matter if the operation started from a ub.poison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the vector didn't start from ub.poison, the existing foldDenseElementsAttrDestInsertOp in InsertOp::fold
should already be able to handle the folding. However, it does not guarantee all elements are initialized. That’s why I want to add this new pattern explicitly targets ub.poison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this feels like an awkward special case. Why can't the existing folder handle this case instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I chose not to use the folder is that this method requires traversing the entire chain of insert operations, which results in linear time complexity for each traversal. If someone uses createOrFold to create a sequence of insert operations, the access pattern would look like this:
insert op 1 -> ub.poison
insert op 2 -> insert op 1 -> ub.poison
insert op 3 -> insert op 2 -> insert op 1 -> ub.poison
...
insert op n -> insert op n - 1 -> ... -> inset op 1 -> ub.poison
The condition for being fully initialized is only satisfied at the very last line, but the overall complexity becomes O(n²) due to repeated traversals.
However, I also admit that this is an awkward case. I’m completely okay if you think it’s not worth introducing a new canonicalization pattern. Perhaps it would be much easier to address this once partial poisoned vectors can be easily represented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't matter at all if the start is a ub.poison. If you think that the pattern isn't a canonicalization if the start is something else, then it shouldn't be a canonicalization at all. Personally, this seems like a cleanup and we could live with this in cleanup before lowering to backends.
But the initial start shouldn't matter at all whatever the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't matter at all if the start is a ub.poison.
After a second thought, I think it indeed makes sense. We don’t need to worry about whether a canonicalization pattern may overlap with a folder in some cases. I'll remove the check for ub.poison
.
if (previousInsertOp.hasDynamicPosition()) | ||
return failure(); | ||
|
||
// The inserted content must be constant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use from_elements this is not required.
if (!srcAttrs.back()) | ||
return failure(); | ||
|
||
// An insertion at poison index makes the entire chain poisoned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen in this case? Can you add a comment if this will get folded by the folder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The insert operation’s folder will fold this case into a ub.poison, effectively truncating the insert chain’s backward traversal. If the vector is not fully initialized by that point, this pattern will definitively fail. I will add a comment to clarify this behavior in the code.
Update: I also moved this check to another place to give this pattern a better chance of succeeding.
// Currently, MLIR doesn't support partial poison vectors, so we can only | ||
// optimize when the entire vector is completely initialized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly, you can always do a shufflevector instruction to get these things in the right order, but we can ignore this for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this is about representing a constant vector with poison and non-poison values. AFAIK, we can't represent that in MLIR right now
SmallVector<bool> initialized(vectorSize, false); | ||
SmallVector<Attribute> initValues(vectorSize); | ||
|
||
for (auto [insertOp, srcAttr] : llvm::zip(chainInsertOps, srcAttrs)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use zip_equal if they are equal
/// This pattern identifies chains of vector.insert operations that: | ||
/// 1. Start from an ub.poison operation. | ||
/// 2. Insert only constant values at static positions. | ||
/// 3. Completely initialize all elements in the resulting vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this feels like an awkward special case. Why can't the existing folder handle this case instead?
Almost all of the changes have been completed. However, while updating the test base, I found that VectorFromElementsLowering currently does not support vectors with rank > 1. So, replacing insert chains with |
What is the current state of this? Any blockers? |
It is still blocked due to the missing from_elements to llvm conversion for multi-dim vectors. I think we can
|
It sounds like this shouldn't be too complicated. Is that something you could help with? |
I'm not familiar with that part, but I think I can give it a try. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. I am wondering whether this is robust enough to handle overlapping vector.insert
operations?
Also, I am not sure whether this would be beneficial? Also, are there any tests for this case?
// Before:
%poison = ub.poison : vector<2x3xi64>
%v1 = vector.insert %cv0, %poison[0] : vector<3xi64> into vector<2x3xi64>
%result = vector.insert %cv1, %v1[1] : vector<3xi64> into vector<2x3xi64>
// After:
%v1 = vector.extract %cv0[0] : i64 from vector<3xi64>
%v2 = vector.extract %cv0[1] : i64 from vector<3xi64>
%v3 = vector.extract %cv0[2] : i64 from vector<3xi64>
%v4 = vector.extract %cv1[0] : i64 from vector<3xi64>
%v5 = vector.extract %cv1[1] : i64 from vector<3xi64>
%v6 = vector.extract %cv1[2] : i64 from vector<3xi64>
%result = vector.from_elements %v1, %v2, %v3, %v4, %v5, %v6 : vector<2x3xi64>
// Check if the result is used as the dest operand of another vector.insert | ||
// Only care about the last op in a chain of insertions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document why we need the check as opposed to what is being checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Document updated.
if (currentOp && !currentOp->hasOneUse()) | ||
return failure(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this condition guarantee that conditions checked in lines L3293-L3296 are always met?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are two different checks.
The check on lines L3293-L3296 is to ensure that there are no more insert ops after the current op. The purpose of that check is to skip the O(n) time complexity part for intermediate insert ops.
In contrast, this hasOneUse
check is to handle the following case:
%v1 = %vector.insert %c1, %v0[0] : i64 into vector<3xi64>
%v2 = %vector.insert %c2, %v1[1] : i64 into vector<3xi64>
%v3_3 = %vector.insert %c3, %v2[2] : i64 into vector<3xi64>
%v3_4 = %vector.insert %c4, %v2[2] : i64 into vector<3xi64>
%v3_5 = %vector.insert %c5, %v2[2] : i64 into vector<3xi64>
The key point is that when %v1
or %v2
has multiple users, we should not introduce new from_elements ops because the insert chain cannot be completely eliminated. This IR is also an example of the 'explosion' you asked about later. We want to avoid generating three new form_elements ops for %v3_3
, %v3_4
and %v3_5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added a negative test for such case.
// Check that intermediate inserts have only one use to avoid an explosion | ||
// of vectors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow what you mean by "explosion of vectors". With multiple users there simply wouldn't be a "chain" of vector.insert
s. Could you provide an example of where such an "explosion" could happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to the previous question for an example.
// The insert op folder will fold an insert at poison index into a | ||
// ub.poison, which truncates the insert chain's backward traversal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the reason to "fail" simply that there's nothing that this transformation can do with an index that's poison
? I'm not sure whether other transformations are relevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fold function is relevant. However, the reason you gave is also true and easier to understand. Will update the document, thanks.
if (initialized[index]) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are multiple vector.insert
inserting at the same "index", how do you make sure to select the right one (i.e. the one inserting "last")? Are there any tests for this scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern uses two reverse traversals to ensure the last insert op takes effect.
It follows the chain backwards to collect insert ops into a list, then a llvm::reverse
(L3364) is applied to the list. So the later insert ops in the chain will override the earlier ones. The continue
statement you quoted only affects the initialization count and won't skip setting values.
It is better to have a test. I just added fully_insert_to_vector_overlap
to cover it.
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
✅ With the latest revision this PR passed the C/C++ code formatter. |
Please refer to another thread on this page.
Sorry I didn't update the PR description in time. With @dcaballe 's advice, we now use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
Overall looks good and is very clearly crafted! I've left some final suggestions inline.
// This pattern has linear time complexity with respect to the length of the | ||
// insert chain. So we only care about the last insert op which has the | ||
// highest probability of success. | ||
for (Operation *user : op.getResult().getUsers()) | ||
if (auto insertOp = dyn_cast<InsertOp>(user)) | ||
if (insertOp.getDest() == op.getResult()) | ||
return failure(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I found this block a bit confusing - below is my current understanding for confirmation.
To identify a valid chain of vector.insert
operations, the pattern first needs to locate the trailing vector.insert
op. Consider the following two example chains:
- insert -> insert -> insert -> insert
- insert -> insert -> insert -> insert
Only Option 1 qualifies under the current design. That makes sense to me - it’s a conscious design choice that simplifies the pattern logic. Supporting Option 2 would be possible, but would likely add significant complexity.
This code block ensures that the given op is the trailing vector.insert
in a chain that matches this pattern.
If that's accurate, I’d suggest updating the in-code comment to something like:
Ensure this is the trailing vector.insert op in a chain of inserts.
I'd also recommend adding a note about this constraint in the high-level comment for the pattern.
As for this comment ...
// This pattern has linear time complexity with respect to the length of the
// insert chain. So we only care about the last insert op which has the
// highest probability of success.
IMHO, you want to avoid matching overly complex or fragmented insert chains, and focusing on the last op is a clean and efficient approach. That's the design and that's fine. To me, everything else is secondary and can be skipped.
I mostly want to avoid our "future selves" getting into a discussion on "probability" and "cost" 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your understanding. This is exactly what this code is trying to express! The comment you proposed looks much more reasonable, let me update it.
/// 1. Only insert values at static positions. | ||
/// 2. Completely initialize all elements in the resulting vector. | ||
/// 3. All intermediate insert operations have only one use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] It would be helpful if you made references to these high level design points within the implementation (e.g. "Check Cond 1. (only static indices are used)").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
|
||
// ----- | ||
|
||
// Test the case where multiple ops insert to overlapped indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not immediately obvious that ARG0
is completely skipped - could you add a comment?
I would also add more examples of overlapping vector.insert
Ops. Perhaps:
insert scalar
insert scalar
insert scalar
insert vector
where "insert vector" overwrites all of "insert scalar".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Let me add a detailed comment and a new test.
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Description
This change introduces a new canonicalization pattern for the MLIR Vector dialect that optimizes chains of insertions. The optimization identifies when a vector is completely initialized through a series of vector.insert operations and replaces the entire chain with a single
vector.from_elements
operation.Please be aware that the new pattern doesn't work for poison vectors where only some elements are set, as MLIR doesn't support partial poison vectors for now.
New Pattern: InsertChainFullyInitialized
Refactored Helper Function
calculateInsertPosition
fromfoldDenseElementsAttrDestInsertOp
to avoid code duplication.Example
It also works for multidimensional vectors.