-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC][TIR] Adding annotation field to tir.allocate #23
Conversation
* adding the markdown Change-Id: I6980d8f9a228ff5e8a79d74220db9b77f88a3e1b
* adding the PR number Change-Id: I9427b96488f73479cf96424f44fb136f9eeb07e4
Thank you @manupa-arm for the proposal. What you described is closest to the "tag" field of the storage scope, which we put as part of the pointer and associate with most of the storage. I think it would be great to think in terms of this direction. Specifically, when we have a clear distinction of memory types, "sram", "shared_memory", "texture". We will use the special memory tag to indicate the type of such kind of memory. The proposed pinned memory field, however, have multiple of such candidates. In my understanding they can be served as aadditional hints to follow up passes. If the intended use is multiple candidate hints, then the tag is perhaps no longer approperiate because it is used to indicate one kind of special memory that we place the data to. In that case, I think it could be closer to an additional annotation field https://github.com/apache/tvm/blob/main/include/tvm/tir/stmt.h#L817 that stores additional attribute hints about the particular node. This would also help reduce the possible indirection introduced by AttrStmt. |
Thanks @manupa-arm for the nice proposal! Currently the memory scope of a buffer not only indicates where the buffer is allocated, but also affects schedule primitives. For example, “shared” and “local” may lead to different TIRs after lowering. Therefore, it would be good to consolidate scopes that don’t affect scheduling behaviors with additional annotations fields, as @tqchen suggested. Thanks a lot 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the RFC, it's good to discuss this. I'm in agreement with what has been said wrt the memory scope, and using the storage_scope field (including tag) of the buffer's pointertype simplifies your S3 to using only needing this field if I understand your needs correctly. Can I ask what is meant by the below and when you might use this?
|
||
# 4. Reference-level explanation | ||
|
||
At the IR, we ll need to associate each allocate node with one (or more) memories that it can end up, because the scheduling might be satisfied with placing buffers in any of the memories in a given set of memories. Therefore, the scheduler might want the memory planners to decide which memory to use based on finding the allocation that fit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide more context with this statement specifically:
we'll need to associate each allocate node with one (or more) memories that it can end up, because the scheduling might be satisfied with placing buffers in any of the memories in a given set of memories.
It may be relevant to some related work we are doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @csullivan,
Let's say there are multiple physical memories that have same bandwidth and latency and schedulers are just fine placing the buffer in each of them as long as all things fit. In the scenario, the scheduler would want keep more than one candidate for the memory planner to decide.
However, based on the comments, I feel we are going to use two fields for these two purposes. See my latest response below.
To clarify a bit, @tqchen and I are referring to a solution that we are still using "global" as the storage scope, which is served as the property of the storage, but add extra annotations (e.g. "sram") to further indicate the exact place where the buffer is stored. This way, the system is able to schedule correctly |
Hi All, Thanks for taking the time to look at it. P1) Indicate candidate memories (a.k.a. pools) that each allocate be associated with However, having read the comments here, I feel these could be two different fields.
For P1, I think we could use this annotation field to initially describe this. T1) :
In this way, we dont need to carry this as an IRModule attribute to be de-referenced. T2) :
when the key is "candidate_memory_pools" would be the value is Map<String, PoolInfo> candidate_memory_pools. I would personally prefer T1 for its direct approach here but T2 could also be made to work.
Thereafter once the memory planner is run, it could read the annotation field and populate a final 'global.<memory_pool>'. |
Thanks @manupa-arm I would suggest T2 to make things consistent with the rest of IR design and allow future annotations if needed. Note that annotation fields only offers hints to future passes(on how to rewrite) and should not change the semantics of the IR |
Thanks @tqchen , yes we could work with this for now. I take it that for P2, we could use the tag of storage_scope as well, then ? |
Thanks @manupa-arm , P2 falls into the domain of the special memory tag, and the storage scope is exactly designed for this purpose. |
Cool; I ll adjust the RFC text tommorow. |
* Changing the text to show that agreed change of addition of annotations for the allocate node to indicate the candidate memory pools while re-using the tag to indicate the committed pool that the allocate will get pinned. Change-Id: I83171137fcb17477b16891445ca10d0372b6b2e1
Hi all, Sorry for delay! PTAL and if its good, shall we get this in ? |
@tqchen, can you take a look? I can modify the USMP RFC to reflect the decisions made here, if we could finalize the discussion here. |
Thanks @manupa-arm , sorry for the delay. I think the proposal(of adding annotations) looks good. As a formal procedure for IR change. It would be great to have an RFC(or change this RFC) to "Adding annotation field to allocation" and discuss the need of pinned memory as a usecase. As for the annotation field itself, it would be great to follow the documentation of ForNode here https://github.com/apache/tvm/blob/main/include/tvm/tir/stmt.h#L817 that states the annotation only served as a hint for optimizations and does not change the semantics of the allocate. After the annotations field is added. The design choice of pinned memory can then becomes a specific choice of USMP and can be merged with USMP RFC. |
Moved the pinned memory representation to adding generic annotation field, after the community discussion. Change-Id: I122cb0bd9d72bd2350003908d14e187d0902cbff
Done |
Thanks for the really helpful discussion! The design choice looks good to me :-) |
Thanks @junrushao1994 , it would be great if we could finalize the RFC here because it has a cascading effect to the USMP RFC and certain PRs waiting on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @manupa-arm i added some feedback
} | ||
``` | ||
|
||
Therefore, we'd need a way to represent the association of each of these memories, that the user will pin the buffers to, closer to allocate nodes in TIR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the user could pin buffers or USMP could pin buffers. does this address both needs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both cases are 'users' of this addition to the IR in a very similiar way.
However, I dont think users have an API directly to pin tir.allocates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, in this sentence, it means the buffer mentioned in the sample application above.
USMP will not do that. USMP will change program to expect that the application/runtime will pass in a buffer of the correct size pinned to correct memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think that users should have such an API? I kind of do (but it would be an advanced use case and e.g. done via Python script perhaps).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs more thinking around the usecases. Shall we discuss that as a follow up RFC?
|
||
At the IR, we ll need to associate each allocate node with one (or more) memories that it can end up, because the scheduling might be satisfied with placing buffers in any of the memories in a given set of memories. Therefore, the scheduler might want the memory planners to decide which memory to use based on finding the allocation that fit. | ||
|
||
There are broadly two requirements here : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this go in Guide-level explanation with some examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think it should go to the guide-level explanation of this RFC because this is just motivating the addition to the IR.
Instead, I ll add a reference to USMP RFC to see for further usecases of this IR change.
|
||
# 3. Guide-level explanation | ||
|
||
This is not particularly a user-facing feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd say if the user is pinning tir.allocate it could be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we have a user API to do that.
This IR change is to be used by compilation passes to pass on hints to future optimizations.
Therefore, the core compiler could use them to pass on the PoolInfo to tir.allocates in the passes.
I would not consider passes as users, in this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but couldn't a user in principle use such a thing through TVMscript?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be careful in what we expose to the user and I'd rather keep them just exposed to passes until the feature gets stable and mature.
If and when we decide to open this upto the user, we could maybe do the following:
convert Object possibly introducing "fromString" and "toString" methods I suppose.
If and when we decide to add that feature (for e.g. for ForNode s) we could do the same here?
|
||
# 5. Drawbacks | ||
|
||
None. Its consistent with rest of the IR design and allows other features to use to pass hints for future transformation in the compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a field to tir.allocate
is a bigger code lift. can we think of some other drawbacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think adding an annotation field (similiar to ForNode) to pass compilation hints for passes is bigger change compared to other TIR nodes we already have today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, what about the concern about creating multiple ways to do the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this RFC adds a 'new' way compared to existing ways compared to other IRNodes.
|
||
To serve P2, we propose to use the existing tag of the storage_scope with 'global-<memory_name>'. | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add some explanation of this struct? it's just here with no explanation of its purpose. also it looks cut-off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just remove it as tags of storage_scope is an established concept and not relevant to this RFC now.
cc @mbs-octoml can you take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @areusch for taking a look.
I've anwsered your questions and adjusted the text accordingly.
} | ||
``` | ||
|
||
Therefore, we'd need a way to represent the association of each of these memories, that the user will pin the buffers to, closer to allocate nodes in TIR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both cases are 'users' of this addition to the IR in a very similiar way.
However, I dont think users have an API directly to pin tir.allocates.
|
||
At the IR, we ll need to associate each allocate node with one (or more) memories that it can end up, because the scheduling might be satisfied with placing buffers in any of the memories in a given set of memories. Therefore, the scheduler might want the memory planners to decide which memory to use based on finding the allocation that fit. | ||
|
||
There are broadly two requirements here : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think it should go to the guide-level explanation of this RFC because this is just motivating the addition to the IR.
Instead, I ll add a reference to USMP RFC to see for further usecases of this IR change.
|
||
To serve P2, we propose to use the existing tag of the storage_scope with 'global-<memory_name>'. | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just remove it as tags of storage_scope is an established concept and not relevant to this RFC now.
|
||
# 3. Guide-level explanation | ||
|
||
This is not particularly a user-facing feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we have a user API to do that.
This IR change is to be used by compilation passes to pass on hints to future optimizations.
Therefore, the core compiler could use them to pass on the PoolInfo to tir.allocates in the passes.
I would not consider passes as users, in this scenario.
|
||
# 5. Drawbacks | ||
|
||
None. Its consistent with rest of the IR design and allows other features to use to pass hints for future transformation in the compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think adding an annotation field (similiar to ForNode) to pass compilation hints for passes is bigger change compared to other TIR nodes we already have today.
* addressing Andrew's comments Change-Id: Ic9a4c6c0cff43057e5f2e7f9654355fe7c91ed24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're approaching the same problem from opposite ends. For device planning we wanted to avoid the Map<Expr, Device> convention for recording the association of a device to every Relay expression since it does not survive transformation.
We considered:
- adding a DictAttrs to the base Expr node, and auditing the entire codebase for preserving attributes on copies.
- adding a new WithAttrs Relay node to capture annotations which are implicitly scoped over the sub-expression.
- using 'fake' "on_device" calls to record device associations for a sub-expression.
- using attributes on the existing Function definitions, which are already copied correctly.
I've pushed on a combination of the last two to see where it takes us.
There is a general desire to extend device planning to be memory scope aware, which is where the above and this RFC make contact. In particular, would you ever want the set of candidate scopes to have been provided by some extended device planning pass in Relay? Or are you imagining that the candidate scopes are attached directly during lowering and are separate from the current device planning machinery?
If we want these worlds to work together it's worth adding DictAttrs further up (maybe even Object???) and make sure the codebase preserves attributes by default in all transforms.
There is generally tradeoffs between:
From the A1's pov, allowing set of attributes that changes the semantics of the IR is in general not desirable. Since other pass writers may not be aware of all the possible set of attributes and thus being mindful in preserving them. The middle ground that we could reach here is to allow annotations, which only served as hints for future passes(that the code can be transformed in certain way). This means that the other pass writers can safely ignore the content of annotation itself. Note that this would certainly restrict the possible set of annotations that we can use(e.g. the object themselves should not refer to an active piece in the IR). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tqchen however doesn't keeping stuff just as an AttrStmtNode serve the same purpose? Why should we add this additional map-to-arbitrary-thing to the IR? It seems like a shortcut. And why should AttrStmtNode limit its values to PrimExpr but annotations not do so? finally, are there concerns about roundtripping any such metadata through tvmscript? cc @junrushao1994
I kind of agree we should consider whether DictAttrs and AttrStmt could work for this.
|
||
# 3. Guide-level explanation | ||
|
||
This is not particularly a user-facing feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but couldn't a user in principle use such a thing through TVMscript?
|
||
# 5. Drawbacks | ||
|
||
None. Its consistent with rest of the IR design and allows other features to use to pass hints for future transformation in the compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, what about the concern about creating multiple ways to do the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not sure whether we want to roundtrip annotations that are only put via compilation passes. I'd rather have them restricted to be inserted by a controlled compilation pipeline, at least initially.
We have both ways to attach attributes/annotations to an IRNode in the codebase already (AttrStmt is restricted to be PrimExpr for now -- therefore users could attach simple attributes/pragmas -- I think, however it would not work for our current need). It is an interesting discussion to have IF (not sure we really want to) we want to stick to one, but I think that work should be done holistically and once for all IRNodes (e.g. ForNode). My reasons currently are that annotations are bit private to the compiler and could only be inserted by passes.
I feel annotations are simpler and straightforward while AttrStmtNode needs to handle in separate visits prior to visiting the actual StmtNode while it could have been simpler to attach them directly to the StmtNode of interest. I guess one point could be the AttrStmtNode survives by default that mutates the StmtNode, however that is a roundabout way where the mutations should've copied the Attrs/Annotations by default.
I would prefer the annotations over AttrStmt due to the somewhat complex handling of it in all the passes that care about it.
|
||
# 5. Drawbacks | ||
|
||
None. Its consistent with rest of the IR design and allows other features to use to pass hints for future transformation in the compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this RFC adds a 'new' way compared to existing ways compared to other IRNodes.
|
||
# 3. Guide-level explanation | ||
|
||
This is not particularly a user-facing feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be careful in what we expose to the user and I'd rather keep them just exposed to passes until the feature gets stable and mature.
If and when we decide to open this upto the user, we could maybe do the following:
convert Object possibly introducing "fromString" and "toString" methods I suppose.
If and when we decide to add that feature (for e.g. for ForNode s) we could do the same here?
@areusch yes about AttrStmt. I had to admit that AttrStmt has now becomes a legacy issue that I hope we can gradually move away from(so those that changes semantics can be directly reflected in the IR and properly documented). While we cannot immediately remove them, we can gradually move the features out and eventually deprecate |
@tqchen merge? |
I added one last minor nits. @mbs-octoml @csullivan @areusch please take another look and let us merge in 24 h |
Making the brief for annotation generic for tir.allocate
@tqchen thanks for spotting that! did the change. |
* type in the brief in the example
No objections from me. |
@tqchen shall we get this in then? |
Thanks @manupa-arm @mbs-octoml @csullivan @areusch this RFC is now merged! |
Change-Id: I6980d8f9a228ff5e8a79d74220db9b77f88a3e1b