Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC][TIR] Adding annotation field to tir.allocate #23

Merged
merged 7 commits into from
Oct 1, 2021

Conversation

manupak
Copy link
Contributor

@manupak manupak commented Aug 17, 2021

  • adding the markdown

Change-Id: I6980d8f9a228ff5e8a79d74220db9b77f88a3e1b

* adding the markdown

Change-Id: I6980d8f9a228ff5e8a79d74220db9b77f88a3e1b
* adding the PR number

Change-Id: I9427b96488f73479cf96424f44fb136f9eeb07e4
@manupak
Copy link
Contributor Author

manupak commented Aug 17, 2021

@tqchen tqchen added the status: need review RFC needs review label Aug 19, 2021
@tqchen
Copy link
Member

tqchen commented Aug 21, 2021

Thank you @manupa-arm for the proposal.

What you described is closest to the "tag" field of the storage scope, which we put as part of the pointer and associate with most of the storage. I think it would be great to think in terms of this direction.

Specifically, when we have a clear distinction of memory types, "sram", "shared_memory", "texture". We will use the special memory tag to indicate the type of such kind of memory.

The proposed pinned memory field, however, have multiple of such candidates. In my understanding they can be served as aadditional hints to follow up passes. If the intended use is multiple candidate hints, then the tag is perhaps no longer approperiate because it is used to indicate one kind of special memory that we place the data to.

In that case, I think it could be closer to an additional annotation field https://github.com/apache/tvm/blob/main/include/tvm/tir/stmt.h#L817 that stores additional attribute hints about the particular node. This would also help reduce the possible indirection introduced by AttrStmt.

@tqchen
Copy link
Member

tqchen commented Aug 21, 2021

cc @masahi @kparzysz-quic @csullivan

@junrushao
Copy link
Member

Thanks @manupa-arm for the nice proposal!

Currently the memory scope of a buffer not only indicates where the buffer is allocated, but also affects schedule primitives. For example, “shared” and “local” may lead to different TIRs after lowering. Therefore, it would be good to consolidate scopes that don’t affect scheduling behaviors with additional annotations fields, as @tqchen suggested. Thanks a lot 🙏

Copy link
Contributor

@csullivan csullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the RFC, it's good to discuss this. I'm in agreement with what has been said wrt the memory scope, and using the storage_scope field (including tag) of the buffer's pointertype simplifies your S3 to using only needing this field if I understand your needs correctly. Can I ask what is meant by the below and when you might use this?


# 4. Reference-level explanation

At the IR, we ll need to associate each allocate node with one (or more) memories that it can end up, because the scheduling might be satisfied with placing buffers in any of the memories in a given set of memories. Therefore, the scheduler might want the memory planners to decide which memory to use based on finding the allocation that fit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide more context with this statement specifically:

we'll need to associate each allocate node with one (or more) memories that it can end up, because the scheduling might be satisfied with placing buffers in any of the memories in a given set of memories.

It may be relevant to some related work we are doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @csullivan,

Let's say there are multiple physical memories that have same bandwidth and latency and schedulers are just fine placing the buffer in each of them as long as all things fit. In the scenario, the scheduler would want keep more than one candidate for the memory planner to decide.

However, based on the comments, I feel we are going to use two fields for these two purposes. See my latest response below.

@junrushao
Copy link
Member

To clarify a bit, @tqchen and I are referring to a solution that we are still using "global" as the storage scope, which is served as the property of the storage, but add extra annotations (e.g. "sram") to further indicate the exact place where the buffer is stored. This way, the system is able to schedule correctly

@manupak
Copy link
Contributor Author

manupak commented Aug 31, 2021

Hi All,

Thanks for taking the time to look at it.
Initially, my thoughts were to use the same field for two purposes :

P1) Indicate candidate memories (a.k.a. pools) that each allocate be associated with
P2) After the memory planner is run, it will pick one out of the list by reducing the candidates to be exactly 1.

However, having read the comments here, I feel these could be two different fields.

The proposed pinned memory field, however, have multiple of such candidates. In my understanding they can be served as aadditional hints to follow up passes. If the intended use is multiple candidate hints, then the tag is perhaps no longer approperiate because it is used to indicate one kind of special memory that we place the data to.

For P1, I think we could use this annotation field to initially describe this.
Something like (as proposed by @MichaelJKlaiberBosch here ) :

T1) :

Map<String, PoolInfo> candidate_memory_pools;

In this way, we dont need to carry this as an IRModule attribute to be de-referenced.
Or should we just add a very generic

T2) :

Map<String, Objectref> annotations;

when the key is "candidate_memory_pools" would be the value is Map<String, PoolInfo> candidate_memory_pools.

I would personally prefer T1 for its direct approach here but T2 could also be made to work.

What you described is closest to the "tag" field of the storage scope, which we put as part of the pointer and associate with most of the storage. I think it would be great to think in terms of this direction.

Specifically, when we have a clear distinction of memory types, "sram", "shared_memory", "texture". We will use the special memory tag to indicate the type of such kind of memory.

Thereafter once the memory planner is run, it could read the annotation field and populate a final 'global.<memory_pool>'.
I think this should address the all of your concerns. Let me know, if agreed I'll adjust the RFC text.

@tqchen
Copy link
Member

tqchen commented Aug 31, 2021

Thanks @manupa-arm I would suggest T2 to make things consistent with the rest of IR design and allow future annotations if needed.

Note that annotation fields only offers hints to future passes(on how to rewrite) and should not change the semantics of the IR

@manupak
Copy link
Contributor Author

manupak commented Aug 31, 2021

Thanks @tqchen , yes we could work with this for now.

I take it that for P2, we could use the tag of storage_scope as well, then ?

@tqchen
Copy link
Member

tqchen commented Aug 31, 2021

Thanks @manupa-arm , P2 falls into the domain of the special memory tag, and the storage scope is exactly designed for this purpose.

@manupak
Copy link
Contributor Author

manupak commented Aug 31, 2021

Cool; I ll adjust the RFC text tommorow.

* Changing the text to show that agreed change
  of addition of annotations for the allocate node
  to indicate the candidate memory pools while
  re-using the tag to indicate the committed pool
  that the allocate will get pinned.

Change-Id: I83171137fcb17477b16891445ca10d0372b6b2e1
@manupak
Copy link
Contributor Author

manupak commented Sep 8, 2021

Hi all,

Sorry for delay!
I've managed to update the text to reflect the discussion.

PTAL and if its good, shall we get this in ?

@manupak
Copy link
Contributor Author

manupak commented Sep 16, 2021

@tqchen, can you take a look?

I can modify the USMP RFC to reflect the decisions made here, if we could finalize the discussion here.

@tqchen
Copy link
Member

tqchen commented Sep 17, 2021

Thanks @manupa-arm , sorry for the delay. I think the proposal(of adding annotations) looks good.

As a formal procedure for IR change. It would be great to have an RFC(or change this RFC) to "Adding annotation field to allocation" and discuss the need of pinned memory as a usecase.

As for the annotation field itself, it would be great to follow the documentation of ForNode here https://github.com/apache/tvm/blob/main/include/tvm/tir/stmt.h#L817 that states the annotation only served as a hint for optimizations and does not change the semantics of the allocate.

After the annotations field is added. The design choice of pinned memory can then becomes a specific choice of USMP and can be merged with USMP RFC.

Moved the pinned memory representation to adding generic annotation
field, after the community discussion.

Change-Id: I122cb0bd9d72bd2350003908d14e187d0902cbff
@manupak manupak changed the title [RFC][TIR] TIR Pinned Memory Representation [RFC][TIR] Adding annotation field to tir.allocate Sep 20, 2021
@manupak
Copy link
Contributor Author

manupak commented Sep 20, 2021

Done

@junrushao
Copy link
Member

Thanks for the really helpful discussion! The design choice looks good to me :-)

@manupak
Copy link
Contributor Author

manupak commented Sep 20, 2021

Thanks @junrushao1994 , it would be great if we could finalize the RFC here because it has a cascading effect to the USMP RFC and certain PRs waiting on it.

Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @manupa-arm i added some feedback

}
```

Therefore, we'd need a way to represent the association of each of these memories, that the user will pin the buffers to, closer to allocate nodes in TIR.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the user could pin buffers or USMP could pin buffers. does this address both needs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both cases are 'users' of this addition to the IR in a very similiar way.
However, I dont think users have an API directly to pin tir.allocates.

Copy link
Contributor Author

@manupak manupak Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, in this sentence, it means the buffer mentioned in the sample application above.

USMP will not do that. USMP will change program to expect that the application/runtime will pass in a buffer of the correct size pinned to correct memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think that users should have such an API? I kind of do (but it would be an advanced use case and e.g. done via Python script perhaps).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs more thinking around the usecases. Shall we discuss that as a follow up RFC?


At the IR, we ll need to associate each allocate node with one (or more) memories that it can end up, because the scheduling might be satisfied with placing buffers in any of the memories in a given set of memories. Therefore, the scheduler might want the memory planners to decide which memory to use based on finding the allocation that fit.

There are broadly two requirements here :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this go in Guide-level explanation with some examples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think it should go to the guide-level explanation of this RFC because this is just motivating the addition to the IR.
Instead, I ll add a reference to USMP RFC to see for further usecases of this IR change.


# 3. Guide-level explanation

This is not particularly a user-facing feature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd say if the user is pinning tir.allocate it could be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we have a user API to do that.

This IR change is to be used by compilation passes to pass on hints to future optimizations.
Therefore, the core compiler could use them to pass on the PoolInfo to tir.allocates in the passes.

I would not consider passes as users, in this scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but couldn't a user in principle use such a thing through TVMscript?

Copy link
Contributor Author

@manupak manupak Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be careful in what we expose to the user and I'd rather keep them just exposed to passes until the feature gets stable and mature.

If and when we decide to open this upto the user, we could maybe do the following:
convert Object possibly introducing "fromString" and "toString" methods I suppose.
If and when we decide to add that feature (for e.g. for ForNode s) we could do the same here?


# 5. Drawbacks

None. Its consistent with rest of the IR design and allows other features to use to pass hints for future transformation in the compiler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a field to tir.allocate is a bigger code lift. can we think of some other drawbacks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think adding an annotation field (similiar to ForNode) to pass compilation hints for passes is bigger change compared to other TIR nodes we already have today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, what about the concern about creating multiple ways to do the same thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this RFC adds a 'new' way compared to existing ways compared to other IRNodes.


To serve P2, we propose to use the existing tag of the storage_scope with 'global-<memory_name>'.

```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add some explanation of this struct? it's just here with no explanation of its purpose. also it looks cut-off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just remove it as tags of storage_scope is an established concept and not relevant to this RFC now.

@areusch areusch assigned areusch and unassigned areusch Sep 21, 2021
@areusch
Copy link
Contributor

areusch commented Sep 21, 2021

cc @mbs-octoml can you take a look?

Copy link
Contributor Author

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @areusch for taking a look.

I've anwsered your questions and adjusted the text accordingly.

}
```

Therefore, we'd need a way to represent the association of each of these memories, that the user will pin the buffers to, closer to allocate nodes in TIR.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both cases are 'users' of this addition to the IR in a very similiar way.
However, I dont think users have an API directly to pin tir.allocates.


At the IR, we ll need to associate each allocate node with one (or more) memories that it can end up, because the scheduling might be satisfied with placing buffers in any of the memories in a given set of memories. Therefore, the scheduler might want the memory planners to decide which memory to use based on finding the allocation that fit.

There are broadly two requirements here :
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think it should go to the guide-level explanation of this RFC because this is just motivating the addition to the IR.
Instead, I ll add a reference to USMP RFC to see for further usecases of this IR change.


To serve P2, we propose to use the existing tag of the storage_scope with 'global-<memory_name>'.

```
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just remove it as tags of storage_scope is an established concept and not relevant to this RFC now.


# 3. Guide-level explanation

This is not particularly a user-facing feature.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we have a user API to do that.

This IR change is to be used by compilation passes to pass on hints to future optimizations.
Therefore, the core compiler could use them to pass on the PoolInfo to tir.allocates in the passes.

I would not consider passes as users, in this scenario.


# 5. Drawbacks

None. Its consistent with rest of the IR design and allows other features to use to pass hints for future transformation in the compiler.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think adding an annotation field (similiar to ForNode) to pass compilation hints for passes is bigger change compared to other TIR nodes we already have today.

* addressing Andrew's comments

Change-Id: Ic9a4c6c0cff43057e5f2e7f9654355fe7c91ed24
Copy link
Contributor

@mbs-octoml mbs-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're approaching the same problem from opposite ends. For device planning we wanted to avoid the Map<Expr, Device> convention for recording the association of a device to every Relay expression since it does not survive transformation.
We considered:

  • adding a DictAttrs to the base Expr node, and auditing the entire codebase for preserving attributes on copies.
  • adding a new WithAttrs Relay node to capture annotations which are implicitly scoped over the sub-expression.
  • using 'fake' "on_device" calls to record device associations for a sub-expression.
  • using attributes on the existing Function definitions, which are already copied correctly.
    I've pushed on a combination of the last two to see where it takes us.

There is a general desire to extend device planning to be memory scope aware, which is where the above and this RFC make contact. In particular, would you ever want the set of candidate scopes to have been provided by some extended device planning pass in Relay? Or are you imagining that the candidate scopes are attached directly during lowering and are separate from the current device planning machinery?

If we want these worlds to work together it's worth adding DictAttrs further up (maybe even Object???) and make sure the codebase preserves attributes by default in all transforms.

@tqchen
Copy link
Member

tqchen commented Sep 22, 2021

There is generally tradeoffs between:

  • A0: The desire to preserve information during transformation.
  • A1: A growing set of attributes that is impossible to keep track of(thus preserve) during transformations.

From the A1's pov, allowing set of attributes that changes the semantics of the IR is in general not desirable. Since other pass writers may not be aware of all the possible set of attributes and thus being mindful in preserving them.

The middle ground that we could reach here is to allow annotations, which only served as hints for future passes(that the code can be transformed in certain way). This means that the other pass writers can safely ignore the content of annotation itself. Note that this would certainly restrict the possible set of annotations that we can use(e.g. the object themselves should not refer to an active piece in the IR).

Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tqchen however doesn't keeping stuff just as an AttrStmtNode serve the same purpose? Why should we add this additional map-to-arbitrary-thing to the IR? It seems like a shortcut. And why should AttrStmtNode limit its values to PrimExpr but annotations not do so? finally, are there concerns about roundtripping any such metadata through tvmscript? cc @junrushao1994

I kind of agree we should consider whether DictAttrs and AttrStmt could work for this.


# 3. Guide-level explanation

This is not particularly a user-facing feature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but couldn't a user in principle use such a thing through TVMscript?


# 5. Drawbacks

None. Its consistent with rest of the IR design and allows other features to use to pass hints for future transformation in the compiler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, what about the concern about creating multiple ways to do the same thing?

Copy link
Contributor Author

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure whether we want to roundtrip annotations that are only put via compilation passes. I'd rather have them restricted to be inserted by a controlled compilation pipeline, at least initially.

We have both ways to attach attributes/annotations to an IRNode in the codebase already (AttrStmt is restricted to be PrimExpr for now -- therefore users could attach simple attributes/pragmas -- I think, however it would not work for our current need). It is an interesting discussion to have IF (not sure we really want to) we want to stick to one, but I think that work should be done holistically and once for all IRNodes (e.g. ForNode). My reasons currently are that annotations are bit private to the compiler and could only be inserted by passes.

I feel annotations are simpler and straightforward while AttrStmtNode needs to handle in separate visits prior to visiting the actual StmtNode while it could have been simpler to attach them directly to the StmtNode of interest. I guess one point could be the AttrStmtNode survives by default that mutates the StmtNode, however that is a roundabout way where the mutations should've copied the Attrs/Annotations by default.

I would prefer the annotations over AttrStmt due to the somewhat complex handling of it in all the passes that care about it.


# 5. Drawbacks

None. Its consistent with rest of the IR design and allows other features to use to pass hints for future transformation in the compiler.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this RFC adds a 'new' way compared to existing ways compared to other IRNodes.


# 3. Guide-level explanation

This is not particularly a user-facing feature.
Copy link
Contributor Author

@manupak manupak Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be careful in what we expose to the user and I'd rather keep them just exposed to passes until the feature gets stable and mature.

If and when we decide to open this upto the user, we could maybe do the following:
convert Object possibly introducing "fromString" and "toString" methods I suppose.
If and when we decide to add that feature (for e.g. for ForNode s) we could do the same here?

@tqchen
Copy link
Member

tqchen commented Sep 23, 2021

@areusch yes about AttrStmt. I had to admit that AttrStmt has now becomes a legacy issue that I hope we can gradually move away from(so those that changes semantics can be directly reflected in the IR and properly documented). While we cannot immediately remove them, we can gradually move the features out and eventually deprecate

@manupak
Copy link
Contributor Author

manupak commented Sep 27, 2021

@areusch @tqchen ,

Could we agree to move on with using the annotations instead of AttrStmt?

@manupak
Copy link
Contributor Author

manupak commented Sep 30, 2021

@tqchen merge?

@tqchen
Copy link
Member

tqchen commented Sep 30, 2021

I added one last minor nits. @mbs-octoml @csullivan @areusch please take another look and let us merge in 24 h

Making the brief for annotation generic for tir.allocate
@manupak
Copy link
Contributor Author

manupak commented Sep 30, 2021

@tqchen thanks for spotting that! did the change.

* type in the brief in the example
@mbs-octoml
Copy link
Contributor

No objections from me.

@manupak
Copy link
Contributor Author

manupak commented Oct 1, 2021

@tqchen shall we get this in then?

@tqchen tqchen merged commit 8fd094b into apache:main Oct 1, 2021
@tqchen
Copy link
Member

tqchen commented Oct 1, 2021

Thanks @manupa-arm @mbs-octoml @csullivan @areusch this RFC is now merged!

@tqchen tqchen added status: accepted RFC is accepted and removed status: need review RFC needs review labels Oct 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: accepted RFC is accepted
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants