[CUDA] Support multiple TIR-level dynamic shared memory allocations #8571
This reverts commit ce62d9e.
For the dyn shmem matmul test, the generated kernel looks like:
Can the new pass reuse buffers whose lifetime has ended? To be specific, please see the following example:
A_shared[i] = A[i]
A_local[i] = A_shared[i]
C_local[i] = A_local[i] + 1
C_shared[i] = C_local[i]
Since A_shared[i] is never used again by the time we store to C_shared, we can store the data directly into A_shared[i] to reduce memory usage. This is supported in storage_rewrite.
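The reuse idea in the example above can be sketched with a small liveness analysis. This is only an illustrative sketch, not the storage_rewrite implementation: the function name plan_reuse, the statement encoding, and the assumption that all shared buffers have the same constant size are hypothetical.

```python
def plan_reuse(stmts, shared_buffers):
    """Assign shared buffers to slots, reusing a slot once its buffer is dead.

    stmts: list of (written_buffer, [read_buffers]) in program order.
    shared_buffers: names of buffers that live in shared memory.
    Assumption: every shared buffer has the same constant size.
    """
    first, last = {}, {}
    for idx, (dst, srcs) in enumerate(stmts):
        for name in (dst, *srcs):
            if name in shared_buffers:
                first.setdefault(name, idx)  # first statement touching the buffer
                last[name] = idx             # last statement touching the buffer

    slot_of = {}
    busy_until = []  # busy_until[s] = last statement index that uses slot s
    for name in sorted(first, key=first.get):
        for slot, end in enumerate(busy_until):
            if end < first[name]:  # previous occupant is dead, so reuse its slot
                slot_of[name] = slot
                busy_until[slot] = last[name]
                break
        else:
            slot_of[name] = len(busy_until)  # no free slot, allocate a new one
            busy_until.append(last[name])
    return slot_of


# The four statements from the review comment.
stmts = [
    ("A_shared", ["A"]),
    ("A_local", ["A_shared"]),
    ("C_local", ["A_local"]),
    ("C_shared", ["C_local"]),
]
plan = plan_reuse(stmts, {"A_shared", "C_shared"})
```

Here A_shared is last touched at statement 1 and C_shared is first touched at statement 3, so both end up in the same slot. With sizes that are unknown at compile time, the "same size" assumption no longer holds, which is the difficulty raised in the reply.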
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Thanks, I didn't think about reuse support. To support this, I think it is better to drop the new pass in this PR and merge the new functionality to
One difficulty I can imagine is that dynamic shared memory in general has an unknown alloc size. So for the general case, I don't think reuse analysis would work just like it does in
I think we can use
I agree. For constant sizes,
A follow-up to #8466
A new pass is added to merge multiple TIR-level dynamic shared memory allocations, whose sizes may not be constant. This case is not handled by the storage_rewrite pass. Rather than updating the storage_rewrite pass, I added a new pass since the logic is simpler (we MUST merge, and we know which allocs to merge).
Hetero-dtype is supported per discussion #8466 (comment).
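As a rough illustration of what merging allocations with different dtypes involves, the sketch below lays several buffers out back to back in one byte pool and computes a byte offset for each. It is not the pass added in this PR: the name merge_allocs, the buffer names, the alignment rule (each buffer aligned to its own element size), and the use of plain integers for sizes that would be symbolic in TIR are all assumptions made for the example.

```python
# Element sizes in bytes for the dtypes used in this sketch.
ITEMSIZE = {"int8": 1, "float16": 2, "float32": 4}


def merge_allocs(allocs):
    """Lay out several allocations in one contiguous byte pool.

    allocs: list of (name, dtype, num_elements).
    Returns (byte offset of each buffer, total pool size in bytes).
    Assumption: each buffer is aligned to its own element size.
    """
    offsets, cursor = {}, 0
    for name, dtype, extent in allocs:
        itemsize = ITEMSIZE[dtype]
        # Round the cursor up to the next multiple of the element size.
        cursor = (cursor + itemsize - 1) // itemsize * itemsize
        offsets[name] = cursor
        cursor += extent * itemsize
    return offsets, cursor


# n stands in for a size that is only known at kernel launch time.
n = 6
allocs = [
    ("A_sh", "float16", n + 1),
    ("B_sh", "float32", n),
    ("C_sh", "int8", 3),
]
offsets, total = merge_allocs(allocs)
```

With n = 6, A_sh occupies bytes 0 to 13, B_sh is rounded up to start at byte 16, and C_sh starts at byte 40, for a pool of 43 bytes. Because every buffer must be placed and the set of buffers is known up front, no liveness analysis is needed, which matches the "we MUST merge" remark above.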
@tqchen @vinx13 @yzh119