[XPU][PHI Kernels] optimize unique & index_put #56582

Merged: 1 commit, Aug 28, 2023

@lj970926 lj970926 commented Aug 23, 2023

PR types

Performance optimization

PR changes

OPs

Description

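  1. unique kernel: replace the `nonzero_count` call inside the `UniqueDimFactor` loop with a single `reduce_all` outside the loop. This avoids issuing too many kernel calls, which hurts performance when `axis_len` is large.
  2. index_put_kernel: optimize the wait logic used with temporary Tensors. When the kernel runs on the default stream, the runtime's deferred free mechanism guarantees that freed device memory is not reallocated to another kernel before the current kernel finishes, so no explicit wait is needed in that case.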
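To illustrate the idea behind item 1, here is a minimal pure-Python sketch, not the actual kernel code. It assumes the loop decides, for each adjacent pair of sorted rows, whether the second row starts a new unique group. The function names and the `launches` counter are hypothetical and only model how many reduction calls each variant would issue.

```python
def boundaries_per_iteration(rows):
    """Loop variant: one reduction per adjacent pair of rows."""
    if not rows:
        return [], 0
    launches = 0
    is_new = [True]  # the first row always starts a new group
    for i in range(1, len(rows)):
        diff = [a != b for a, b in zip(rows[i - 1], rows[i])]
        nonzero = sum(diff)  # stands in for one nonzero_count call
        launches += 1
        is_new.append(nonzero > 0)
    return is_new, launches


def boundaries_single_reduction(rows):
    """Batched variant: compare every adjacent pair, then reduce once."""
    if not rows:
        return [], 0
    # Element-wise comparison of all adjacent pairs, built before any reduction.
    equal = [
        [a == b for a, b in zip(rows[i - 1], rows[i])]
        for i in range(1, len(rows))
    ]
    # Stands in for a single reduce_all call over the whole comparison matrix,
    # reducing along the element dimension (assumed semantics: logical "all").
    all_equal = [all(row) for row in equal]
    launches = 1
    return [True] + [not e for e in all_equal], launches


rows = [[1, 2], [1, 2], [3, 4], [3, 5]]
loop_flags, loop_launches = boundaries_per_iteration(rows)
batch_flags, batch_launches = boundaries_single_reduction(rows)
```

Both variants produce the same group-boundary flags. In this model the loop variant issues `axis_len - 1` reductions while the batched variant issues one, regardless of `axis_len`.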


@RuohengMa RuohengMa left a comment


LGTM

@QingshuChen QingshuChen merged commit d674ea9 into PaddlePaddle:develop Aug 28, 2023
lxd-cumt pushed a commit to lxd-cumt/Paddle that referenced this pull request Aug 28, 2023
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
3 participants