
Replace EigenBroadcast with ElementwiseBroadcast in ReduceGrad #38959

Merged (8 commits, Jan 25, 2022)

Conversation

AnnaTrainingG (Contributor) commented Jan 14, 2022

PR types

Performance optimization

PR changes

OPs

Describe

Replace the EigenBroadcast case with ElementwiseBroadcast in ReduceGrad.
To expand KP operator coverage, the Eigen-based adaptation code in the reduce_sum/reduce_mean backward passes is uniformly replaced with the ElementwiseBroadcast kernel.
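Conceptually, the change relies on reduce_sum's gradient being a pure broadcast of the output gradient back to the input shape, which is exactly what a broadcast kernel computes. Below is a minimal CPU sketch of that formulation; it is not Paddle's actual kernel code, and `broadcast_sum_grad` and its signature are hypothetical:

```cpp
// Minimal CPU illustration of why ReduceGrad maps onto a broadcast kernel:
// for y = reduce_sum(x, axes), dx is dy replicated (broadcast) along the
// reduced axes. NOT Paddle's kernel code; the function name and signature
// are hypothetical.
#include <cstddef>
#include <vector>

// dy is assumed to be in keepdim form: same rank as x, reduced axes of size 1.
std::vector<float> broadcast_sum_grad(const std::vector<float>& dy,
                                      const std::vector<int>& x_dims,
                                      const std::vector<bool>& reduced) {
  const int rank = static_cast<int>(x_dims.size());
  // Row-major strides of dy; a stride of 0 on a reduced axis makes every
  // index along that axis read the same dy element -- the broadcast itself.
  std::vector<size_t> dy_stride(rank, 0);
  size_t s = 1;
  for (int d = rank - 1; d >= 0; --d)
    if (!reduced[d]) { dy_stride[d] = s; s *= x_dims[d]; }

  size_t numel = 1;
  for (int d = 0; d < rank; ++d) numel *= x_dims[d];

  std::vector<float> dx(numel);
  std::vector<int> idx(rank, 0);  // multi-index into dx, odometer style
  for (size_t i = 0; i < numel; ++i) {
    size_t src = 0;
    for (int d = 0; d < rank; ++d) src += idx[d] * dy_stride[d];
    dx[i] = dy[src];  // pure copy: the whole gradient is one broadcast
    for (int d = rank - 1; d >= 0; --d) {  // advance the multi-index
      if (++idx[d] < x_dims[d]) break;
      idx[d] = 0;
    }
  }
  return dx;
}
```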
ReduceGrad performance comparison:

| op name | case | axis | dtype | before (us) | after (us) | speedup |
| --- | --- | --- | --- | --- | --- | --- |
| reduce_sum_grad | [-1L, 2048L, 33L, 33L] | [2,3] | float32 | 626 | 160.73 | 3.89 |
| reduce_sum_grad | [-1L, 2048L, 33L, 33L] | [2,3] | float16 | 633 | 83.64 | 7.57 |
| reduce_sum_grad | [-1L, 8L, 128L] | [1] | float32 | 2.44 | 1.71 | 1.43 |
| reduce_sum_grad | [-1L, 8L, 128L] | [1] | float16 | 2.84 | 1.7 | 1.67 |
| reduce_sum_grad | [30522L, 1024L] | [] | float32 | 150.38 | 138.79 | 1.08 |
| reduce_sum_grad | [30522L, 1024L] | [] | float16 | 89.81 | 43.44 | 2.07 |
| op name | case | axis | dtype | before (us) | after (us) | speedup |
| --- | --- | --- | --- | --- | --- | --- |
| reduce_mean_grad | [-1L, 2048L, 33L, 33L] | [2,3] | float32 | 672 | 160.71 | 4.18 |
| reduce_mean_grad | [-1L, 2048L, 33L, 33L] | [2,3] | float16 | 681 | 83.1 | 8.19 |
| reduce_mean_grad | [-1L, 8L, 128L] | [1] | float32 | 3.115 | 1.72 | 1.81 |
| reduce_mean_grad | [-1L, 8L, 128L] | [1] | float16 | 3.171 | 1.66 | 1.91 |
| reduce_mean_grad | [30522L, 1024L] | [] | float32 | 152.78 | 138.83 | 1.10 |
| reduce_mean_grad | [30522L, 1024L] | [] | float16 | 134.96 | 69.835 | 1.93 |
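reduce_mean's backward differs from reduce_sum's only by a constant 1/N scale, where N is the number of input elements collapsed into each output element, so the same broadcast kernel covers both. A hypothetical continuation of the sketch above:

```cpp
// reduce_mean grad = reduce_sum grad scaled by 1/N, where N is the product
// of the reduced dimensions. Hypothetical helper reusing broadcast_sum_grad
// from the previous sketch.
std::vector<float> broadcast_mean_grad(const std::vector<float>& dy,
                                       const std::vector<int>& x_dims,
                                       const std::vector<bool>& reduced) {
  size_t n = 1;
  for (size_t d = 0; d < x_dims.size(); ++d)
    if (reduced[d]) n *= static_cast<size_t>(x_dims[d]);
  std::vector<float> dx = broadcast_sum_grad(dy, x_dims, reduced);
  const float scale = 1.0f / static_cast<float>(n);
  for (float& v : dx) v *= scale;  // e.g. axes [2,3] of [N,2048,33,33]: N = 33*33
  return dx;
}
```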

Note on benchmark anomalies: these are unrelated to the changes in this PR, and local testing shows no performance impact:

| op name | case | dtype | dev (us) | new (us) | speedup |
| --- | --- | --- | --- | --- | --- |
| p_norm_1 backward | [300, 128, 128], axis = -1, porder = 3.0 | float32 | 178.26 | 178.06 | 1.00112322 |
| matmul_3_backward | [paddle][p_norm] p_norm { axis: -1, porder: 3.0, keepdim: False, x_shape: [300, 128, 128], x_dtype: float32 } | float32 | 12.7867 | 12.865 | 0.99391372 |
| matmul_9_forward | [paddle][matmul] matmul { transpose_x: False, transpose_y: False, x_shape: [4, 12, 64, 85], x_dtype: float16, y_shape: [4, 12, 85, 512], y_dtype: float16 } | float32 | 26.165 | 26.146 | 1.00072669 |

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

xingfeng01 previously approved these changes Jan 17, 2022
AnnaTrainingG changed the title from "Reduce EigenBroadcastcase with ElementwiseBroadcast in ReduceGrad" to "Replace EigenBroadcastcase with ElementwiseBroadcast in ReduceGrad" on Jan 18, 2022
zhiqiu (Contributor) left a comment:

LGTM for shareDataWith

ZzSean (Contributor) left a comment:

LGTM for ci op benchmark

AnnaTrainingG changed the title from "Replace EigenBroadcastcase with ElementwiseBroadcast in ReduceGrad" to "Replace EigenBroadcast with ElementwiseBroadcast in ReduceGrad" on Jan 25, 2022