【PaddlePaddle Hackathon 4 No.34】Optimize the performance of the Lerp OP on GPU for Paddle #53154
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
❌ The PR is not created using PR's template. You can refer to this Demo.
Force-pushed from b900b2a to 09c0042
"The number of dimensions for LerpOp must be " | ||
"greater than or equal to 0, but the value received is %d.", | ||
rank)); | ||
PADDLE_ENFORCE_LE( |
Since the core computation now uses BroadcastKernel, which has its own built-in validation rules, there is no need to keep the rank <= 6 constraint here. That constraint existed to serve Eigen and can be removed.
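A minimal sketch of what the relaxed check could look like, based on the diff fragment above; the variable name `out` and the surrounding context are assumptions, not taken verbatim from the PR:

```cpp
// Keep only the lower bound; the old rank <= 6 cap existed for the Eigen
// path and is unnecessary with BroadcastKernel, which handles higher ranks.
int rank = out->dims().size();  // `out` is assumed from the kernel context
PADDLE_ENFORCE_GE(
    rank,
    0,
    phi::errors::InvalidArgument(
        "The number of dimensions for LerpOp must be "
        "greater than or equal to 0, but the value received is %d.",
        rank));
// Removed: PADDLE_ENFORCE_LE(rank, 6, ...);
```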
Done
broadcast_min_functor);
inputs.emplace_back(&x);
inputs.emplace_back(&b_min);
inputs.emplace_back(&weight);
As I understand it, this part of the logic first pads the input tensors' dimensions to match out_tensor's dimensions and then makes another call to BroadcastKernel. However, BroadcastKernel has built-in logic that pads the dimension information directly; the only thing to take care of is setting the padding axis. There is no need to split this into two calls. For how to set axis when padding dimensions, refer to numpy's broadcasting rules.
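A small worked example of the axis convention used by Paddle's elementwise/broadcast ops (numpy-style alignment); the shapes here are hypothetical, chosen only for illustration:

```cpp
// Hypothetical shapes illustrating the axis convention:
//   x.shape = [2, 3, 4, 5]
//   y.shape = [3, 4]
// axis gives the position in x's dims where y's dims are aligned:
//   axis = 1  -> y is logically padded to [1, 3, 4, 1], then broadcast
//   axis = -1 -> align from the trailing dims, i.e. axis = x.rank - y.rank
```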
Hello, teacher! Thank you very much for the guidance!
Just to confirm what you mean: can a single BroadcastKernel call with a suitable axis value handle both the dimension alignment and the computation, or should the data first be preprocessed to align the dimensions, with BroadcastKernel then called for the computation?
If it is the former, there is one special case I cannot handle.
When calling BroadcastKernel with the parameter ET set to ElementwiseType::kBinary and the three tensors in ins all having different ranks, then no matter what value the axis parameter takes, some tensor in ins will fail to broadcast, because axis is a single number; for example, y cannot broadcast correctly.
The ExtendInputDimensions function in paddle/phi/kernels/funcs/dims_simplifier.h shows that the tensors in ins are broadcast one by one against outs[0]. Taking the example above: if axis is 1, x cannot broadcast correctly; if axis is 2, y cannot broadcast correctly.
Am I calling the wrong kernel?
If it is the latter, is there any reference code for the preprocessing part?
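To make the rank-mismatch issue concrete, here is a hypothetical set of shapes (not taken from the PR) showing why one axis value cannot serve three inputs of different ranks:

```cpp
// Hypothetical shapes (illustration only):
//   out:    [3, 4, 5]
//   x:      [3, 4, 5]   rank 3, aligned for any axis
//   weight: [4, 5]      rank 2, needs axis = 1 -> padded to [1, 4, 5]
//   y:      [5]         rank 1, needs axis = 2 -> padded to [1, 1, 5]
// ExtendInputDimensions pads every lower-rank input starting at the same
// single `axis` offset, so no one value works for both weight and y:
// with axis = 1, y would be padded to [1, 5, 1], which cannot broadcast
// against out's shape [3, 4, 5].
```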
I heard from the PM that you are opposed to this change I suggested. May I ask what the reason is? If the reasoning holds, I will go ahead and merge this.
@JamesLim-sy Hello, teacher! That is a misunderstanding; I am not opposed to your suggestion at all. I ran into a difficulty I could not solve on my own and was asking you for further guidance.
I fully agree with your point that BroadcastKernel should not be called twice; that is how I wrote it originally, but I hit problems during testing (see my reply above for details). I tried some other approaches, all of which failed, so in the end I chose to call BroadcastKernel twice for the special case.
Do you have a better solution?
What I mean is that Paddle's broadcast computation supports the pattern (input_0.broadcast + input_1.broadcast + input_2.broadcast) = (output_0, output_1), so there is no need to run a separate Broadcast::kUnary first and then perform the computation. You could test locally whether a single generic ternary broadcast (BroadcastTernary) can complete the computation in one pass.
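A rough sketch of the one-shot ternary call being suggested. The BroadcastKernel signature is assumed from the phi broadcast_function.h of that era and may differ on this PR's branch, and LerpFunctor is a hypothetical functor, not code from the PR:

```cpp
// One BroadcastKernel call handles dim padding and the lerp computation.
std::vector<const phi::DenseTensor*> ins = {&x, &y, &weight};
std::vector<phi::DenseTensor*> outs = {out};
int axis = -1;  // -1: numpy-style alignment from the trailing dimension
phi::funcs::BroadcastKernel<ElementwiseType::kTernary, T, T>(
    dev_ctx, ins, &outs, axis, LerpFunctor<T>());
```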
@JamesLim-sy Do you mean replacing one single-output Broadcast::kUnary plus one single-output Broadcast::kTernary with a single multi-output Broadcast::kTernary? If so, after reading the source I found that this is not feasible.
Line 49 of '/paddle/phi/kernels/funcs/broadcast_function.h' shows that in the multi-output case, every output must have the same dims(). Line 974 of the same file shows that in the multi-output case, the broadcast of each input is determined by (*outs)[0]->dims() and the int parameter axis, which is exactly the same broadcast process as the single-output Broadcast::kTernary. That means the problem described above still occurs.
Also, all Required CI items passed before, but now paddle-ci-bot reports 'Sorry to inform you that 81c86105e84f03cbc635fc247e050a20da1d96b1's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.' I re-ran the failed jobs, but they still do not succeed. What could be the cause?
@JamesLim-sy Hello, teacher. Could you please take another look?
@JamesLim-sy I have been waiting quite a while; please find some time to review again.
Changes complete.
@JamesLim-sy Would mingshu have time to review this?
Sorry to inform you that 81c8610's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
fix some CI issues
LGTM
#include "paddle/phi/kernels/funcs/broadcast_function.h" | ||
#include "paddle/phi/kernels/funcs/common_shape.h" | ||
#include "paddle/phi/kernels/funcs/math_function.h" | ||
|
#include "paddle/phi/kernels/empty_kernel.h"
#include "paddle/phi/kernels/funcs/broadcast_function.h"
#include "paddle/phi/kernels/funcs/common_shape.h"
#include "paddle/phi/kernels/funcs/math_function.h"
These headers are all already pulled in by #include "paddle/phi/kernels/funcs/broadcast_function.h". Please submit a follow-up PR later to clean this up.
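A sketch of the trimmed include list for that follow-up, assuming broadcast_function.h really does provide the others transitively (worth verifying before deleting):

```cpp
// broadcast_function.h transitively includes empty_kernel.h, common_shape.h,
// and math_function.h, so the single umbrella include should suffice.
#include "paddle/phi/kernels/funcs/broadcast_function.h"
```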
Done
PR types
Performance optimization
PR changes
OPs
Description
The lerp operator in Paddle is currently implemented by composing third-party library operations, and its performance falls short. A solid optimization can be achieved by building on Paddle's internal Broadcast Kernel.
Design doc: PaddlePaddle/community#513
Forward inference performance of the optimized Paddle versus the pre-optimization Paddle:
As the results show, average performance improves by at least 20%, and for the worst-performing cases performance rises to 5x the original. Overall, the optimization delivers a substantial speedup.
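For reference, a minimal sketch of the kind of ternary functor a Broadcast-based lerp would use; the name LerpFunctor and this exact form are illustrative, not taken verbatim from the PR:

```cpp
template <typename T>
struct LerpFunctor {
  // lerp(x, y, w) = x + w * (y - x)
  inline HOSTDEVICE T operator()(const T x, const T y, const T w) const {
    return x + w * (y - x);
  }
};
```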