Optimize the computation of log_softmax #40612
Thanks for your contribution!
@@ -293,9 +288,10 @@ __global__ void WarpSoftmaxForward(T* softmax,
}

// data src
AccT srcdata_raw[kBatchSize][kLoopsV][kVSize];
AccT srcdata[kBatchSize][kLoopsV][kVSize];
T src_tmp[kBatchSize][kLoopsV][kVSize];
This now uses three register arrays, srcdata_raw, srcdata, and src_tmp, and it is hard to tell what each one means from the variable names.
Renamed the variables and added comments.
HOSTDEVICE explicit inline ExpSubFunctor(Tx y) : y((Tx)(y)) {}
struct ExpFunctor {
HOSTDEVICE explicit inline ExpFunctor() {}
This default constructor doesn't need to be written out, does it?
Removed.
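As an illustration of the point raised in this thread, here is a minimal host-side C++ sketch, not the PR's actual code. The functor names mirror the diff, but the template parameters, the `operator()` bodies, and the member name `y_` are assumptions, and the `HOSTDEVICE` macro is omitted so the snippet compiles on its own. A functor that holds no state needs no user-written default constructor, because the compiler supplies one implicitly. A functor that stores a value still needs its explicit constructor.

```cpp
#include <cmath>

// Hypothetical stand-in for the stateless functor in the diff.
// No constructor is declared, so the implicit default constructor is used.
template <typename Tx, typename Ty = Tx>
struct ExpFunctor {
  inline Ty operator()(const Tx& x) const {
    return static_cast<Ty>(std::exp(x));
  }
};

// Hypothetical stand-in for the stateful functor: it stores the value to
// subtract (assumed here to be the row maximum), so a constructor is needed.
template <typename Tx, typename Ty = Tx>
struct ExpSubFunctor {
  explicit inline ExpSubFunctor(Tx y) : y_(y) {}

  inline Ty operator()(const Tx& x) const {
    return static_cast<Ty>(std::exp(x - y_));
  }

 private:
  Tx y_;  // value subtracted from every input before exponentiation
};
```

With these definitions, `ExpFunctor<float> f;` compiles without any hand-written constructor, while `ExpSubFunctor<float> g(max_value);` keeps the explicit one.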
* Optimize the computation of log_softmax
* modify the var name
PR types
Performance optimization
PR changes
OPs
Describe
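Optimize the computation of log_softmax.
After #38992 was merged, performance regressed in some cases. The cause is that log softmax was implemented on top of softmax, which introduces redundant computation. With the computation optimized, performance is back on par with the level before that change.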
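The kind of redundancy described here can be sketched in plain host-side C++. This is an illustrative assumption about where the extra work comes from, not the PR's kernel code, and the function names `LogSoftmaxViaSoftmax` and `LogSoftmaxDirect` are hypothetical. Building log softmax on top of softmax computes `exp`, a division, and then `log` for every element. The identity log_softmax(x_i) = x_i - max - log(sum_j exp(x_j - max)) needs only one `log` per row and a subtraction per element.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: log softmax layered on softmax.
// Per element: one exp, one division, one log. Assumes x is non-empty.
std::vector<float> LogSoftmaxViaSoftmax(const std::vector<float>& x) {
  const float max_val = *std::max_element(x.begin(), x.end());
  std::vector<float> out(x.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - max_val);  // numerator of softmax
    sum += out[i];
  }
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::log(out[i] / sum);  // log applied to the softmax value
  }
  return out;
}

// Hypothetical direct form: x_i - max - log(sum_j exp(x_j - max)).
// Per element: one exp (for the sum) and two subtractions; one log per row.
// Assumes x is non-empty.
std::vector<float> LogSoftmaxDirect(const std::vector<float>& x) {
  const float max_val = *std::max_element(x.begin(), x.end());
  float sum = 0.0f;
  for (float v : x) {
    sum += std::exp(v - max_val);
  }
  const float log_sum = std::log(sum);
  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = x[i] - max_val - log_sum;
  }
  return out;
}
```

Both functions return the same values up to floating-point rounding. The direct form also avoids taking `log` of a softmax value that has underflowed to zero.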