Support logical functors and refine tensor packing function #33089
Conversation
Thanks for your contribution!
```diff
  */
 template <typename OutT>
 int PackTensorsIntoVector(const framework::ExecutionContext &ctx,
                           std::vector<const framework::Tensor *> *ins,
-                          std::vector<framework::Tensor *> *outs) {
+                          std::vector<framework::Tensor *> *outs,
+                          framework::Tensor *x_ptr = nullptr) {
```
- Naming the parameter `x_ptr` is confusing. Something like `x_for_selectedrows` would be more intuitive. Also explain in the function comment that the input X can be either a LoDTensor or a SelectedRows; when X is a SelectedRows, it first has to be converted into a LoDTensor before taking part in the computation, so the op kernel needs to define an extra temporary tensor and pass it in through the `x_for_selectedrows` parameter.
- In elementwise_mul_op.h, please update `ElementwiseMulKernel` accordingly.
OK, will finish the change in the next commit.
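A sketch of how the renamed parameter and the requested function comment could look (illustrative only; the exact wording was settled in a later commit):

```cpp
/*
 * Packs the inputs and output of an elementwise op into vectors.
 * The input X may be a LoDTensor or a SelectedRows. When X is a
 * SelectedRows, it must first be converted into a LoDTensor; the op
 * kernel defines a temporary tensor for that purpose and passes it in
 * through x_for_selectedrows.
 */
template <typename OutT>
int PackTensorsIntoVector(const framework::ExecutionContext &ctx,
                          std::vector<const framework::Tensor *> *ins,
                          std::vector<framework::Tensor *> *outs,
                          framework::Tensor *x_for_selectedrows = nullptr);
```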
```cpp
auto *y = ctx.Input<framework::LoDTensor>("Y");
auto *z = ctx.Output<framework::LoDTensor>("Out");

if (x_ptr == nullptr || x_var->IsType<framework::LoDTensor>()) {
```
The condition here only needs to be `if (x_var->IsType<framework::LoDTensor>())`; even if an `x_ptr` is passed in, it is of no use in this branch.
It does look superfluous indeed; will fix it in the next commit.
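The simplified branching might then read as follows (a sketch; the comments mark what each branch does):

```cpp
if (x_var->IsType<framework::LoDTensor>()) {
  // X is a plain LoDTensor: pack it directly.
} else if (x_var->IsType<framework::SelectedRows>()) {
  // X is a SelectedRows: pack the converted tensor supplied by the caller.
}
```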
```cpp
z = ctx.Output<framework::LoDTensor>("Out");
ins->emplace_back(x);
x_dims_size = x->dims().size();

```
I don't think the blank line added here looks good...
Will remove it in the next commit.
```cpp
platform::errors::InvalidArgument(
    "For elementwise_op, if X is Sparse, Y must be "
    "scalar. But received the size of Y = %d.",
    y->dims().size()));
```
A PADDLE_ENFORCE check is needed here to verify that `x_ptr` is not null.
Changed as suggested.
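A sketch of such a check using Paddle's `PADDLE_ENFORCE_NOT_NULL` macro (the error message here is illustrative, not the committed text):

```cpp
// Guard the SelectedRows branch: the caller must supply the temporary
// tensor that holds the converted value of X.
PADDLE_ENFORCE_NOT_NULL(
    x_ptr, platform::errors::InvalidArgument(
               "For elementwise_op, if X is SelectedRows, the x_ptr "
               "argument must not be null."));
```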
```cpp
z = ctx.Output<framework::SelectedRows>("Out")->mutable_value();
ins->emplace_back(x_ptr);
x_dims_size = x_ptr->dims().size();

```
Same as above, please don't add a blank line here.
Will remove it in the next commit.
```diff
 if (y != nullptr) {
   ins->emplace_back(y);
   axis = ctx.HasAttr("axis") ? ctx.Attr<int>("axis") : -1;
-  axis = axis == -1 ? std::abs(y->dims().size() - x->dims().size()) : axis;
+  axis = axis == -1 ? std::abs(y->dims().size() - x_dims_size) : axis;
```
Actually, my original review suggestion was only to move this `axis == -1` conversion, just this one line of code, into `LaunchElementwiseCudaKernel`. If `axis == -1` only applies to the binary, broadcast case, then put it into `LaunchBroadcastElementwiseCudaKernel` instead; I feel that would simplify the code a bit?
I had misread the suggestion all along... I previously thought it meant moving the whole axis computation into `LaunchBroadcastElementwiseCudaKernel`. Moving it there is indeed more appropriate, and it makes the code more general. Following the idea in the comment, the final code might end up looking like the snippet below, or some similar form.
```cpp
// Dispatcher sketch: the same-dims fast path keeps its own entry point, and
// the axis == -1 fallback is computed only on the broadcast path.
template <ElementwiseType ET, typename InT, typename OutT, typename Functor>
void LaunchElementwiseCudaKernel(
    const platform::CUDADeviceContext &cuda_ctx,
    const std::vector<const framework::Tensor *> &ins,
    std::vector<framework::Tensor *> *outs, int axis, Functor func) {
  std::vector<int> dims_size;
  bool no_broadcast_flag = true;
  for (auto *in : ins) {
    // Broadcast is needed as soon as any input's shape differs from the
    // first input's shape.
    no_broadcast_flag &= ins[0]->dims() == in->dims();
    dims_size.emplace_back(in->dims().size());
  }
  if (no_broadcast_flag) {
    LaunchSameDimsElementwiseCudaKernel<ET, InT, OutT>(cuda_ctx, ins, outs,
                                                       func);
  } else {
    // axis == -1 means "infer the axis from the rank difference".
    axis = axis == -1
               ? *std::max_element(dims_size.begin(), dims_size.end()) -
                     *std::min_element(dims_size.begin(), dims_size.end())
               : axis;
    LaunchBroadcastElementwiseCudaKernel<ET, InT, OutT>(cuda_ctx, ins, outs,
                                                        axis, func);
  }
}
```
LGTM
PR types
Performance optimization

PR changes
OPs

Describe
Support the logical functors not, and, or, and xor, and refine the tensor packing function.
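For illustration, elementwise logical functors of this kind are usually tiny device-callable structs. A minimal sketch follows; the struct names and the HOSTDEVICE macro mirror Paddle's conventions, but this is not the code committed in this PR:

```cpp
// Minimal sketches of the four logical functors. In Paddle, HOSTDEVICE
// expands to __host__ __device__, so the same functor runs on CPU and GPU.
template <typename T>
struct LogicalNotFunctor {
  HOSTDEVICE bool operator()(const T a) const { return !a; }
};

template <typename T>
struct LogicalAndFunctor {
  HOSTDEVICE bool operator()(const T a, const T b) const { return a && b; }
};

template <typename T>
struct LogicalOrFunctor {
  HOSTDEVICE bool operator()(const T a, const T b) const { return a || b; }
};

template <typename T>
struct LogicalXorFunctor {
  HOSTDEVICE bool operator()(const T a, const T b) const {
    // True when exactly one operand is truthy.
    return (a || b) && !(a && b);
  }
};
```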