[Pten]Refactor the Elementwise_add Kernel #37043
Conversation
Thanks for your contribution!
int axis = PackTensorsIntoVector<T>(ctx, &ins, &outs);
LaunchElementwiseCudaKernel<ElementwiseType::kBinary, T, T>(
    cuda_ctx, ins, &outs, axis, AddFunctor<T>());
auto* x = ctx.Input<framework::LoDTensor>("X");
Can this function be removed? The kernel in paddle/fluid/operators/elementwise/elementwise_add_op.h below could be reused directly; at registration time, just register a CUDA version of it.
done
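For reference, the change the reviewer asks for amounts to roughly the following: drop the CUDA-specific functor path and register the existing device-agnostic ElementwiseAddKernel for CUDA. This is only a sketch; the type list and file placement are illustrative assumptions, not taken from the diff.

// Sketch: reuse the generic kernel from elementwise_add_op.h and register it for CUDA.
#include "paddle/fluid/operators/elementwise/elementwise_add_op.h"

namespace ops = paddle::operators;
namespace plat = paddle::platform;

REGISTER_OP_CUDA_KERNEL(
    elementwise_add,
    ops::ElementwiseAddKernel<plat::CUDADeviceContext, float>,
    ops::ElementwiseAddKernel<plat::CUDADeviceContext, double>,
    ops::ElementwiseAddKernel<plat::CUDADeviceContext, int>,
    ops::ElementwiseAddKernel<plat::CUDADeviceContext, int64_t>);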
@@ -20,6 +20,13 @@ limitations under the License. */
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/operators/math/math_function.h"

#include "paddle/fluid/framework/pten_utils.h"

// only can include the headers in paddle/top/api dirs
The directory in this comment needs to be updated; it should now be paddle/pten/include.
done
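For illustration, the corrected comment and a matching include would read roughly as below; the concrete header name is an assumption, not taken from the diff.

// only can include the headers in paddle/pten/include dirs
#include "paddle/pten/include/core.h"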
@@ -29,6 +29,11 @@ limitations under the License. */
#include "paddle/fluid/platform/gpu_info.h"
#include "paddle/fluid/platform/transform.h"

// only can include the headers in paddle/top/api dirs
Same as above.
done
(*post) *= x_dims[i];
}
pten::general::get_mid_dims(x_dims, y_dims, axis, pre, n, post,
                            is_run_common_broadcast);
}

inline int GetElementwiseIndex(const int *x_dims_array, const int max_dim,
Is it really necessary to keep this wrapper shell?
done
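For context, the two options under discussion look roughly like this. A sketch only: the shim signature mirrors the surrounding fluid code and the snippet above, and is reproduced for illustration, not copied from the diff.

// Option 1: keep a thin fluid-side shim that only forwards to pten.
inline void get_mid_dims(const framework::DDim &x_dims,
                         const framework::DDim &y_dims, const int axis,
                         int *pre, int *n, int *post,
                         int *is_run_common_broadcast) {
  pten::general::get_mid_dims(x_dims, y_dims, axis, pre, n, post,
                              is_run_common_broadcast);
}

// Option 2: drop the shim and call pten::general::get_mid_dims directly at
// every call site, as FusedElemwiseAndActComputeWithBroadcast does later in
// this diff.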
}
std::vector<const pten::DenseTensor *> pt_inputs;
std::vector<pten::DenseTensor *> pt_outputs;
// *_tmp for cache DenseTensor
Could you add a TODO comment explaining that this pattern is needed because DenseTensor does not yet support copy construction, and that it will be optimized later?
done
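A sketch of the kind of TODO the reviewer asks for; the variable names follow the snippet above, and the element type of the *_tmp vector is an assumption for illustration.

// TODO(dev): DenseTensor does not support copy construction yet, so the
// converted tensors are cached in *_tmp vectors to keep them alive for the
// duration of the kernel launch; simplify this once copy construction lands.
std::vector<std::shared_ptr<pten::DenseTensor>> pt_inputs_tmp;
std::vector<const pten::DenseTensor *> pt_inputs;
std::vector<pten::DenseTensor *> pt_outputs;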
@@ -3,3 +3,5 @@ cc_library(linalg_cpu SRCS linalg.cc DEPS dense_tensor kernel_context kernel_fac
cc_library(creation_cpu SRCS creation.cc DEPS dense_tensor kernel_context kernel_factory eigen_function)
cc_library(utils_cpu SRCS utils.cc DEPS dense_tensor kernel_context kernel_factory memory convert_utils)
cc_library(manipulation_cpu SRCS manipulation.cc DEPS dense_tensor kernel_context kernel_factory utils_cpu unary)
cc_library(nn_cpu SRCS nn.cc DEPS dense_tensor kernel_context kernel_factory blas eigen_function)
add_subdirectory(funcs)
The original functions directory could be renamed to funcs; also, wouldn't it be better not to create a new funcs subdirectory under each directory?
done
@@ -4,10 +4,12 @@ if(WITH_GPU)
nv_library(creation_cuda SRCS creation.cu DEPS eigen_function dense_tensor kernel_context kernel_factory)
nv_library(utils_cuda SRCS utils.cu DEPS dense_tensor kernel_context kernel_factory memory convert_utils)
nv_library(manipulation_cuda SRCS manipulation.cu DEPS dense_tensor kernel_context kernel_factory utils_cuda unary)
elseif(WITH_ROCM)
nv_library(nn_cuda SRCS nn.cu DEPS dense_tensor kernel_context kernel_factory eigen_function)
elseif(WITH_ROCM)
The indentation here is slightly off.
done
int64_t post_;
};

// #if defined(__NVCC__) || defined(__HIPCC__)
What is the story with these commented-out lines?
done
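For reference, the guard would normally appear uncommented, roughly as below (body elided; shown only to illustrate what the commented-out line was guarding).

#if defined(__NVCC__) || defined(__HIPCC__)
// device-side elementwise helpers compiled only by nvcc / hipcc
#endif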
namespace general {

using DDim = paddle::framework::DDim;
using CPUContext = paddle::platform::CPUDeviceContext;
This uses the CPU context, so why is it under general?
These are utilities shared by the CPU and GPU code paths, which is why they live under general.
#pragma once

#include "paddle/pten/kernels/cuda/funcs/elementwise/elementwise_broadcast.cu.h"
Same as above. How about adding cpu and cuda subdirectories under functions instead of creating new subdirectories inside the per-device directories at the kernel layer? This area will need further adjustment later anyway. Also, if all of these files end up under the cuda directory, can the .cu part of the file suffix be dropped as well?
done
paddle/pten/api/lib/nn.cc
namespace paddle {
namespace experimental {

Tensor elementwise_add(const Tensor& x, const Tensor& y, int axis) {
The API does not seem to have an axis parameter.
done
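After dropping the argument, the declaration would look roughly like this; a sketch following the snippet above, with only the axis parameter removed.

namespace paddle {
namespace experimental {

Tensor elementwise_add(const Tensor& x, const Tensor& y);

}  // namespace experimental
}  // namespace paddle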
LGTM
LGTM
@@ -2020,7 +1856,8 @@ void FusedElemwiseAndActComputeWithBroadcast(
axis = (y_dim.size() == 0) ? x_dim.size() : axis;

int pre, n, post, is_run_common_broadcast;
get_mid_dims(x_dim, y_dim, axis, &pre, &n, &post, &is_run_common_broadcast);
pten::general::get_mid_dims(x_dim, y_dim, axis, &pre, &n, &post,
Function naming should be unified per the code style and use CamelCase. The original name here was already nonstandard, so this can be changed in a follow-up PR.
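For illustration, the CamelCase form the review points toward might read as below; a sketch only, where the proposed name is hypothetical and the parameter list mirrors the existing get_mid_dims.

inline void GetMidDims(const DDim &x_dims, const DDim &y_dims, const int axis,
                       int *pre, int *n, int *post,
                       int *is_run_common_broadcast);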
LGTM for PR-CI-OP-benchmark
PR types
Others
PR changes
OPs
Describe
Background
The elementwise family of kernels needs to be migrated to the new Tensor compute library, and elementwise_add is the first kernel of the family to be migrated. This PR mainly migrates the elementwise_add compute logic together with most of the basic components that the elementwise kernels depend on; it touches a fairly large number of files and lays the groundwork for the subsequent elementwise migrations.
Main migration steps
1. Migrate the shared elementwise building blocks.
2. Migrate the elementwise_add kernel.
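To make the result of the migration concrete, a minimal usage sketch of the new C++ API added in paddle/pten/api/lib/nn.cc follows; the header path and the two-argument signature (after the axis parameter was dropped per review) are assumptions for illustration, not taken verbatim from the diff.

#include "paddle/pten/api/include/tensor.h"  // assumed header location

// x and y are paddle::experimental::Tensor values with broadcastable shapes.
paddle::experimental::Tensor Add(const paddle::experimental::Tensor& x,
                                 const paddle::experimental::Tensor& y) {
  // elementwise_add is the API migrated by this PR; it dispatches to the
  // pten elementwise_add kernel on CPU or CUDA depending on the inputs.
  return paddle::experimental::elementwise_add(x, y);
}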