-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Profiling] Update elementwise op #6229
[Profiling] Update elementwise op #6229
Conversation
2a5fb74
to
ff4c825
Compare
18a50a2
to
cc210de
Compare
cc210de
to
54f0962
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! There are some tiny enhancements and one typo.
: ptr_(ptr), i_(0), j_(0), n_(n), post_(post) {} | ||
|
||
MidWiseTransformIterator<T, platform::CPUPlace>& operator++() { | ||
i_ = ++j_ / post_ % n_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add parentheses here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
bool operator!=( | ||
const MidWiseTransformIterator<T, platform::CPUPlace>& rhs) const { | ||
return (ptr_ + i_) &= &(*rhs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
&=
-> !=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
typedef thrust::iterator_adaptor< | ||
RowwiseTransformIterator<T, platform::GPUPlace>, const T*> | ||
super_t; | ||
__host__ __device__ RowwiseTransformIterator(const T* x, int n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use HOSTDEVICE
macro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
private: | ||
unsigned int n_; | ||
const T* begin_; | ||
__host__ __device__ typename super_t::reference dereference() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use HOSTDEVICE
macro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
template <typename Place, typename T> | ||
class ElementwiseAddKernel : public framework::OpKernel<T> { | ||
public: | ||
void Compute(const framework::ExecutionContext& ctx) const override { | ||
ElementwiseCompute<EigenAddFunctor, Place, T>(ctx); | ||
using Tensor = framework::Tensor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should implement all elemwise operators based on this method.
Not in hurry, but need an issue to record the following works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do these in next PR.
template <typename Functor, typename T, typename Place> | ||
struct TransformFunctor { | ||
TransformFunctor(const framework::Tensor* x, const framework::Tensor* y, | ||
framework::Tensor* z, const framework::ExecutionContext& ctx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use DeviceContext other than ExecutionContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
830dbaa
to
1517889
Compare
1517889
to
9e244a8
Compare
T* z_; | ||
int64_t nx_; | ||
const platform::DeviceContext& ctx_; | ||
Functor func_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming rules of data members for struct
is different from class
.
https://google.github.io/styleguide/cppguide.html#Variable_Names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I have replaced struct
with class
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent
Fix #6166
"We found the performance of
Eigen::Tensor::broadcast
is not good if theTensor
is not just a scalar, because theEigen::Tensor::broadcast
try to implement a very generalbroadcast
method. However, the most ofbroadcast
operators in the neural network isrowwise
orcolwise
. The implementation ofrowwise
orcolwise
is much simpler and faster than a general implementation.The
Eigen::Tensor::broadcast
is mainly implmeneted by this function." —— @reyoung