
[Profiling] Update elementwise op #6229

Merged

Conversation

@chengduoZH chengduoZH commented Dec 4, 2017

Fix #6166

"We found the performance of Eigen::Tensor::broadcast is not good if the Tensor is not just a scalar, because the Eigen::Tensor::broadcast try to implement a very general broadcast method. However, the most of broadcast operators in the neural network is rowwise or colwise. The implementation of rowwise or colwise is much simpler and faster than a general implementation.
The Eigen::Tensor::broadcast is mainly implmeneted by this function." —— @reyoung
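
To illustrate why the specialized path is cheaper, here is a minimal sketch (not the code in this PR) of a rowwise broadcast add: the kernel only needs one extra inner index, while a fully general N-dimensional broadcast has to recompute a multi-dimensional index mapping for every element.

// Minimal sketch, not the code in this PR: rowwise broadcast add.
// x has shape (h, w), y has shape (w); y is added to every row of x.
#include <vector>

void RowwiseAdd(const std::vector<float>& x, const std::vector<float>& y,
                std::vector<float>* z, int h, int w) {
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      (*z)[i * w + j] = x[i * w + j] + y[j];  // reuse y across rows
    }
  }
}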

@chengduoZH chengduoZH changed the title from [Profiling] Updata elementwise op to [Profiling] Update elementwise op on Dec 4, 2017
@chengduoZH chengduoZH force-pushed the profiling/updata_elementwise_op branch from 2a5fb74 to ff4c825 on December 4, 2017 15:05
@chengduoZH chengduoZH force-pushed the profiling/updata_elementwise_op branch 3 times, most recently from 18a50a2 to cc210de, on December 5, 2017 02:03
@reyoung reyoung (Collaborator) left a comment

Excellent! There are some tiny enhancements and one typo.

: ptr_(ptr), i_(0), j_(0), n_(n), post_(post) {}

MidWiseTransformIterator<T, platform::CPUPlace>& operator++() {
i_ = ++j_ / post_ % n_;
Collaborator

Please add parentheses here.

Contributor Author

Done
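
For reference, the parenthesized form presumably reads as follows; / and % share the same precedence and associate left to right, so the parentheses only make the intended grouping explicit:

i_ = (++j_ / post_) % n_;  // same semantics, clearer grouping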


bool operator!=(
const MidWiseTransformIterator<T, platform::CPUPlace>& rhs) const {
return (ptr_ + i_) &= &(*rhs);
Collaborator

&= -> !=

Contributor Author

Done
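
With the typo fixed, the comparison presumably becomes:

bool operator!=(
    const MidWiseTransformIterator<T, platform::CPUPlace>& rhs) const {
  return (ptr_ + i_) != &(*rhs);  // pointer inequality, not bitwise and-assign
}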

typedef thrust::iterator_adaptor<
RowwiseTransformIterator<T, platform::GPUPlace>, const T*>
super_t;
__host__ __device__ RowwiseTransformIterator(const T* x, int n)
Collaborator

Please use HOSTDEVICE macro

Contributor Author

Done
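
For context, HOSTDEVICE is Paddle's wrapper around the CUDA function qualifiers; roughly (a sketch, not the exact header contents):

// Roughly what paddle/platform/hostdevice.h provides (sketch only):
#ifdef __CUDACC__
#define HOSTDEVICE __host__ __device__
#else
#define HOSTDEVICE
#endif

// The constructor above would then read:
HOSTDEVICE RowwiseTransformIterator(const T* x, int n)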

private:
unsigned int n_;
const T* begin_;
__host__ __device__ typename super_t::reference dereference() const {
Collaborator

Please use HOSTDEVICE macro

Contributor Author

Done

template <typename Place, typename T>
class ElementwiseAddKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseCompute<EigenAddFunctor, Place, T>(ctx);
using Tensor = framework::Tensor;
Collaborator

Maybe we should implement all elemwise operators based on this method.

No hurry, but we need an issue to record the follow-up work.

Contributor Author

We can do these in the next PR.

template <typename Functor, typename T, typename Place>
struct TransformFunctor {
TransformFunctor(const framework::Tensor* x, const framework::Tensor* y,
framework::Tensor* z, const framework::ExecutionContext& ctx,
Member

Better to use DeviceContext rather than ExecutionContext.

Contributor Author

Done
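
A sketch of the adjusted constructor (member names follow the snippets above; not necessarily the merged code), taking the DeviceContext directly so the functor no longer depends on the whole ExecutionContext:

// Sketch only: accept the DeviceContext rather than the ExecutionContext.
TransformFunctor(const framework::Tensor* x, const framework::Tensor* y,
                 framework::Tensor* z, const platform::DeviceContext& ctx,
                 Functor func)
    : x_(x->data<T>()), y_(y->data<T>()),
      z_(z->mutable_data<T>(ctx.GetPlace())),
      nx_(x->numel()), ctx_(ctx), func_(func) {}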

@chengduoZH chengduoZH force-pushed the profiling/updata_elementwise_op branch 3 times, most recently from 830dbaa to 1517889, on December 5, 2017 03:47
@chengduoZH chengduoZH force-pushed the profiling/updata_elementwise_op branch from 1517889 to 9e244a8 on December 5, 2017 03:47
T* z_;
int64_t nx_;
const platform::DeviceContext& ctx_;
Functor func_;
Contributor

The naming rules for data members of a struct are different from those of a class.

https://google.github.io/styleguide/cppguide.html#Variable_Names

Contributor Author

Done, I have replaced struct with class.
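
For reference, the linked rule draws exactly this distinction: data members of plain structs carry no trailing underscore, while private class data members do, which is the style the fields above already use. Roughly (hypothetical names, illustrative only):

// Google C++ style in brief:
struct Extent {
  int rows;   // struct members: no trailing underscore
  int cols;
};

class Accumulator {
 private:
  const float* data_;  // class members: trailing underscore
  int64_t size_;
};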

@reyoung reyoung (Collaborator) left a comment

Excellent

@chengduoZH chengduoZH merged commit 3644446 into PaddlePaddle:develop Dec 5, 2017
4 participants