
[oneDNN] Accesses to oneDNN cache optimized for conv2d #33048

Merged
merged 10 commits into PaddlePaddle:develop on May 27, 2021

Conversation

@jczaja (Contributor) commented on May 21, 2021

PR types

Function optimization

PR changes

OPs

Describe

In a similar spirit to PR #32922, conv2d BWD can recreate the FWD PD, so the AcquireForwardPrimitiveNonBlocking functions were used. Apart from that, conv2d grad was refactored so these optimizations could be applied.
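The mechanism referred to here is, roughly, a blob-cache lookup: the backward pass asks the device context for an already-created forward primitive descriptor and only rebuilds it on a miss. Below is a minimal, illustrative sketch of that pattern; the helper name AcquireConvFwdPD and the "@fwd_pd" suffix are assumptions, not the PR's exact code.

```cpp
// Illustrative sketch only. The backward kernel checks the per-device blob
// cache for the forward primitive descriptor and rebuilds it only on a miss.
std::shared_ptr<dnnl::convolution_forward::primitive_desc> AcquireConvFwdPD(
    const platform::MKLDNNDeviceContext& dev_ctx, const std::string& key) {
  const std::string key_fwd_pd = key + "@fwd_pd";  // hypothetical suffix
  auto fwd_pd =
      std::static_pointer_cast<dnnl::convolution_forward::primitive_desc>(
          dev_ctx.GetBlob(key_fwd_pd));
  if (fwd_pd == nullptr) {
    // Cache miss: build the memory descriptors and the forward PD here,
    // then store it so subsequent BWD calls can reuse it, e.g.
    // dev_ctx.SetBlob(key_fwd_pd, fwd_pd);
  }
  return fwd_pd;
}
```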

- First draft implemented

- compilable and UT not crashing (still failing)

- Fix to UT
	cache.vim
@paddle-bot-old commented:
Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@jczaja added the Intel label on May 21, 2021
UpdatePaddingAndDilation(&paddings, &dilations, padding_algorithm,
data_dims, strides, ksize);

auto src_tz = paddle::framework::vectorize(in->dims());
Contributor:
Suggested change:
- auto src_tz = paddle::framework::vectorize(in->dims());
+ auto src_tz = framework::vectorize(in->dims());

Contributor Author:
ok

data_dims, strides, ksize);

auto src_tz = paddle::framework::vectorize(in->dims());
auto weights_tz = paddle::framework::vectorize(filter->dims());
Contributor:
Suggested change:
- auto weights_tz = paddle::framework::vectorize(filter->dims());
+ auto weights_tz = framework::vectorize(filter->dims());

Contributor Author:
ok


int g = std::max(groups, 1);
platform::GetGroupConvWeightsTz(weights_tz, g);
auto dst_tz = paddle::framework::vectorize(out_grad->dims());
Contributor:
Suggested change:
- auto dst_tz = paddle::framework::vectorize(out_grad->dims());
+ auto dst_tz = framework::vectorize(out_grad->dims());

* ('any') which lets a primitive (conv backward in this case) choose
* the memory format preferred for best performance
*/
auto chosen_memory_format = MKLDNNMemoryFormat::any;
Contributor:
Suggested change:
- auto chosen_memory_format = MKLDNNMemoryFormat::any;
+ constexpr auto chosen_memory_format = MKLDNNMemoryFormat::any;

Contributor Author:
This is a simple assignment, and since chosen_memory_format is not modified later we can just declare it as const.
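A two-line illustration of the distinction being discussed (hypothetical snippet, not from the PR): both qualifiers forbid later reassignment, and constexpr additionally works here because the enum value is known at compile time.

```cpp
const auto fmt_const = MKLDNNMemoryFormat::any;          // cannot be reassigned at runtime
constexpr auto fmt_constexpr = MKLDNNMemoryFormat::any;  // also a compile-time constant
```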

platform::GetGroupConvWeightsTz(weights_tz, g);
auto dst_tz = paddle::framework::vectorize(out_grad->dims());

MKLDNNMemoryFormat weights_format =
Contributor:
Why are you setting this value here, when you are always changing the format to "any" in line 332?

Contributor Author:
Good catch!

conv_attr, fwd_prop_kind, dnnl::algorithm::convolution_direct,
src_md, weights_md, dst_md, stride_dims, dilations_dims,
mkldnn_paddings[0], mkldnn_paddings[1]);
}
}
}

ConvMKLDNNHandlerT(const framework::ExecutionContext& ctx,
const paddle::platform::MKLDNNDeviceContext& dev_ctx,
Contributor:
Suggested change:
- const paddle::platform::MKLDNNDeviceContext& dev_ctx,
+ const platform::MKLDNNDeviceContext& dev_ctx,

Contributor Author:
ok

@jakpiase self-requested a review on May 24, 2021 17:31
@jakpiase previously approved these changes on May 24, 2021

@jakpiase (Contributor) left a comment:
LGTM

Comment on lines +244 to +249
ConvMKLDNNHandlerT(const framework::ExecutionContext& ctx,
const platform::MKLDNNDeviceContext& dev_ctx,
platform::Place cpu_place, const Tensor* in,
const Tensor* filter, const Tensor* bias,
const Tensor* out_grad, Tensor* filter_grad,
Tensor* in_x_grad, const std::string& unique_name)
Contributor:
Just for consideration: couldn't we have separate classes for FWD and BWD? After all, we are going to recreate the FWD primitive in the BWD pass. Choosing the right pass by passing appropriate arguments to the same class constructor seems a bit confusing to me and not immediately clear from the user's perspective.

Contributor Author:
Let's discuss this internally first.

auto backward_p =
std::static_pointer_cast<TBackward_params>(dev_ctx_.GetBlob(key_p));
if (backward_p == nullptr) {
backward_p = std::make_shared<TBackward_params>(*bwd_w_pd_);
Contributor:
Maybe you should also check here whether bwd_w_pd_ is not empty?

Contributor Author:
ok
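One way the suggested check could look (an illustrative sketch using Paddle's enforcement macro; the exact wording and placement in the final PR may differ):

```cpp
// Illustrative: fail early with a descriptive error if the backward-weights
// primitive descriptor has not been created yet.
PADDLE_ENFORCE_NOT_NULL(
    bwd_w_pd_,
    platform::errors::Unavailable(
        "BWD weights primitive descriptor must be created before acquiring "
        "the backward-weights primitive."));
backward_p = std::make_shared<TBackward_params>(*bwd_w_pd_);
```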


framework::Tensor* diff_weights) {
T* ptr = diff_weights->mutable_data<T>(
place_, bwd_w_pd_->diff_weights_desc().get_size());
return this->AcquireMemoryFromPrimitive(bwd_w_pd_->diff_weights_desc(), ptr,
Contributor:
Similar to the above, shouldn't you check whether bwd_w_pd_ has already been created?

Contributor Author:
ok


// Buffer is allocated by oneDNN to store computation results
std::shared_ptr<mkldnn::memory> AcquireDiffWeightsMemory(void) {
return this->AcquireMemoryFromPrimitive(bwd_w_pd_->diff_weights_desc(),
Contributor:
As above.

Contributor Author:
ok

@@ -35,7 +35,8 @@ using user_function = std::function<std::shared_ptr<float>(const float*)>;
using memory = mkldnn::memory;

template <typename T, typename TForward,
typename TBackward = mkldnn_dummy_primitive>
typename TBackward = mkldnn_dummy_primitive,
typename TBackward_params = mkldnn_dummy_primitive>
Contributor:
If this is only convolution-specific, wouldn't it be more readable to have it in its own handler class? Moreover, what if we would like to reuse this class for other operations (such as RNN, GRU, LSTM) which may require even more additional parameter types?

Contributor Author:
It is not only convolution-specific. It is specific to all primitives that have trainable weights, e.g. conv, conv_transpose, fully connected, prelu. As for LSTM (when we get to training), I'm not sure; perhaps it would make sense to create a separate class.
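For context, this is roughly how a weight-bearing operator such as conv fills the extra template slot (an abbreviated sketch; the exact definition in the PR may differ):

```cpp
// Abbreviated sketch: conv supplies forward, backward-data and
// backward-weights primitive types so that all three can be cached and reused.
template <typename T>
class ConvMKLDNNHandlerT
    : public platform::MKLDNNHandlerT<T, dnnl::convolution_forward,
                                      dnnl::convolution_backward_data,
                                      dnnl::convolution_backward_weights> {
  // constructors for the FWD and BWD paths go here
};
```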

@@ -72,6 +73,17 @@ class MKLDNNHandlerT {
return backward_p;
}

std::shared_ptr<TBackward_params> AcquireBackwardWeightsPrimitive() {
const std::string key_p = key_ + "@bwd_w_p";
Contributor:
Shouldn't we have those suffixes, like "@bwd_w_p", in some form of enum?

Contributor Author:
Yes, we should. What we actually need is the suffixes as an enum plus a "key -> string" mapping mechanism so they can be printed as part of VLOG.
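A sketch of how the suffix enum and the "key -> string" mapping could look (hypothetical, not part of this PR; the second suffix name is invented for illustration):

```cpp
// Hypothetical: cache-key suffixes kept as an enum, mapped back to strings
// so they can still be printed in VLOG output.
enum class KeySuffix { kBwdWeightsPrimitive, kBwdWeightsPD };

inline const char* ToString(KeySuffix suffix) {
  switch (suffix) {
    case KeySuffix::kBwdWeightsPrimitive:
      return "@bwd_w_p";
    case KeySuffix::kBwdWeightsPD:
      return "@bwd_w_pd";
  }
  return "";
}
```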

@jakpiase (Contributor) left a comment:
LGTM

@arogowie-intel (Contributor) left a comment:
Good job! :)

@jczaja (Contributor Author) commented on May 26, 2021:
@luotao1 could you please start your review?

@luotao1 merged commit 8c6bbb4 into PaddlePaddle:develop on May 27, 2021