[oneDNN] Accesses to oneDNN cache optimized for conv2d #33048
Conversation
- First draft implemented - compilable and UT not crashing (still failing)
- Fix to UT cache.vim
Thanks for your contribution!
UpdatePaddingAndDilation(&paddings, &dilations, padding_algorithm,
                         data_dims, strides, ksize);

auto src_tz = paddle::framework::vectorize(in->dims());
Suggested change:
- auto src_tz = paddle::framework::vectorize(in->dims());
+ auto src_tz = framework::vectorize(in->dims());
ok
data_dims, strides, ksize);

auto src_tz = paddle::framework::vectorize(in->dims());
auto weights_tz = paddle::framework::vectorize(filter->dims());
Suggested change:
- auto weights_tz = paddle::framework::vectorize(filter->dims());
+ auto weights_tz = framework::vectorize(filter->dims());
ok
int g = std::max(groups, 1);
platform::GetGroupConvWeightsTz(weights_tz, g);
auto dst_tz = paddle::framework::vectorize(out_grad->dims());
Suggested change:
- auto dst_tz = paddle::framework::vectorize(out_grad->dims());
+ auto dst_tz = framework::vectorize(out_grad->dims());
 * ('any') which lets a primitive (conv backward in this case) choose
 * the memory format preferred for best performance
 */
auto chosen_memory_format = MKLDNNMemoryFormat::any;
Suggested change:
- auto chosen_memory_format = MKLDNNMemoryFormat::any;
+ constexpr auto chosen_memory_format = MKLDNNMemoryFormat::any;
This is a simple assignment, and since chosen_memory_format is not modified later, we can just declare it as const.
platform::GetGroupConvWeightsTz(weights_tz, g);
auto dst_tz = paddle::framework::vectorize(out_grad->dims());

MKLDNNMemoryFormat weights_format =
Why are you setting this value here, when you are always changing the format to "any" in line 332?
good catch!
      conv_attr, fwd_prop_kind, dnnl::algorithm::convolution_direct,
      src_md, weights_md, dst_md, stride_dims, dilations_dims,
      mkldnn_paddings[0], mkldnn_paddings[1]);
    }
  }
}

ConvMKLDNNHandlerT(const framework::ExecutionContext& ctx,
                   const paddle::platform::MKLDNNDeviceContext& dev_ctx,
Suggested change:
- const paddle::platform::MKLDNNDeviceContext& dev_ctx,
+ const platform::MKLDNNDeviceContext& dev_ctx,
ok
LGTM
ConvMKLDNNHandlerT(const framework::ExecutionContext& ctx,
                   const platform::MKLDNNDeviceContext& dev_ctx,
                   platform::Place cpu_place, const Tensor* in,
                   const Tensor* filter, const Tensor* bias,
                   const Tensor* out_grad, Tensor* filter_grad,
                   Tensor* in_x_grad, const std::string& unique_name)
Just for consideration: couldn't we have separate classes for FWD and BWD? After all, we are going to recreate the FWD primitive in the BWD pass. Choosing the right pass by passing appropriate arguments to the same class constructor seems a bit confusing to me and not immediately clear from the user's perspective.
let's discuss this internally first
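For illustration only, a self-contained sketch of the kind of split being suggested; the class and member names below are hypothetical stand-ins, not the actual Paddle/oneDNN types:

#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for the real oneDNN primitive descriptors (hypothetical).
struct FwdPD {};
struct BwdWeightsPD {};

// Forward-only handler: inference callers never see any backward state.
class ConvFwdHandler {
 public:
  explicit ConvFwdHandler(std::string key) : key_(std::move(key)) {}
  std::shared_ptr<FwdPD> AcquireForwardPD() {
    if (!fwd_pd_) fwd_pd_ = std::make_shared<FwdPD>();
    return fwd_pd_;
  }

 protected:
  std::string key_;
  std::shared_ptr<FwdPD> fwd_pd_;
};

// Backward handler reuses the forward part via inheritance, since oneDNN
// backward primitive descriptors are built from a forward "hint".
class ConvBwdHandler : public ConvFwdHandler {
 public:
  explicit ConvBwdHandler(std::string key) : ConvFwdHandler(std::move(key)) {}
  std::shared_ptr<BwdWeightsPD> AcquireBackwardWeightsPD() {
    AcquireForwardPD();  // recreate the FWD PD needed as the BWD hint
    if (!bwd_w_pd_) bwd_w_pd_ = std::make_shared<BwdWeightsPD>();
    return bwd_w_pd_;
  }

 private:
  std::shared_ptr<BwdWeightsPD> bwd_w_pd_;
};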
auto backward_p =
    std::static_pointer_cast<TBackward_params>(dev_ctx_.GetBlob(key_p));
if (backward_p == nullptr) {
  backward_p = std::make_shared<TBackward_params>(*bwd_w_pd_);
Maybe you should also check here whether bwd_w_pd_ is not empty?
ok
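A minimal sketch of the kind of guard being asked for, placed before the dereference. PADDLE_ENFORCE_NOT_NULL and platform::errors are Paddle's usual error-handling facilities, but the exact message and placement here are illustrative, not taken from the PR:

// Illustrative guard inside AcquireBackwardWeightsPrimitive(): fail loudly if
// the backward-weights primitive descriptor was never created.
PADDLE_ENFORCE_NOT_NULL(
    bwd_w_pd_,
    platform::errors::Unavailable(
        "BWD weights primitive descriptor must be created before the "
        "backward-weights primitive is acquired."));
backward_p = std::make_shared<TBackward_params>(*bwd_w_pd_);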
                             framework::Tensor* diff_weights) {
  T* ptr = diff_weights->mutable_data<T>(
      place_, bwd_w_pd_->diff_weights_desc().get_size());
  return this->AcquireMemoryFromPrimitive(bwd_w_pd_->diff_weights_desc(), ptr,
Similar to the above: shouldn't you check whether bwd_w_pd_ has already been created?
ok
// Buffer is allocated by oneDNN to store computation results
std::shared_ptr<mkldnn::memory> AcquireDiffWeightsMemory(void) {
  return this->AcquireMemoryFromPrimitive(bwd_w_pd_->diff_weights_desc(),
As above.
ok
@@ -35,7 +35,8 @@ using user_function = std::function<std::shared_ptr<float>(const float*)>;
 using memory = mkldnn::memory;

 template <typename T, typename TForward,
-          typename TBackward = mkldnn_dummy_primitive>
+          typename TBackward = mkldnn_dummy_primitive,
+          typename TBackward_params = mkldnn_dummy_primitive>
If this is only convolution-specific, wouldn't it be more readable to have it in its own handler class? Moreover, what if we would like to reuse this class for other operations (like recurrent RNN, GRU, LSTM) which may require even more additional parameter types?
It is not only convolution-specific. It applies to all primitives that have trainable weights, e.g. conv, conv_transpose, fully connected, prelu... As for LSTM (when we get to training), I'm not sure; perhaps it would make sense to create a separate class.
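For context, a sketch of how the extra template slot could be filled at the conv2d grad site. The mkldnn::convolution_* types are the real oneDNN C++ API classes; the exact base-class list of ConvMKLDNNHandlerT in this PR may differ:

// Hypothetical instantiation: the conv2d handler supplies the backward-weights
// primitive type through the new TBackward_params parameter.
template <typename T>
class ConvMKLDNNHandlerT
    : public platform::MKLDNNHandlerT<T, mkldnn::convolution_forward,
                                      mkldnn::convolution_backward_data,
                                      mkldnn::convolution_backward_weights> {
  // ...
};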
@@ -72,6 +73,17 @@ class MKLDNNHandlerT {
     return backward_p;
   }

+  std::shared_ptr<TBackward_params> AcquireBackwardWeightsPrimitive() {
+    const std::string key_p = key_ + "@bwd_w_p";
Shouldn't we have those suffixes like "@bwd_w_p" in some form of enum?
Yes, we should. What we actually need is the suffixes as an enum plus a "key -> string" mapping mechanism so they can be printed as part of VLOG.
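A minimal, self-contained sketch of what such an enum plus string mapping could look like; only "@bwd_w_p" appears in this diff, the other suffix names are made up for illustration:

// Hypothetical cache-key suffixes as an enum instead of raw string literals.
enum class KeySuffix { kFwdPD, kBwdWeightsPD, kBwdWeightsPrimitive };

// Single place that maps each suffix to the string appended to the cache key,
// so the same name can also be printed in VLOG messages.
inline const char* ToString(KeySuffix suffix) {
  switch (suffix) {
    case KeySuffix::kFwdPD:
      return "@fwd_pd";
    case KeySuffix::kBwdWeightsPD:
      return "@bwd_w_pd";
    case KeySuffix::kBwdWeightsPrimitive:
      return "@bwd_w_p";
  }
  return "@unknown";
}

// Usage: const std::string key_p = key_ + ToString(KeySuffix::kBwdWeightsPrimitive);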
LGTM
Good job! :)
@luotao1 could you please start your review?
PR types
Function optimization
PR changes
OPs
Describe
In a similar spirit to PR #32922, conv2d BWD can recreate the FWD PD, so we used the AcquireForwardPrimitiveNonBlocking functions. Apart from that, conv2d grad was refactored so these optimizations could be applied.
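For readers unfamiliar with the caching idiom this relies on, below is a simplified, self-contained sketch of the "acquire from the blob cache or create" pattern the handlers follow; the BlobCache type and AcquireOrCreate helper are illustrative stand-ins, not Paddle's actual MKLDNNDeviceContext API:

#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative stand-in for the per-device blob cache.
class BlobCache {
 public:
  std::shared_ptr<void> GetBlob(const std::string& key) const {
    auto it = blobs_.find(key);
    return it == blobs_.end() ? nullptr : it->second;
  }
  void SetBlob(const std::string& key, std::shared_ptr<void> blob) {
    blobs_[key] = std::move(blob);
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<void>> blobs_;
};

// Acquire-or-create: look the primitive up by key and only build it on a miss.
// This mirrors the pattern AcquireBackwardWeightsPrimitive() follows in the diff.
template <typename TPrimitive, typename... Args>
std::shared_ptr<TPrimitive> AcquireOrCreate(BlobCache* cache,
                                            const std::string& key,
                                            Args&&... args) {
  auto p = std::static_pointer_cast<TPrimitive>(cache->GetBlob(key));
  if (p == nullptr) {
    p = std::make_shared<TPrimitive>(std::forward<Args>(args)...);
    cache->SetBlob(key, p);
  }
  return p;
}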