Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MKLDNN-DO Subgraph Optimization #966

Closed

Conversation

sreekanth-yalachigere
Copy link
Contributor

This optimization creates a subgraph of mkldnn operators and uses full potential of mkldnn by using block propagation and improves performance significantly.
For example, in Resnet50, currently we have 109 data reorders.

onnxruntime_data(nchw) ->reorder(nchw8c)->conv(nchw8c) -> reorder(nchw);
onnxruntime_data(nchw) ->batchnorm(nchw);
onnxruntime_data(nchw) ->Relu(nchw);
onnxruntime_data(nchw) ->reorder(nchw8c)->pool(nchw8c) -> reorder(nchw);
onnxruntime_data(nchw) ->reorder(nchw8c)->conv(nchw8c) -> reorder(nchw);

With this optimization, we will have only one data reorder and we propage blocked mkldnn memory all the way to the end of subgraph.

onnxruntime_data(nchw) -> conv(nchw8c) -> batchnorm-relu(nchw8c)->pool(nchw8c)->conv(nchw8c)... reorder(nchw)

Subgraph optimization can be enabled by setting the following environment variable.
ORT_MKLDNN_SUBGRAPH=1

@sreekanth-yalachigere sreekanth-yalachigere requested a review from a team as a code owner May 3, 2019 17:27
@jywu-msft
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 10 pipeline(s).

Copy link
Contributor

@pranavsharma pranavsharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Mkl and MKL are used inconsistently.
  • Put the MklKernel code in a cc file (not header).
  • Prefer using mkldnn instead of mkl everywhere to avoid confusion with mkl library.

onnxruntime/core/providers/mkldnn/subgraph/subgraph.h Outdated Show resolved Hide resolved
onnxruntime/core/providers/mkldnn/subgraph/subgraph.h Outdated Show resolved Hide resolved
ORT_UNUSED_PARAMETER(attributes_prefix);
}

virtual Status CreatePrimitives(const ONNXRunTimeTensor* input_tensors,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input_tensor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think input_tensors is correct. We access shape, dim and data by an index. Can you please confirm?

mkldnn::memory::dims src_dims(x_shape.GetDims().begin(), x_shape.GetDims().end());
std::string key;
key.reserve(128);
key = subgraph_key_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can AddDimsToKey be written such that it accepts a const string and returns the newly formed key? Moreover, we should rename this to GenerateKey() as addition is just one way to generate the key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pranavsharma AddDimsToKey is widely used in Vanilla mkldnn operators. Should I go ahead and modify it?

@jywu-msft
Copy link
Member

can you add some high level documentation, which describes the classes, abstractions and their relationships.
this would help maintenance of this code moving forward.

This optimization creates a subgraph of mkldnn operators and uses full potential of mkldnn by using block propagation and improves performance significantly.
For example, in Resnet50, currently we have 109 data reorders.

onnxruntime_data(nchw) ->reorder(nchw8c)->conv(nchw8c) -> reorder(nchw);
onnxruntime_data(nchw) ->batchnorm(nchw);
onnxruntime_data(nchw) ->Relu(nchw);
onnxruntime_data(nchw) ->reorder(nchw8c)->pool(nchw8c) -> reorder(nchw);
onnxruntime_data(nchw) ->reorder(nchw8c)->conv(nchw8c) -> reorder(nchw);

With this optimization, we will have only one data reorder and we propage blocked mkldnn memory all the way to the end of subgraph.

onnxruntime_data(nchw) -> conv(nchw8c) -> batchnorm-relu(nchw8c)->pool(nchw8c)->conv(nchw8c)... reorder(nchw)

Subgraph optimization can be enabled by setting the following environment variable.
ORT_MKLDNN_SUBGRAPH=1

}
virtual ~MklKernel(){};

virtual void ReadAttributes(const std::unordered_map<std::string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to leverage OpNodeProtoHelper for reading attributes?
then you wouldn't need to define all the Get*Attr() methods below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into it and it's not straight forward. Can this be done after this PR?


namespace onnxruntime {

struct MklNode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's already onnxruntime representations of Node, subgraphs etc.
Were those too heavy to use? Would have been nice to leverage more of onnxruntime classes where possible,
rather than adding new ones to bridge onnxruntime and mkldnn library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MklDnnNode is totally different and I am using it for MklDnn IR of a subgraph.

@yufenglee
Copy link
Member

I tried this PR on a production model and got a crash. Here is the call stack:

#0 0x000000000047e487 in std::__find_if<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const> > (__first="Conv",
__last=<error: Cannot access memory at address 0x8>, __pred=...) at /usr/include/c++/5/bits/stl_algo.h:118
#1 0x00000000004935ab in std::__find_if<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits, std::allocator >
, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, __gnu_cxx::__ops::_Iter_equals_val<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const> > (__pred=..., __last=...,
__first=...) at /usr/include/c++/5/bits/stl_algo.h:162
#2 std::find<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits, std::allocator >*, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > (__val=..., __last=..., _first=...) at /usr/include/c++/5/bits/stl_algo.h:3791
#3 onnxruntime::MKLDNNExecutionProvider::GetCapability (this=this@entry=0xe088f0, graph_viewer=..., kernel_registries=std::vector of length 1, capacity 1 = {...})
at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.cc:167
#4 0x0000000000788391 in onnxruntime::GraphPartitioner::Partition (this=this@entry=0x7fffffffbf50, graph=..., export_dll=false, func_mgr=...) at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/core/framework/graph_partitioner.cc:151
#5 0x00000000004663ee in onnxruntime::InferenceSession::TransformGraph (this=this@entry=0xe07150, graph=..., graph_transformer_mgr=..., providers=..., kernel_registry_manager=..., insert_cast_transformer=..., session_state=...)
at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/core/session/inference_session.cc:301
#6 0x000000000046b7ce in onnxruntime::InferenceSession::Initialize (this=this@entry=0xe07150) at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/core/session/inference_session.cc:446
#7 0x0000000000450122 in OrtCreateSession (env=env@entry=0xe03ac0, model_path=0xcd9a50 "/home/yufeng/data/models/fastrcnn/perf/model.onnx", options=options@entry=0xe06cb0, out=out@entry=0x7fffffffd830)
at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/core/session/onnxruntime_c_api.cc:390
#8 0x0000000000421903 in onnxruntime::SessionOptionsWrapper::OrtCreateSession (model_path=, this=) at /home/yufeng/project/mkldnn/onnxruntime/include/onnxruntime/core/session/onnxruntime_cxx_api.h:127
#9 onnxruntime::perftest::OnnxRuntimeTestSession::OnnxRuntimeTestSession (this=0xdfe920, env=0xe03ac0, performance_test_config=..., m=0xdfdf20) at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/test/perftest/ort_test_session.cc:83
#10 0x0000000000423c41 in onnxruntime::perftest::CreateSession (test_model_info=0xdfdf20, performance_test_config
=..., env=0xe03ac0) at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/test/perftest/performance_runner.cc:188
#11 onnxruntime::perftest::PerformanceRunner::PerformanceRunner (this=0x7fffffffdf30, env=0xe03ac0, test_config=...) at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/test/perftest/performance_runner.cc:201
#12 0x0000000000420bbe in real_main (argc=, argv=, p_env=p_env@entry=0x7fffffffe160) at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/test/perftest/main.cc:33
#13 0x0000000000419615 in main (argc=, argv=) at /home/yufeng/project/mkldnn/onnxruntime/onnxruntime/test/perftest/main.cc:53

while (node_index < graph_viewer.MaxNodeIndex()) {
auto node = graph_viewer.GetNode(node_index);
std::vector<std::string>::iterator it = std::find(
mkl_ops.begin(), mkl_ops.end(), node->OpType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the crash is because node is nullptr. Need to figure out why node is null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yufenglee is this PR model available on ONNX Model Zoo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sreekanth-yalachigere , no, it is a production model and I can't share it with you for confidentiality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetNode() may return nullptr if the node has been freed , which can happen if there are some transforms.
I don't think you can iterate through nodes by index from 0 to max.
Shouldn't you use GetNodesInTopologicalOrder() ?

@sreekanth-yalachigere
Copy link
Contributor Author

@pranavsharma @jywu-msft

  1. Changed Mkl to MklDnn
  2. Renamed files (mkl_* to mkldnn_*)
  3. Moved code from mkldnn_kernel.h to new mkldnn_kernel.cc
  4. Extracted method from GetCapability long code.

@sreekanth-yalachigere
Copy link
Contributor Author

Creating new PR as many files have modified after memcpy code merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants