[NPUW] Add Slice before last MatMul #27229
Conversation
SliceLastMatmulAdd::SliceLastMatmulAdd() {
    auto matmul = opp::wrap_type<ov::op::v0::MatMul>({opp::any_input(), opp::any_input()});
    auto add = opp::wrap_type<ov::op::v1::Add>({matmul, opp::any_input()});
Not sure if all those additional patterns could be simplified with optional nodes.
Force-pushed from 7dab165 to 93afd14
Fantastic work! @slyalin can you please comment on my concerns?
    auto add = opp::wrap_type<ov::op::v1::Add>({matmul, opp::any_input()});
    auto res = opp::wrap_type<ov::op::v0::Result>({add});
Not sure if I've seen cases like this. @TolyaTalamanov did you?
@smirnov-alexey what pattern worked for you most of the time?
I think we can drop this pattern.
phi2: https://huggingface.co/OpenVINO/phi-2-fp16-ov/raw/main/openvino_model.xml has Add after MatMul.
I only saw the first one, but as discussed let's keep all 3 for now (enabled via property)
I found the following patterns and applied them in the genai repo:

chatglm2-6b, chatglm3-6b: Matmul -> Transpose -> Result
codegen2-1b, codegen2-3_7b, codegen2-7b, gpt-j-6b, phi-2: Matmul -> Add -> Result
gemma-2-2b, gemma-2-9b: MatMul -> Divide -> Tanh -> Multiply -> Result
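The shared motivation behind all these patterns can be checked outside OpenVINO: when only the last token's logits are needed, slicing the hidden states to their last row before the LM-head MatMul produces the same values as computing the full output and reading its last row. A minimal pure-Python sketch (the matrices and sizes here are made up for illustration):

```python
# Sketch: slicing before the final MatMul vs. after it.
# hidden: N token rows of width H; w: an [H, V] LM-head weight matrix.

def matmul(a, b):
    # Naive [N, H] x [H, V] -> [N, V] matrix multiply.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # N=3 tokens, H=2
w = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]         # H=2, V=3

full = matmul(hidden, w)         # logits for every token: [N, V]
sliced = matmul(hidden[-1:], w)  # logits for the last token only: [1, V]

assert sliced[0] == full[-1]     # same last-token logits, N times less work
```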
Thanks @olpipi, appreciate your help
SliceLastMatmulTranspose::SliceLastMatmulTranspose() {
    auto matmul = opp::wrap_type<ov::op::v0::MatMul>({opp::any_input(), opp::any_input()});
    auto transpose = opp::wrap_type<ov::op::v1::Transpose>({matmul, opp::any_input()});
    auto res = opp::wrap_type<ov::op::v0::Result>({transpose});
Also not sure about this one. My main concern is that we may alter more MatMuls than we actually need.
As discussed, keeping for now
SliceLastMatmulMultiply::SliceLastMatmulMultiply() {
    auto matmul = opp::wrap_type<ov::op::v0::MatMul>({opp::any_input(), opp::any_input()});
    auto div = opp::wrap_type<ov::op::v1::Divide>({matmul, opp::any_input()});
    auto tanh = opp::wrap_type<ov::op::v0::Tanh>({div});
    auto multiply = opp::wrap_type<ov::op::v1::Multiply>({tanh, opp::any_input()});
    auto res = opp::wrap_type<ov::op::v0::Result>({multiply});

    // Note: Use [=] to make sure the above objects stay alive in the callback
    auto callback = [=](ov::pass::pattern::Matcher& m) {
        auto& node_to_output = m.get_pattern_value_map();

        auto matched_out_matmul = node_to_output.at(matmul);

        auto shape = matched_out_matmul.get_node()->input(0).get_shape();

        if (shape.size() == 3 && shape[1] > 1) {
            auto start = std::make_shared<ov::op::v0::Constant>(ov::element::i32,
                                                                ov::Shape{3},
                                                                std::vector<int32_t>{0, int32_t(shape[1] - 1), 0});
            auto stop =
                std::make_shared<ov::op::v0::Constant>(ov::element::i32,
                                                       ov::Shape{3},
                                                       std::vector<int32_t>{1, int32_t(shape[1]), int32_t(shape[2])});
            auto step =
                std::make_shared<ov::op::v0::Constant>(ov::element::i32, ov::Shape{3}, std::vector<int32_t>{1, 1, 1});

            auto slice =
                std::make_shared<ov::op::v8::Slice>(matched_out_matmul.get_node()->input_value(0), start, stop, step);

            matched_out_matmul.get_node()->input(0).replace_source_output(slice);

            return true;  // root was changed
        }
        return false;  // root hasn't changed
    };
    register_matcher(std::make_shared<opp::Matcher>(res, "SliceLastMatmulMultiply"), std::move(callback));
}
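The start/stop/step constants the callback builds can be mirrored in plain Python (a sketch, not the plugin code): for a rank-3 input of shape [B, N, H], they select only the last row along the token axis.

```python
# Sketch of the Slice bounds constructed by the pass above.
def slice_bounds(shape):
    # Mirrors the C++ guard: rank 3 with more than one token row.
    assert len(shape) == 3 and shape[1] > 1
    start = [0, shape[1] - 1, 0]    # begin at the last token row
    stop = [1, shape[1], shape[2]]  # one row, full hidden width
    step = [1, 1, 1]
    return start, stop, step

# Example dims (made up): batch 1, 1024 tokens, hidden size 4096.
start, stop, step = slice_bounds([1, 1024, 4096])
print(start, stop, step)  # [0, 1023, 0] [1, 1024, 4096] [1, 1, 1]
```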
Same concern here. Not sure which topologies it serves.
As discussed, keeping for now
    rewr.add_matcher<ov::npuw::patterns::opt::SliceLastMatmulAdd>();
    rewr.add_matcher<ov::npuw::patterns::opt::SliceLastMatmulTranspose>();
    rewr.add_matcher<ov::npuw::patterns::opt::SliceLastMatmulMultiply>();
I'd drop these three
As discussed, keeping for now
@@ -147,6 +147,14 @@ ov::npuw::CompiledModel::CompiledModel(const std::shared_ptr<ov::Model>& model,
        rewr.run_on_model(model);
    }

    // Add Slice before last MatMul for the prefill model
    ov::pass::GraphRewrite rewr;
    rewr.add_matcher<ov::npuw::patterns::opt::SliceLastMatmul>();
Let's put this under a config too. We're in fact breaking the model's API convention: the user gave us a model with a [N,K] output, but the compiled model they get back would have [1,K]. Let it be NPUW_SLICE_OUT, let it be NO by default, and let @TolyaTalamanov and our internal configs pass it by default.
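Assuming the option lands under the name proposed above, the compile-time configuration would carry it as a plain key/value pair (a sketch; the key NPUW_SLICE_OUT and the YES/NO convention are taken from this thread, everything else is illustrative):

```python
# Sketch of the compile-time config discussed above. NPUW_SLICE_OUT
# defaults to NO; setting it to YES would opt in to the output slicing.
config = {
    "NPUW_SLICE_OUT": "YES",  # enable Slice-before-last-MatMul
}

# Something like core.compile_model(model, "NPU", config) would then
# receive it (the call shape is an assumption, shown for context only).
```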
Done
…into as/npuw_slice_last_matmul (433e44e)
Based on openvinotoolkit/openvino.genai#814