
[NPUW] Add Slice before last MatMul #27229

Merged

Conversation

Opened by smirnov-alexey (Contributor)

SliceLastMatmulAdd::SliceLastMatmulAdd() {
    auto matmul = opp::wrap_type<ov::op::v0::MatMul>({opp::any_input(), opp::any_input()});
    auto add = opp::wrap_type<ov::op::v1::Add>({matmul, opp::any_input()});
smirnov-alexey (Contributor, Author):

Not sure whether all these additional patterns could be simplified with optional nodes.

dmatveev (Contributor) left a comment:

Fantastic work! @slyalin, can you please comment on my concerns?

Comment on lines +1325 to +1326
    auto add = opp::wrap_type<ov::op::v1::Add>({matmul, opp::any_input()});
    auto res = opp::wrap_type<ov::op::v0::Result>({add});
dmatveev (Contributor):

Not sure if I've seen cases like this. @TolyaTalamanov, did you?

@smirnov-alexey, what pattern worked for you most of the time?

I think we can drop this pattern.

slyalin (Contributor), Oct 24, 2024:

Summoning @olpipi, who showed that this and the later-mentioned patterns were required. @olpipi, could you give a list of topologies?

smirnov-alexey (Contributor, Author):
I only saw the first one, but as discussed, let's keep all three for now (enabled via a property).

olpipi (Contributor):

I found the following patterns and applied them in the genai repo:

- chatglm2-6b, chatglm3-6b: MatMul -> Transpose -> Result
- codegen2-1b, codegen2-3_7b, codegen2-7b, gpt-j-6b, phi-2: MatMul -> Add -> Result
- gemma-2-2b, gemma-2-9b: MatMul -> Divide -> Tanh -> Multiply -> Result
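All of these tails preserve the last-token result under slicing: Transpose only permutes axes, and Add, Divide, Multiply, and Tanh act elementwise, so taking only the last token row before the LM-head MatMul yields the same last-token logits as computing everything and discarding the rest. A stdlib-only Python sketch of the third pattern (the toy shapes and scalar tail operands are my assumptions, not taken from any of the models above):

```python
import math

# Toy shapes: hidden states X is [N, K], LM-head weights W is [K, V].
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # N=3 tokens, K=2
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]  # K=2, V=3

# MatMul -> Divide -> Tanh -> Multiply tail, applied elementwise
# with scalar divisor/multiplier for simplicity.
def tail(y, div=2.0, mul=3.0):
    return [[math.tanh(v / div) * mul for v in row] for row in y]

full = tail(matmul(X, W))          # logits for all N tokens; keep only the last
sliced = tail(matmul(X[-1:], W))   # slice to the last token first, then compute

assert full[-1] == sliced[0]       # identical last-token logits, 1/N of the MatMul work
```

This is the equivalence the Slice transformation relies on: the sliced prefill model produces exactly the logits the sampler needs, while the vocab-sized MatMul shrinks from N rows to one.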

dmatveev (Contributor):

Thanks @olpipi, appreciate your help

Comment on lines +1359 to +1362
SliceLastMatmulTranspose::SliceLastMatmulTranspose() {
    auto matmul = opp::wrap_type<ov::op::v0::MatMul>({opp::any_input(), opp::any_input()});
    auto add = opp::wrap_type<ov::op::v1::Transpose>({matmul, opp::any_input()});
    auto res = opp::wrap_type<ov::op::v0::Result>({matmul});
dmatveev (Contributor):

Also not sure about this one. My main concern is that we could alter more MatMuls than we actually need.

smirnov-alexey (Contributor, Author):

As discussed, keeping for now

Comment on lines +1395 to +1431
SliceLastMatmulMultiply::SliceLastMatmulMultiply() {
    auto matmul = opp::wrap_type<ov::op::v0::MatMul>({opp::any_input(), opp::any_input()});
    auto div = opp::wrap_type<ov::op::v1::Divide>({matmul, opp::any_input()});
    auto tanh = opp::wrap_type<ov::op::v0::Tanh>({div});
    auto multiply = opp::wrap_type<ov::op::v1::Multiply>({tanh, opp::any_input()});
    auto res = opp::wrap_type<ov::op::v0::Result>({multiply});

    // Note: Use [=] to make sure the above objects stay alive in the callback
    auto callback = [=](ov::pass::pattern::Matcher& m) {
        auto& node_to_output = m.get_pattern_value_map();

        auto matched_out_matmul = node_to_output.at(matmul);

        auto shape = matched_out_matmul.get_node()->input(0).get_shape();

        if (shape.size() == 3 && shape[1] > 1) {
            auto start = std::make_shared<ov::op::v0::Constant>(ov::element::i32,
                                                                ov::Shape{3},
                                                                std::vector<int32_t>{0, int32_t(shape[1] - 1), 0});
            auto stop = std::make_shared<ov::op::v0::Constant>(ov::element::i32,
                                                               ov::Shape{3},
                                                               std::vector<int32_t>{1, int32_t(shape[1]), int32_t(shape[2])});
            auto step = std::make_shared<ov::op::v0::Constant>(ov::element::i32,
                                                               ov::Shape{3},
                                                               std::vector<int32_t>{1, 1, 1});

            auto slice = std::make_shared<ov::op::v8::Slice>(matched_out_matmul.get_node()->input_value(0),
                                                             start, stop, step);

            matched_out_matmul.get_node()->input(0).replace_source_output(slice);

            return true;  // root was changed
        }
        return false;  // root hasn't changed
    };
    register_matcher(std::make_shared<opp::Matcher>(res, "SliceLastMatmulMultiply"), std::move(callback));
}
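The start/stop/step constants the callback builds amount to selecting the last row along axis 1 of a [1, N, K] input. A pure-Python mimic of that index arithmetic (the nested-list layout and the function name are illustrative, not OpenVINO API):

```python
# Mimic the Slice bounds built above for an input of shape [1, N, K]:
# start = {0, N-1, 0}, stop = {1, N, K}, step = {1, 1, 1}.
def slice_last_row(x):
    d0, n, k = 1, len(x[0]), len(x[0][0])   # shape[0], shape[1], shape[2]
    start = [0, n - 1, 0]
    stop = [d0, n, k]
    step = [1, 1, 1]
    return [[row[start[2]:stop[2]:step[2]]
             for row in batch[start[1]:stop[1]:step[1]]]
            for batch in x[start[0]:stop[0]:step[0]]]

x = [[[1, 2], [3, 4], [5, 6]]]          # shape [1, 3, 2]
assert slice_last_row(x) == [[[5, 6]]]  # shape [1, 1, 2]: only the last token row kept
```

Note the `shape[1] > 1` guard in the callback: it keeps the pass from touching decode-shaped models that already have a single token row.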
dmatveev (Contributor):

Same concern here. Not sure which topologies it serves.

smirnov-alexey (Contributor, Author):

As discussed, keeping for now

Comment on lines 153 to 155
rewr.add_matcher<ov::npuw::patterns::opt::SliceLastMatmulAdd>();
rewr.add_matcher<ov::npuw::patterns::opt::SliceLastMatmulTranspose>();
rewr.add_matcher<ov::npuw::patterns::opt::SliceLastMatmulMultiply>();
dmatveev (Contributor):

I'd drop these three

smirnov-alexey (Contributor, Author):

As discussed, keeping for now

@@ -147,6 +147,14 @@ ov::npuw::CompiledModel::CompiledModel(const std::shared_ptr<ov::Model>& model,
rewr.run_on_model(model);
}

// Add Slice before last MatMul for the prefill model
ov::pass::GraphRewrite rewr;
rewr.add_matcher<ov::npuw::patterns::opt::SliceLastMatmul>();
dmatveev (Contributor):

Let's put this under a config too. In fact, we're breaking the model's API convention: the user gave us a model with an [N,K] output, but the compiled model they get back would have a [1,K] output.

Let it be NPUW_SLICE_OUT, let it be NO by default, let @TolyaTalamanov and our internal configs pass it by default.
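For illustration, opting in from user code might look like the sketch below. The property name comes from the comment above; the model path is hypothetical, and the snippet is an assumed usage pattern rather than documented API (it also requires NPU hardware to actually run):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical model path

# NPUW_SLICE_OUT defaults to NO, so existing users keep the [N, K] output;
# setting it to YES enables the Slice-before-last-MatMul pass and the
# compiled model produces a [1, K] output instead.
compiled = core.compile_model(model, "NPU", {"NPUW_SLICE_OUT": "YES"})
```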

smirnov-alexey (Contributor, Author):

Done

dmatveev added this pull request to the merge queue on Oct 24, 2024.
Merged via the queue into openvinotoolkit:master with commit 433e44e on Oct 24, 2024; 132 of 134 checks passed.
dmatveev deleted the as/npuw_slice_last_matmul branch on Oct 24, 2024 at 21:04.
Labels: category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin), Code Freeze
4 participants