Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added possibility to pass PREFILL/GENERATE configs and pad_token_id #28154

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

AsyaPronina
Copy link
Contributor

@AsyaPronina AsyaPronina commented Dec 20, 2024

Details:

  • Added parsing of passed NPUW_LLM_PREFILL_CONFIG and NPUW_LLM_GENERATE_CONFIG options
  • Added parsing of passed NPUW_LLM_PAD_TOKEN_ID

Tickets:

  • EISW-149349
  • EISW-149350

Related PRs:

@AsyaPronina AsyaPronina requested review from a team as code owners December 20, 2024 01:55
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Dec 20, 2024
@AsyaPronina AsyaPronina force-pushed the npuw_llm_model_configs branch from e38b474 to 7d88863 Compare December 20, 2024 02:10
@dmatveev
Copy link
Contributor

@TolyaTalamanov please review

@AsyaPronina AsyaPronina force-pushed the npuw_llm_model_configs branch from 7d88863 to b52da47 Compare December 23, 2024 18:12
merge_config_with(prefill_config, properties_copy);
merge_config_with(generate_config, properties_copy);
// FIXME: Drop CACHE_DIR option if NPUW is enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it dropped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it should be handled on the GenAI side, here we already passed through the NPU plugin, that chooses us (npuw::LLMCompiledModel) and checked CACHE_DIR existance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this config, will NPU plugin handle CACHE_DIR? Or it will be responsibility of NPUW?

USE_NPUW: YES,
NPUW_LLM_PIPELINE: YES,
CACHE_DIR: "..."

* Tell NPUW the configuration for compilation of prefill model.
* NOTE: !! Write-only !!
*/
static constexpr ov::Property<std::string> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why do we even need the Property for this?

They idea is that user may provide it like this:

model = read_model(...);
auto compiled = core.compile_model(model, "NPU", { "NPUW_LLM_PREFILL_CONFIG": {...} });

Note, there is no need for user to set or get this config later on. It just should be passed once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just for us that all things are in one place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus these are also properties, they shouldn't be handled another way, because it will seem as hack. We need unified place to show all properties we have and unified way of handling them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just for us that all things are in one place

TBH, didn't get the point. What are the things and why there should be in one place?

My point is that having llm config params (e.g NPUW_LLM_PREFILL_CONFIG, ...) as properties complicates implementation as it brings more responsibilities to properly handle them. When it's just ov::AnyMap, it's parsed once in llm_compiled_model.cpp and then forgotten.

@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) {

ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const {
OPENVINO_SUPPRESS_DEPRECATED_START
if (name == ov::intel_npu::npuw::llm::prefill_config.name() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it's really needed, see comment above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_property() might be not needed at all here, so as it is a redudant functionality, I suppose to at least handle everything in a unified way here to not create a mess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keys provided to LLM pipeline must not be properties, so there won't be any mess

@dmatveev
Copy link
Contributor

@TolyaTalamanov have you finished with review, should this be merged?

@AsyaPronina there are merge conflicts

* Tell NPUW the configuration for compilation of prefill model.
* NOTE: !! Write-only !!
*/
static constexpr ov::Property<ov::AnyMap> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NPUW_LLM_PREFILL_CONFIG and NPUW_LLM_GENERATE_CONFIG are supposed to be passed to compile(...) once and then can be forgotten. Why do we need to define properties for that?

@@ -421,6 +429,13 @@ static constexpr ov::Property<uint32_t> min_response_len{"NPUW_LLM_MIN_RESPONSE_
*/
static constexpr ov::Property<std::string> generate_hint{"NPUW_LLM_GENERATE_HINT"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I'd make it as property

const ::intel_npu::npuw::llm::GenerateHint generate_hint = m_cfg.get<::intel_npu::NPUW_LLM_GENERATE_HINT>();
LOG_DEBUG("9. Passed GENERATE_HINT: " << std::string(::intel_npu::NPUW_LLM_GENERATE_HINT::toString(generate_hint)));
auto generate_config = get_default_generate_config(model, npudesc, generate_hint);
// NB: GENERATE_HINT is only applicable for default generate config!
if (generate_config_opt.has_value() && npuw_llm_props.count(ov::intel_npu::npuw::llm::generate_hint.name())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need npuw_llm_props.count(...) part if generate_hint already extracted a few lines above?

// preserve them somewhere.
auto prefill_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_PREFILL_CONFIG"));
auto generate_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_GENERATE_CONFIG"));

m_cfg.update(any_copy(npuw_llm_props));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe nothing from npuw_llm_props should get into m_cfg, right?

Everything related to LLM pipeline can be extracted here and then forgotten.


auto prefill_config =
prefill_config_opt.value_or(get_default_prefill_config(prefill_model, npudesc)).as<ov::AnyMap>();

const ::intel_npu::npuw::llm::GenerateHint generate_hint = m_cfg.get<::intel_npu::NPUW_LLM_GENERATE_HINT>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assume it initially extracted from npuw_llm_props

merge_config_with(prefill_config, properties_copy);
merge_config_with(generate_config, properties_copy);
// FIXME: Drop CACHE_DIR option if NPUW is enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this config, will NPU plugin handle CACHE_DIR? Or it will be responsibility of NPUW?

USE_NPUW: YES,
NPUW_LLM_PIPELINE: YES,
CACHE_DIR: "..."

@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) {

ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const {
OPENVINO_SUPPRESS_DEPRECATED_START
if (name == ov::intel_npu::npuw::llm::prefill_config.name() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keys provided to LLM pipeline must not be properties, so there won't be any mess

@dmatveev dmatveev added this to the 2025.0 milestone Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants