-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added possibility to pass PREFILL/GENERATE configs and pad_token_id #28154
base: master
Are you sure you want to change the base?
Added possibility to pass PREFILL/GENERATE configs and pad_token_id #28154
Conversation
e38b474
to
7d88863
Compare
@TolyaTalamanov please review |
7d88863
to
b52da47
Compare
src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp
Outdated
Show resolved
Hide resolved
merge_config_with(prefill_config, properties_copy); | ||
merge_config_with(generate_config, properties_copy); | ||
// FIXME: Drop CACHE_DIR option if NPUW is enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it should be handled on the GenAI side, here we already passed through the NPU plugin, that chooses us (npuw::LLMCompiledModel
) and checked CACHE_DIR
existance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this config, will NPU
plugin handle CACHE_DIR
? Or it will be responsibility of NPUW
?
USE_NPUW: YES,
NPUW_LLM_PIPELINE: YES,
CACHE_DIR: "..."
* Tell NPUW the configuration for compilation of prefill model. | ||
* NOTE: !! Write-only !! | ||
*/ | ||
static constexpr ov::Property<std::string> prefill_config{"NPUW_LLM_PREFILL_CONFIG"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why do we even need the Property
for this?
They idea is that user may provide it like this:
model = read_model(...);
auto compiled = core.compile_model(model, "NPU", { "NPUW_LLM_PREFILL_CONFIG": {...} });
Note, there is no need for user to set
or get
this config later on. It just should be passed once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just for us that all things are in one place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus these are also properties, they shouldn't be handled another way, because it will seem as hack. We need unified place to show all properties we have and unified way of handling them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just for us that all things are in one place
TBH, didn't get the point. What are the things
and why there should be in one place?
My point is that having llm config params (e.g NPUW_LLM_PREFILL_CONFIG
, ...) as properties
complicates implementation as it brings more responsibilities to properly handle them. When it's just ov::AnyMap
, it's parsed once in llm_compiled_model.cpp
and then forgotten.
@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) { | |||
|
|||
ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const { | |||
OPENVINO_SUPPRESS_DEPRECATED_START | |||
if (name == ov::intel_npu::npuw::llm::prefill_config.name() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe it's really needed, see comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_property()
might be not needed at all here, so as it is a redudant functionality, I suppose to at least handle everything in a unified way here to not create a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keys provided to LLM
pipeline must not be properties
, so there won't be any mess
@TolyaTalamanov have you finished with review, should this be merged? @AsyaPronina there are merge conflicts |
* Tell NPUW the configuration for compilation of prefill model. | ||
* NOTE: !! Write-only !! | ||
*/ | ||
static constexpr ov::Property<ov::AnyMap> prefill_config{"NPUW_LLM_PREFILL_CONFIG"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPUW_LLM_PREFILL_CONFIG
and NPUW_LLM_GENERATE_CONFIG
are supposed to be passed to compile(...)
once and then can be forgotten. Why do we need to define properties for that?
@@ -421,6 +429,13 @@ static constexpr ov::Property<uint32_t> min_response_len{"NPUW_LLM_MIN_RESPONSE_ | |||
*/ | |||
static constexpr ov::Property<std::string> generate_hint{"NPUW_LLM_GENERATE_HINT"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, I'd make it as property
const ::intel_npu::npuw::llm::GenerateHint generate_hint = m_cfg.get<::intel_npu::NPUW_LLM_GENERATE_HINT>(); | ||
LOG_DEBUG("9. Passed GENERATE_HINT: " << std::string(::intel_npu::NPUW_LLM_GENERATE_HINT::toString(generate_hint))); | ||
auto generate_config = get_default_generate_config(model, npudesc, generate_hint); | ||
// NB: GENERATE_HINT is only applicable for default generate config! | ||
if (generate_config_opt.has_value() && npuw_llm_props.count(ov::intel_npu::npuw::llm::generate_hint.name())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need npuw_llm_props.count(...)
part if generate_hint
already extracted a few lines above?
// preserve them somewhere. | ||
auto prefill_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_PREFILL_CONFIG")); | ||
auto generate_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_GENERATE_CONFIG")); | ||
|
||
m_cfg.update(any_copy(npuw_llm_props)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe nothing from npuw_llm_props
should get into m_cfg
, right?
Everything related to LLM pipeline can be extracted here and then forgotten.
|
||
auto prefill_config = | ||
prefill_config_opt.value_or(get_default_prefill_config(prefill_model, npudesc)).as<ov::AnyMap>(); | ||
|
||
const ::intel_npu::npuw::llm::GenerateHint generate_hint = m_cfg.get<::intel_npu::NPUW_LLM_GENERATE_HINT>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd assume it initially extracted from npuw_llm_props
merge_config_with(prefill_config, properties_copy); | ||
merge_config_with(generate_config, properties_copy); | ||
// FIXME: Drop CACHE_DIR option if NPUW is enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this config, will NPU
plugin handle CACHE_DIR
? Or it will be responsibility of NPUW
?
USE_NPUW: YES,
NPUW_LLM_PIPELINE: YES,
CACHE_DIR: "..."
@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) { | |||
|
|||
ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const { | |||
OPENVINO_SUPPRESS_DEPRECATED_START | |||
if (name == ov::intel_npu::npuw::llm::prefill_config.name() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keys provided to LLM
pipeline must not be properties
, so there won't be any mess
Details:
NPUW_LLM_PREFILL_CONFIG
andNPUW_LLM_GENERATE_CONFIG
optionsNPUW_LLM_PAD_TOKEN_ID
Tickets:
Related PRs: