Added possibility to pass PREFILL/GENERATE configs and pad_token_id #28154

AsyaPronina · 2024-12-20T01:55:13Z

Details:

Added parsing of passed NPUW_LLM_PREFILL_CONFIG and NPUW_LLM_GENERATE_CONFIG options
Added parsing of passed NPUW_LLM_PAD_TOKEN_ID

Tickets:

EISW-149349
EISW-149350

Related PRs:

OpenVINO GenAI: Static llm pipeline dynamic shape model openvino.genai#1240

dmatveev · 2024-12-23T11:00:47Z

@TolyaTalamanov please review

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

TolyaTalamanov · 2024-12-24T14:03:29Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

    merge_config_with(prefill_config, properties_copy);
    merge_config_with(generate_config, properties_copy);
-    // FIXME: Drop CACHE_DIR option if NPUW is enabled


Why is it dropped?

Because it should be handled on the GenAI side. By this point we have already passed through the NPU plugin, which chose us (npuw::LLMCompiledModel) and has already checked whether CACHE_DIR is present.

Given this config, will the NPU plugin handle CACHE_DIR, or will it be the responsibility of NPUW?

USE_NPUW: YES, NPUW_LLM_PIPELINE: YES, CACHE_DIR: "..."

src/plugins/intel_npu/src/al/include/intel_npu/config/npuw.hpp

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

TolyaTalamanov · 2024-12-24T14:22:53Z

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

+ * Tell NPUW the configuration for compilation of prefill model.
+ * NOTE: !! Write-only !!
+ */
+static constexpr ov::Property<std::string> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};


I wonder why we even need a Property for this.

The idea is that the user may provide it like this:

    auto model = core.read_model(...);
    auto compiled = core.compile_model(model, "NPU", {{"NPUW_LLM_PREFILL_CONFIG", {...}}});

Note that the user never needs to set or get this config later on. It just has to be passed once.

It is just for us, so that everything is in one place.

Plus, these are also properties, so they shouldn't be handled any other way; otherwise it will look like a hack. We need a unified place to list all the properties we have and a unified way of handling them.

It is just for us, so that everything is in one place.

TBH, I didn't get the point. What are these things, and why should they be in one place?

My point is that having LLM config params (e.g. NPUW_LLM_PREFILL_CONFIG, ...) as properties complicates the implementation, because it brings more responsibility to handle them properly. When it's just an ov::AnyMap, it's parsed once in llm_compiled_model.cpp and then forgotten.

TolyaTalamanov · 2024-12-24T14:23:47Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) {

 ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const {
    OPENVINO_SUPPRESS_DEPRECATED_START
+    if (name == ov::intel_npu::npuw::llm::prefill_config.name() ||


I don't believe it's really needed, see comment above

get_property() might not be needed here at all. Since it is redundant functionality, I propose at least handling everything in a unified way here, so as not to create a mess.

Keys provided to the LLM pipeline must not be properties, so there won't be any mess.

src/plugins/intel_npu/src/plugin/npuw/llm_infer_request.cpp

dmatveev · 2024-12-24T21:33:24Z

@TolyaTalamanov have you finished the review? Should this be merged?

@AsyaPronina there are merge conflicts

TolyaTalamanov · 2024-12-27T08:55:19Z

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

+ * Tell NPUW the configuration for compilation of prefill model.
+ * NOTE: !! Write-only !!
+ */
+static constexpr ov::Property<ov::AnyMap> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};


NPUW_LLM_PREFILL_CONFIG and NPUW_LLM_GENERATE_CONFIG are supposed to be passed to compile(...) once and then can be forgotten. Why do we need to define properties for that?

TolyaTalamanov · 2024-12-27T08:57:13Z

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

@@ -421,6 +429,13 @@ static constexpr ov::Property<uint32_t> min_response_len{"NPUW_LLM_MIN_RESPONSE_
 */
 static constexpr ov::Property<std::string> generate_hint{"NPUW_LLM_GENERATE_HINT"};


Same, I'd make it as property

TolyaTalamanov · 2024-12-27T08:59:49Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

    const ::intel_npu::npuw::llm::GenerateHint generate_hint = m_cfg.get<::intel_npu::NPUW_LLM_GENERATE_HINT>();
    LOG_DEBUG("9. Passed GENERATE_HINT: " << std::string(::intel_npu::NPUW_LLM_GENERATE_HINT::toString(generate_hint)));
-    auto generate_config = get_default_generate_config(model, npudesc, generate_hint);
+    // NB: GENERATE_HINT is only applicable for default generate config!
+    if (generate_config_opt.has_value() && npuw_llm_props.count(ov::intel_npu::npuw::llm::generate_hint.name())) {


Do we need the npuw_llm_props.count(...) part if generate_hint is already extracted a few lines above?
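One possible reading of the check in question, as a rough sketch only: the hint read from the config presumably always has a value (a default applies when nothing is passed), so a lookup in the user-supplied map may be the only way to tell whether the hint was passed explicitly. Everything below is hypothetical: `ConfigMap` stands in for `ov::AnyMap` with plain string values, `make_default_generate_config` and `select_generate_config` are illustrative names, and throwing on the conflicting combination is an assumption, since the body of the `if` is not shown in this thread.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ov::AnyMap; values are plain strings here.
using ConfigMap = std::map<std::string, std::string>;

// Hypothetical default builder; the key and value are placeholders only.
ConfigMap make_default_generate_config(const std::string& hint) {
    return {{"HINT_APPLIED", hint}};
}

// Pick the generate config. The hint only matters when no explicit config
// is given; passing both explicitly is treated as an error (an assumption).
ConfigMap select_generate_config(const std::optional<ConfigMap>& explicit_config,
                                 const ConfigMap& user_props,
                                 const std::string& resolved_hint) {
    // resolved_hint always holds a value, so it cannot tell us whether the
    // user passed the hint; only the user-supplied map can.
    const bool hint_passed_by_user = user_props.count("NPUW_LLM_GENERATE_HINT") > 0;
    if (explicit_config.has_value()) {
        if (hint_passed_by_user) {
            throw std::invalid_argument(
                "GENERATE_HINT only applies to the default generate config");
        }
        return *explicit_config;
    }
    return make_default_generate_config(resolved_hint);
}
```

If the already-extracted hint could itself signal "not set" (for example, through an optional), the extra lookup would become unnecessary.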

TolyaTalamanov · 2024-12-27T09:01:19Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

+    // preserve them somewhere.
+    auto prefill_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_PREFILL_CONFIG"));
+    auto generate_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_GENERATE_CONFIG"));
+
    m_cfg.update(any_copy(npuw_llm_props));


I believe nothing from npuw_llm_props should get into m_cfg, right?

Everything related to the LLM pipeline can be extracted here and then forgotten.
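The "extract here and then forget" idea can be illustrated with a small, self-contained sketch. It is not the code from this PR: `ConfigMap` is a hypothetical stand-in for `ov::AnyMap` with plain string values, and this `pop_option` is only an assumed shape of the helper seen in the diff above, whose real signature is not shown.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for ov::AnyMap; values are plain strings here.
using ConfigMap = std::map<std::string, std::string>;

// Remove `key` from `config` and return its value,
// or std::nullopt if the key is absent.
std::optional<std::string> pop_option(ConfigMap& config, const std::string& key) {
    auto it = config.find(key);
    if (it == config.end()) {
        return std::nullopt;
    }
    std::optional<std::string> value = std::move(it->second);
    config.erase(it);
    return value;
}
```

Popping instead of reading means that an LLM-only key cannot leak into whatever consumes the rest of the map afterwards.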

TolyaTalamanov · 2024-12-27T09:02:03Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

+
+    auto prefill_config =
+        prefill_config_opt.value_or(get_default_prefill_config(prefill_model, npudesc)).as<ov::AnyMap>();
+
    const ::intel_npu::npuw::llm::GenerateHint generate_hint = m_cfg.get<::intel_npu::NPUW_LLM_GENERATE_HINT>();


I'd assume it is initially extracted from npuw_llm_props.
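A side note on the `value_or(...)` fallback in the diff above: `std::optional::value_or` evaluates its argument before the call, so the default is built even when a user config is present. The sketch below shows this with hypothetical names only (`ConfigMap` as a string-valued stand-in for `ov::AnyMap`, `build_default_prefill_config` as a placeholder default builder with a call counter); whether the cost matters for the real default builder is not something this thread establishes.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for ov::AnyMap; values are plain strings here.
using ConfigMap = std::map<std::string, std::string>;

// Counts how many times the default config was built.
int g_default_builds = 0;

// Hypothetical default builder; the key and value are placeholders only.
ConfigMap build_default_prefill_config() {
    ++g_default_builds;
    return {{"PLACEHOLDER_KEY", "default"}};
}

// Eager: the default is always built, then discarded if a user config exists.
ConfigMap resolve_eager(const std::optional<ConfigMap>& user_config) {
    return user_config.value_or(build_default_prefill_config());
}

// Lazy: the default is built only when no user config was passed.
ConfigMap resolve_lazy(const std::optional<ConfigMap>& user_config) {
    return user_config.has_value() ? *user_config : build_default_prefill_config();
}
```

Both variants return the same result; they differ only in whether the default builder runs when it is not needed.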

TolyaTalamanov · 2024-12-27T09:04:08Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

    merge_config_with(prefill_config, properties_copy);
    merge_config_with(generate_config, properties_copy);
-    // FIXME: Drop CACHE_DIR option if NPUW is enabled


Given this config, will the NPU plugin handle CACHE_DIR, or will it be the responsibility of NPUW?

USE_NPUW: YES, NPUW_LLM_PIPELINE: YES, CACHE_DIR: "..."

TolyaTalamanov · 2024-12-27T09:05:20Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) {

 ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const {
    OPENVINO_SUPPRESS_DEPRECATED_START
+    if (name == ov::intel_npu::npuw::llm::prefill_config.name() ||


Keys provided to the LLM pipeline must not be properties, so there won't be any mess.

src/plugins/intel_npu/src/plugin/npuw/llm_infer_request.cpp

AsyaPronina requested review from a team as code owners December 20, 2024 01:55

github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Dec 20, 2024

AsyaPronina force-pushed the npuw_llm_model_configs branch from e38b474 to 7d88863 Compare December 20, 2024 02:10

dmatveev assigned TolyaTalamanov Dec 23, 2024

AsyaPronina added 2 commits December 23, 2024 17:54

Added possibility to pass PREFILL/GENERATE configs and pad_token_id

26a077c

Fixed clang-format

b52da47

AsyaPronina force-pushed the npuw_llm_model_configs branch from 7d88863 to b52da47 Compare December 23, 2024 18:12

TolyaTalamanov reviewed Dec 24, 2024


Fixed according review comments

a263f2c

AsyaPronina mentioned this pull request Dec 24, 2024

Static llm pipeline dynamic shape model openvinotoolkit/openvino.genai#1240

Open

TolyaTalamanov reviewed Dec 27, 2024


dmatveev added this to the 2025.0 milestone Dec 27, 2024
