Continuous Batching in VLM [Draft] #1704
base: master
Conversation
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp
src/cpp/src/icontinuous_batching.cpp
@@ -146,4 +201,14 @@ void ContinuousBatchingPipeline::IContinuousBatchingPipeline::stream_tokens(
    const auto tokens = generation_outputs.begin()->second.generated_ids;
    streamer_ptr->write(tokens);
}

GenerationHandle ContinuousBatchingPipeline::IContinuousBatchingPipeline::add_request(uint64_t request_id,
Probably we need to move the implementation of this method to ContinuousBatchingImpl, as is done for the other add_request methods?
Moved to ContinuousBatchingImpl, but in that case this method also needs to be implemented in SpeculativeDecodingImpl and PromptLookupImpl.
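To illustrate the consequence with a simplified, self-contained sketch (these are stand-in types, not the actual GenAI classes): once the shared body is removed from the interface, every pipeline implementation has to provide its own override.

```cpp
// Simplified stand-ins for the real interfaces, for illustration only.
#include <cstdint>
#include <iostream>

struct IContinuousBatchingPipeline {
    virtual ~IContinuousBatchingPipeline() = default;
    virtual void add_request(uint64_t request_id) = 0;  // pure virtual: no shared body anymore
};

struct ContinuousBatchingImpl : IContinuousBatchingPipeline {
    void add_request(uint64_t request_id) override {
        std::cout << "CB impl handles request " << request_id << "\n";
    }
};

// Without overrides like this one, SpeculativeDecodingImpl and PromptLookupImpl
// would become abstract and could no longer be instantiated.
struct SpeculativeDecodingImpl : IContinuousBatchingPipeline {
    void add_request(uint64_t request_id) override {
        std::cout << "SD impl handles request " << request_id << "\n";
    }
};
```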
@@ -113,28 +166,37 @@ ContinuousBatchingPipeline::ContinuousBatchingPipeline(
    auto draft_model_desr = extract_draft_model_from_config(properties_without_draft_model);
    auto is_prompt_lookup_enabled = extract_prompt_lookup_from_config(properties_without_draft_model);
    auto model = utils::singleton_core().read_model(model_str, weights_tensor);
    auto directory = std::filesystem::path(get_directory(model_str));
model_str is the content of the XML file, so we cannot extract a directory from it here.
Since the 2025.0 release, the IR frontend inserts __weights_path into the runtime info of ov::Model (see openvinotoolkit/openvino#29101), so I think we can check this information and restore the directory the model was read from.
@@ -75,6 +79,11 @@ class ModelRunner {
        return m_request;
    }

    void set_inputs_embedder(std::shared_ptr<InputsEmbedder> embedder) {
        m_use_embeddings = true;
Looks like m_use_embeddings is redundant, as we can always check if (m_inputs_embedder) instead.
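For illustration, a minimal sketch of that simplification (member names other than InputsEmbedder are assumed): the embedder pointer itself carries the "embeddings enabled" state, so no separate flag is needed.

```cpp
#include <memory>

class InputsEmbedder;  // forward declaration for the sketch

class ModelRunner {
public:
    void set_inputs_embedder(std::shared_ptr<InputsEmbedder> embedder) {
        m_inputs_embedder = std::move(embedder);       // no extra boolean needed
    }

    bool use_embeddings() const {
        return static_cast<bool>(m_inputs_embedder);   // replaces m_use_embeddings
    }

private:
    std::shared_ptr<InputsEmbedder> m_inputs_embedder;
};
```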
    OPENVINO_THROW("Model with key '", key, "' not found in models map.");
}

std::pair<ov::AnyMap, SchedulerConfig> extract_scheduler_config(const ov::AnyMap& properties, std::optional<SchedulerConfig> default_config) {
We have the same copy in llm_pipeline.cpp; probably we should drop it from there.
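For reference, a rough sketch of what the single shared helper might look like once it is deduplicated; the "scheduler_config" property key and the default handling are assumptions for illustration, not the exact code from this PR or llm_pipeline.cpp.

```cpp
#include <optional>
#include <utility>

#include "openvino/core/any.hpp"
#include "openvino/genai/scheduler_config.hpp"

// Split the scheduler config out of the user-supplied properties so the
// remaining map can be forwarded to the plugin unchanged.
std::pair<ov::AnyMap, ov::genai::SchedulerConfig> extract_scheduler_config(
        const ov::AnyMap& properties,
        std::optional<ov::genai::SchedulerConfig> default_config = std::nullopt) {
    ov::AnyMap plugin_properties = properties;
    ov::genai::SchedulerConfig scheduler_config = default_config.value_or(ov::genai::SchedulerConfig{});

    auto it = plugin_properties.find("scheduler_config");
    if (it != plugin_properties.end()) {
        scheduler_config = it->second.as<ov::genai::SchedulerConfig>();
        plugin_properties.erase(it);  // do not forward the key to the plugin
    }
    return {plugin_properties, scheduler_config};
}
```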
    const ov::AnyMap& properties,
    const ov::genai::GenerationConfig& generation_config
): m_impl{
    "./",
Looks like in the future we will have to replicate the same constructor in the CB pipeline. Let's keep it as a TODO for now.
Ticket: 159639