Continuous Batching in VLM [Draft] #1704

Open · wants to merge 25 commits into base: master
Conversation

popovaan (Contributor, Author):
Ticket: 159639

github-actions bot added the labels category: visual language, category: continuous batching, category: sampling, and category: GenAI C++ API (Feb 10, 2025)
ilya-lavrenov self-assigned this (Feb 10, 2025)
github-actions bot added the labels category: speculative decoding, no-match-files, and category: prompt lookup, and removed category: sampling (Feb 13, 2025)
github-actions bot added the label category: Python API (Feb 19, 2025)
popovaan marked this pull request as ready for review (Feb 19, 2025, 10:55)
ilya-lavrenov added this to the 2025.1 milestone (Feb 20, 2025)
popovaan requested a review from mzegla (Feb 20, 2025, 17:30)
@@ -146,4 +201,14 @@ void ContinuousBatchingPipeline::IContinuousBatchingPipeline::stream_tokens(
const auto tokens = generation_outputs.begin()->second.generated_ids;
streamer_ptr->write(tokens);
}

GenerationHandle ContinuousBatchingPipeline::IContinuousBatchingPipeline::add_request(uint64_t request_id,
Contributor:

Probably we need to move the implementation of this method to ContinuousBatchingImpl, as is done for the other add_request methods?

popovaan (Contributor, Author):

Moved to ContinuousBatchingImpl, but in this case this method also needs to be implemented in SpeculativeDecodingImpl and PromptLookupImpl.
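For illustration, a minimal sketch of the layout this suggestion leads to: add_request() declared pure virtual on the interface and overridden by each pipeline implementation. The parameter list here is illustrative, not copied from this PR.

```cpp
// Sketch only: signatures are illustrative, not the exact ones from the PR.
class IContinuousBatchingPipeline {
public:
    virtual ~IContinuousBatchingPipeline() = default;
    // Declared on the interface, implemented by each pipeline variant.
    virtual GenerationHandle add_request(uint64_t request_id,
                                         const ov::Tensor& input_ids,
                                         ov::genai::GenerationConfig sampling_params) = 0;
};

class ContinuousBatchingImpl : public IContinuousBatchingPipeline {
public:
    GenerationHandle add_request(uint64_t request_id,
                                 const ov::Tensor& input_ids,
                                 ov::genai::GenerationConfig sampling_params) override;
};

// As noted in the reply above, SpeculativeDecodingImpl and PromptLookupImpl
// then need their own overrides as well.
```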

@@ -113,28 +166,37 @@ ContinuousBatchingPipeline::ContinuousBatchingPipeline(
auto draft_model_desr = extract_draft_model_from_config(properties_without_draft_model);
auto is_prompt_lookup_enabled = extract_prompt_lookup_from_config(properties_without_draft_model);
auto model = utils::singleton_core().read_model(model_str, weights_tensor);
auto directory = std::filesystem::path(get_directory(model_str));
Contributor:

model_str is the content of the XML file, so we cannot extract a directory from it here.

Since the 2025.0 release, the IR frontend inserts __weights_path into the runtime info of ov::Model (see openvinotoolkit/openvino#29101), so I think we can check this information and restore the directory the model was read from.
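A minimal sketch of that idea, assuming the weights path is stored under the "__weights_path" runtime-info key as described in openvinotoolkit/openvino#29101; get_model_directory() is a hypothetical helper, not part of this PR.

```cpp
#include <filesystem>
#include <memory>
#include "openvino/core/model.hpp"

std::filesystem::path get_model_directory(const std::shared_ptr<ov::Model>& model) {
    const ov::AnyMap& rt_info = model->get_rt_info();
    auto it = rt_info.find("__weights_path");
    if (it != rt_info.end()) {
        // The weights file lives next to the .xml, so its parent
        // directory is the directory the model was read from.
        return std::filesystem::path(it->second.as<std::string>()).parent_path();
    }
    // No runtime info: the model was built from an in-memory string.
    return {};
}
```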

@@ -75,6 +79,11 @@ class ModelRunner {
return m_request;
}

void set_inputs_embedder(std::shared_ptr<InputsEmbedder> embedder) {
m_use_embeddings = true;
Contributor:

Looks like m_use_embeddings is redundant, since we can always check if (m_inputs_embedder) instead.
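A minimal sketch of the suggested simplification (member names as in the diff above; use_embeddings() is a hypothetical accessor):

```cpp
// A non-null m_inputs_embedder already encodes "embeddings are in use",
// so the extra boolean flag can be dropped.
void set_inputs_embedder(std::shared_ptr<InputsEmbedder> embedder) {
    m_inputs_embedder = std::move(embedder);
}

bool use_embeddings() const {
    return m_inputs_embedder != nullptr;  // replaces the m_use_embeddings flag
}
```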

OPENVINO_THROW("Model with key '", key, "' not found in models map.");
}

std::pair<ov::AnyMap, SchedulerConfig> extract_scheduler_config(const ov::AnyMap& properties, std::optional<SchedulerConfig> default_config) {
Contributor:

We have the same copy in llm_pipeline.cpp; probably we should drop it from there.
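For reference, a minimal sketch of what the single shared helper might do once deduplicated. The "scheduler_config" property key and the body are assumptions based only on the signature above, not confirmed by this diff.

```cpp
#include <optional>
#include <utility>

std::pair<ov::AnyMap, SchedulerConfig> extract_scheduler_config(
        const ov::AnyMap& properties, std::optional<SchedulerConfig> default_config) {
    ov::AnyMap remaining = properties;
    SchedulerConfig config = default_config.value_or(SchedulerConfig{});
    auto it = remaining.find("scheduler_config");
    if (it != remaining.end()) {
        config = it->second.as<SchedulerConfig>();
        remaining.erase(it);  // callers receive the properties without this entry
    }
    return {remaining, config};
}
```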

const ov::AnyMap& properties,
const ov::genai::GenerationConfig& generation_config
): m_impl{
"./",
Contributor:

Looks like in the future we will have to replicate the same ctor in the CB pipeline.

Let's keep it as a TODO for now.
