Continuous Batching in VLM [Draft] #1704

Open · wants to merge 25 commits into base: master
Conversation

popovaan (Contributor, Author):
Ticket: 159639

github-actions bot added the labels category: visual language, category: continuous batching, category: sampling, and category: GenAI C++ API (Feb 10, 2025)
ilya-lavrenov self-assigned this (Feb 10, 2025)
github-actions bot added the labels category: speculative decoding, no-match-files, and category: prompt lookup, and removed category: sampling (Feb 13, 2025)
github-actions bot added the label category: Python API (Feb 19, 2025)
popovaan marked this pull request as ready for review (Feb 19, 2025, 10:55)
ilya-lavrenov added this to the 2025.1 milestone (Feb 20, 2025)
popovaan requested a review from mzegla (Feb 20, 2025, 17:30)
@@ -146,4 +201,14 @@ void ContinuousBatchingPipeline::IContinuousBatchingPipeline::stream_tokens(
const auto tokens = generation_outputs.begin()->second.generated_ids;
streamer_ptr->write(tokens);
}

GenerationHandle ContinuousBatchingPipeline::IContinuousBatchingPipeline::add_request(uint64_t request_id,
Contributor:

Probably we need to move the implementation of this method to ContinuousBatchingImpl, as is done for the other add_request methods?

popovaan (Contributor, Author):

Moved to ContinuousBatchingImpl, but in this case this method also needs to be implemented in SpeculativeDecodingImpl and PromptLookupImpl.
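For illustration, a minimal sketch of the layout this suggestion leads to: add_request() declared pure virtual on the interface and overridden by each pipeline implementation. The parameter list here is illustrative, not copied from this PR.

```cpp
// Sketch only: signatures are illustrative, not the exact ones from the PR.
class IContinuousBatchingPipeline {
public:
    virtual ~IContinuousBatchingPipeline() = default;
    // Declared on the interface, implemented by each pipeline variant.
    virtual GenerationHandle add_request(uint64_t request_id,
                                         const ov::Tensor& input_ids,
                                         ov::genai::GenerationConfig sampling_params) = 0;
};

class ContinuousBatchingImpl : public IContinuousBatchingPipeline {
public:
    GenerationHandle add_request(uint64_t request_id,
                                 const ov::Tensor& input_ids,
                                 ov::genai::GenerationConfig sampling_params) override;
};

// As noted in the reply above, SpeculativeDecodingImpl and PromptLookupImpl
// then need their own overrides as well.
```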

@@ -113,28 +166,37 @@ ContinuousBatchingPipeline::ContinuousBatchingPipeline(
auto draft_model_desr = extract_draft_model_from_config(properties_without_draft_model);
auto is_prompt_lookup_enabled = extract_prompt_lookup_from_config(properties_without_draft_model);
auto model = utils::singleton_core().read_model(model_str, weights_tensor);
auto directory = std::filesystem::path(get_directory(model_str));
Contributor:

model_str is the content of the XML file, so we cannot extract a directory from it here.

Since the 2025.0 release, the IR frontend inserts __weights_path into the runtime info of ov::Model (see openvinotoolkit/openvino#29101), so I think we can check this information and restore the directory the model was read from.
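A minimal sketch of that idea, assuming the weights path is stored under the "__weights_path" runtime-info key as described in openvinotoolkit/openvino#29101; get_model_directory() is a hypothetical helper, not part of this PR.

```cpp
#include <filesystem>
#include <memory>
#include "openvino/core/model.hpp"

std::filesystem::path get_model_directory(const std::shared_ptr<ov::Model>& model) {
    const ov::AnyMap& rt_info = model->get_rt_info();
    auto it = rt_info.find("__weights_path");
    if (it != rt_info.end()) {
        // The weights file lives next to the .xml, so its parent
        // directory is the directory the model was read from.
        return std::filesystem::path(it->second.as<std::string>()).parent_path();
    }
    // No runtime info: the model was built from an in-memory string.
    return {};
}
```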

@@ -75,6 +79,11 @@ class ModelRunner {
return m_request;
}

void set_inputs_embedder(std::shared_ptr<InputsEmbedder> embedder) {
m_use_embeddings = true;
Contributor:

Looks like m_use_embeddings is redundant, since we can always check if (m_inputs_embedder) instead.
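A minimal sketch of the suggested simplification (member names as in the diff above; use_embeddings() is a hypothetical accessor):

```cpp
// A non-null m_inputs_embedder already encodes "embeddings are in use",
// so the extra boolean flag can be dropped.
void set_inputs_embedder(std::shared_ptr<InputsEmbedder> embedder) {
    m_inputs_embedder = std::move(embedder);
}

bool use_embeddings() const {
    return m_inputs_embedder != nullptr;  // replaces the m_use_embeddings flag
}
```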

OPENVINO_THROW("Model with key '", key, "' not found in models map.");
}

std::pair<ov::AnyMap, SchedulerConfig> extract_scheduler_config(const ov::AnyMap& properties, std::optional<SchedulerConfig> default_config) {
Contributor:

We have the same copy in llm_pipeline.cpp; probably we should drop it from there.
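For reference, a minimal sketch of what the single shared helper might do once deduplicated. The "scheduler_config" property key and the body are assumptions based only on the signature above, not confirmed by this diff.

```cpp
#include <optional>
#include <utility>

std::pair<ov::AnyMap, SchedulerConfig> extract_scheduler_config(
        const ov::AnyMap& properties, std::optional<SchedulerConfig> default_config) {
    ov::AnyMap remaining = properties;
    SchedulerConfig config = default_config.value_or(SchedulerConfig{});
    auto it = remaining.find("scheduler_config");
    if (it != remaining.end()) {
        config = it->second.as<SchedulerConfig>();
        remaining.erase(it);  // callers receive the properties without this entry
    }
    return {remaining, config};
}
```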

const ov::AnyMap& properties,
const ov::genai::GenerationConfig& generation_config
): m_impl{
"./",
Contributor:

Looks like in the future we will have to replicate the same ctor in the CB pipeline.

Let's keep it as a TODO for now.
