
[Core] Support image processor #4197

Merged (130 commits, Jun 3, 2024)

Commits (130)
a26badd
Support image processor
DarkLight1337 Apr 19, 2024
1a0ecca
Convert dtype in multi modal processing
DarkLight1337 Apr 19, 2024
45b6756
Move `MultiModalData` to new subpackage `multimodal`
DarkLight1337 Apr 22, 2024
6ed8397
Add multi-modal processor registry
DarkLight1337 Apr 22, 2024
8c48208
Initialize the processor only once
DarkLight1337 Apr 22, 2024
613ec1b
Merge branch 'upstream' into mm-data-processor
DarkLight1337 Apr 22, 2024
c48a7d4
Move processor to model runner
DarkLight1337 Apr 22, 2024
3232231
Refactor registry to plugin pattern in order to support specifying du…
DarkLight1337 Apr 23, 2024
92a0283
Merge branch 'upstream' into mm-data-processor
DarkLight1337 Apr 23, 2024
5d42800
Combine prompt inputs
DarkLight1337 Apr 24, 2024
5db2c5e
Fix a bunch of tests
DarkLight1337 Apr 25, 2024
74c5905
Fix LLaVA test
DarkLight1337 Apr 25, 2024
cd8917b
Merge branch 'upstream' into llm-inputs
DarkLight1337 Apr 25, 2024
b49aba7
Fix `benchmark_latency` test
DarkLight1337 Apr 25, 2024
bfd7295
Merge branch 'upstream' into llm-inputs
DarkLight1337 Apr 25, 2024
45c7f23
Merge branch 'upstream' into llm-inputs
DarkLight1337 Apr 27, 2024
493e6ed
Merge branch 'upstream' into llm-inputs
DarkLight1337 Apr 28, 2024
df1b20b
Merge branch 'upstream' into mm-data-processor
DarkLight1337 Apr 28, 2024
20aeceb
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 1, 2024
0f46653
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 3, 2024
c4f3540
Clarify tokenizer usage
DarkLight1337 May 3, 2024
ab8182c
Rename `encode_request -> process_model_inputs`
DarkLight1337 May 3, 2024
eac33e1
Support old API in `LLM.generate`
DarkLight1337 May 3, 2024
0ff8189
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 3, 2024
9663b50
Fix import error
DarkLight1337 May 3, 2024
703d318
Add tests to ensure old API still works
DarkLight1337 May 3, 2024
19d85f9
Let all entrypoints tests be run at the same time
DarkLight1337 May 3, 2024
0cf2dbe
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 4, 2024
554e8c5
Apply formatter
DarkLight1337 May 4, 2024
0921bad
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 7, 2024
baebd99
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 7, 2024
2cc5498
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 8, 2024
dc9816f
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 8, 2024
1c50600
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 14, 2024
5759dfa
Add tests for LLM.encode and fix corresponding bugs
DarkLight1337 May 14, 2024
cc4bfb5
Apply formatter
DarkLight1337 May 14, 2024
6085b08
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 14, 2024
d5c9731
Rename `_add_requests` to `_validate_and_add_requests` to be more sim…
DarkLight1337 May 14, 2024
4f218a5
Separate `entrypoints` tests into two groups
DarkLight1337 May 14, 2024
428df48
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 14, 2024
f153450
Remove duplicate comment
DarkLight1337 May 14, 2024
a9201d0
Fix memory profiling error
DarkLight1337 May 14, 2024
ceebfa6
Fix memory usage for embedding server
DarkLight1337 May 15, 2024
7d991cd
Update embeddings API to use new inputs
DarkLight1337 May 15, 2024
0e79dfb
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 15, 2024
b867b5e
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 15, 2024
2c0d58f
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 15, 2024
26f7253
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 15, 2024
d553693
Apply formatter
DarkLight1337 May 15, 2024
48e7a4a
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 16, 2024
595654c
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 20, 2024
b6c0e29
Merge branch 'upstream' into llm-inputs
DarkLight1337 May 20, 2024
e055472
Avoid duplicate `Tensor.to` calls
DarkLight1337 May 20, 2024
3097582
Merge `llm` groups back into one by enabling gc
DarkLight1337 May 20, 2024
9fe9bed
Add test for image pixel processor
DarkLight1337 May 20, 2024
222cb90
Improve CLI args
DarkLight1337 May 20, 2024
33294d5
Rename `multi_modal_datas` parameter
DarkLight1337 May 20, 2024
31cedac
Rename `input_processor` to be more explicit
DarkLight1337 May 20, 2024
21a0218
Rename `multi_modal_data` to be more explicit
DarkLight1337 May 20, 2024
32ae773
Remove patch for LLaVA-NeXT
DarkLight1337 May 20, 2024
78450eb
Apply formatter
DarkLight1337 May 20, 2024
f4defe6
Apply multi-modal refactor to `CPUModelRunner`
DarkLight1337 May 20, 2024
c43173b
Fix multi-modal handling in `EmbeddingModelRunner`
DarkLight1337 May 20, 2024
4c8e64e
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 20, 2024
ce58b25
Move dummy image data generation to model-agnostic file
DarkLight1337 May 20, 2024
d81f9f1
Add multimodal docs
DarkLight1337 May 20, 2024
7bbd123
Improve documentation for LLM/engine
DarkLight1337 May 20, 2024
056eb61
Direct readers to the `PromptInputs` class
DarkLight1337 May 22, 2024
b3b990a
Separate `_run_engine` from `_validate_and_add_requests`
DarkLight1337 May 22, 2024
2169def
Add flag for deprecating legacy API
DarkLight1337 May 22, 2024
3dbded1
Add tests for `deprecate_kwargs`
DarkLight1337 May 22, 2024
8e20317
Apply formatter
DarkLight1337 May 22, 2024
fdccaa2
Rename attribute to be less misleading
DarkLight1337 May 22, 2024
77ee1c8
Re-enable using `'fork'` start method and improve speed by using `torch…
DarkLight1337 May 23, 2024
b1bcdd1
Simplify logic of casting request output
DarkLight1337 May 23, 2024
44b4681
Improve code readability
DarkLight1337 May 23, 2024
50343cb
Fix `multi_modal_data` being a required key
DarkLight1337 May 23, 2024
45aa420
Fix index out of range error
DarkLight1337 May 23, 2024
d4e2589
Use a flag to control whether to check output types
DarkLight1337 May 23, 2024
c07b579
Simplify flags
DarkLight1337 May 23, 2024
9d56eb0
Move output validation to a more appropriate location
DarkLight1337 May 23, 2024
bc05031
Add message to deprecation notice
DarkLight1337 May 23, 2024
95d4130
Apply formatter
DarkLight1337 May 23, 2024
cc84f65
Remove unused parameter in `_validate_and_add_requests` and fix test
DarkLight1337 May 24, 2024
6c5d4a6
Simplify code
DarkLight1337 May 25, 2024
fd2da12
Move attribute assignment outside `_init_tokenizer`
DarkLight1337 May 25, 2024
d78de94
Only emit warning once
DarkLight1337 May 25, 2024
8a86829
Simplify assignment expression
DarkLight1337 May 25, 2024
731ac0e
Place special case at the start
DarkLight1337 May 25, 2024
2d1a0bc
Move API reference under developer doc
ywang96 May 25, 2024
7b8ce2c
Fix links in docs
DarkLight1337 May 26, 2024
fff21a1
Remove unnecessary code to avoid repeated warning
DarkLight1337 May 26, 2024
82233ec
Merge branch 'llm-inputs' into mm-data-processor
DarkLight1337 May 27, 2024
797e8a5
Simplify code and fix type annotations
DarkLight1337 May 17, 2024
e10b3fc
Update docs
DarkLight1337 May 27, 2024
c6a9fcf
Use intersphinx and avoid long default values
DarkLight1337 May 27, 2024
a26e1e3
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 27, 2024
883bea4
Apply formatter
DarkLight1337 May 27, 2024
46bc1ea
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 29, 2024
d350bb3
Fix bad merge
DarkLight1337 May 29, 2024
2a166a7
Do not support multiple multimodal data in legacy API
DarkLight1337 May 29, 2024
db12c29
Reinstate whitespace
DarkLight1337 May 29, 2024
4a0a85c
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 29, 2024
6529280
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 30, 2024
dc6c5fd
Fix bad config dict
DarkLight1337 May 30, 2024
2ed2fdc
Fix tests
DarkLight1337 May 30, 2024
8d09112
Apply formatter
DarkLight1337 May 30, 2024
3fe1f61
Remove `multi_modal_data` support in legacy API
DarkLight1337 May 30, 2024
46af1ac
Add NOTE and TODO
DarkLight1337 May 30, 2024
f620a1b
Add missing type annotations
DarkLight1337 May 30, 2024
70b4165
Rename functions
DarkLight1337 May 30, 2024
87c2da4
Add NOTE
DarkLight1337 May 30, 2024
7fc620c
Fix multimodal inputs being on wrong device
DarkLight1337 May 30, 2024
cd63022
Rename `MM_REGISTRY` to be more explicit
DarkLight1337 May 30, 2024
19fea82
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 30, 2024
43f2660
fix upstream merge
ywang96 May 30, 2024
5d3a063
Merge branch 'upstream' into mm-data-processor
DarkLight1337 May 31, 2024
b6754a4
Enable passing tensor directly as image
DarkLight1337 May 31, 2024
01b0512
Add pillow to intersphinx and fix quote format
DarkLight1337 May 31, 2024
a996b34
Fix mock imports
DarkLight1337 May 31, 2024
52ed274
Trigger pipeline
DarkLight1337 May 31, 2024
559bd46
Automatically convert dtype
DarkLight1337 May 31, 2024
69c4ff6
Comment out failing test for now
DarkLight1337 May 31, 2024
960e5eb
Fix blank pages in docs
DarkLight1337 May 31, 2024
a3c6fdb
Use the module name, not package name
DarkLight1337 May 31, 2024
d78d456
Trigger pipeline
DarkLight1337 May 31, 2024
243eb90
Trigger pipeline 2
DarkLight1337 May 31, 2024
501b11c
Fix formatting [skip ci]
DarkLight1337 May 31, 2024
3d20f6d
Merge branch 'upstream' into mm-data-processor
DarkLight1337 Jun 3, 2024
680cee9
Merge branch 'upstream' into mm-data-processor
DarkLight1337 Jun 3, 2024
1 change: 1 addition & 0 deletions .github/workflows/mypy.yaml
@@ -37,6 +37,7 @@ jobs:
mypy vllm/distributed --config-file pyproject.toml
mypy vllm/entrypoints --config-file pyproject.toml
mypy vllm/executor --config-file pyproject.toml
mypy vllm/multimodal --config-file pyproject.toml
mypy vllm/usage --config-file pyproject.toml
mypy vllm/*.py --config-file pyproject.toml
mypy vllm/transformers_utils --config-file pyproject.toml
14 changes: 8 additions & 6 deletions docs/source/conf.py
@@ -90,6 +90,7 @@ def setup(app):
"sentencepiece",
"vllm.cuda_utils",
"vllm._C",
"PIL",
"numpy",
"tqdm",
"tensorizer",
@@ -116,12 +117,13 @@ def add_line(self, line: str, source: str, *lineno: int) -> None:
autodoc.ClassDocumenter = MockedClassDocumenter

intersphinx_mapping = {
'python': ('https://docs.python.org/3', None),
'typing_extensions':
('https://typing-extensions.readthedocs.io/en/latest', None),
'numpy': ('https://numpy.org/doc/stable', None),
'torch': ('https://pytorch.org/docs/stable', None),
'psutil': ('https://psutil.readthedocs.io/en/stable', None),
"python": ("https://docs.python.org/3", None),
"typing_extensions":
("https://typing-extensions.readthedocs.io/en/latest", None),
"pillow": ("https://pillow.readthedocs.io/en/stable", None),
"numpy": ("https://numpy.org/doc/stable", None),
"torch": ("https://pytorch.org/docs/stable", None),
"psutil": ("https://psutil.readthedocs.io/en/stable", None),
}

autodoc_preserve_defaults = True
51 changes: 51 additions & 0 deletions docs/source/dev/multimodal/multimodal_index.rst
@@ -0,0 +1,51 @@
Multi-Modality
==============

.. currentmodule:: vllm.multimodal

vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.

:class:`vllm.inputs.PromptStrictInputs` accepts an additional attribute ``multi_modal_data``
which allows you to pass in multi-modal input alongside text and token prompts.

By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model,
you must decorate the model class with :meth:`MULTIMODAL_REGISTRY.register_dummy_data <MultiModalRegistry.register_dummy_data>`,
as well as :meth:`MULTIMODAL_REGISTRY.register_input <MultiModalRegistry.register_input>` for each modality type to support.

.. contents::
:local:
:backlinks: none

Module Contents
+++++++++++++++

.. automodule:: vllm.multimodal

Registry
--------

.. data:: vllm.multimodal.MULTIMODAL_REGISTRY

The global :class:`MultiModalRegistry` which is used by model runners.

.. autoclass:: vllm.multimodal.MultiModalRegistry
:members:
:show-inheritance:

Base Classes
------------

.. autoclass:: vllm.multimodal.MultiModalData
:members:
:show-inheritance:

.. autoclass:: vllm.multimodal.MultiModalPlugin
:members:
:show-inheritance:

Image Classes
-------------

.. automodule:: vllm.multimodal.image
:members:
:show-inheritance:
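
For reference, the registration flow described in the new docs above would look roughly like the following sketch. The decorator names come from the docs; the exact signatures, the dummy-data callback, and the model class here are assumptions rather than the verbatim vLLM API.

    # Hypothetical sketch only; consult vllm.multimodal for the real signatures.
    from torch import nn

    from vllm.multimodal import MULTIMODAL_REGISTRY


    def dummy_image_data(seq_len: int):
        # Assumed callback shape: return placeholder token IDs plus a blank image
        # so the engine can profile memory usage for multi-modal requests.
        ...


    @MULTIMODAL_REGISTRY.register_dummy_data(dummy_image_data)  # signature assumed
    @MULTIMODAL_REGISTRY.register_input()  # repeat once per supported modality; args assumed
    class MyVisionLanguageModel(nn.Module):
        # The registered input processor turns ImagePixelData / ImageFeatureData
        # into the tensors that this model's forward() consumes.
        ...
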
6 changes: 4 additions & 2 deletions docs/source/index.rst
@@ -88,6 +88,7 @@ Documentation
models/adding_model
models/engine_args
models/lora
models/vlm
models/performance

.. toctree::
@@ -99,17 +100,18 @@ Documentation
quantization/fp8_e4m3_kvcache

.. toctree::
:maxdepth: 2
:maxdepth: 1
:caption: Developer Documentation

dev/sampling_params
dev/offline_inference/offline_index
dev/engine/engine_index
dev/kernel/paged_attention
dev/multimodal/multimodal_index
dev/dockerfile/dockerfile

.. toctree::
:maxdepth: 2
:maxdepth: 1
:caption: Community

community/meetups
4 changes: 4 additions & 0 deletions docs/source/models/supported_models.rst
@@ -87,6 +87,10 @@ Alongside each architecture, we include some popular models that use it.
- LLaMA, Llama 2, Meta Llama 3, Vicuna, Alpaca, Yi
- :code:`meta-llama/Meta-Llama-3-8B-Instruct`, :code:`meta-llama/Meta-Llama-3-70B-Instruct`, :code:`meta-llama/Llama-2-13b-hf`, :code:`meta-llama/Llama-2-70b-hf`, :code:`openlm-research/open_llama_13b`, :code:`lmsys/vicuna-13b-v1.3`, :code:`01-ai/Yi-6B`, :code:`01-ai/Yi-34B`, etc.
- ✅︎
* - :code:`LlavaForConditionalGeneration`
- LLaVA-1.5
- :code:`llava-hf/llava-1.5-7b-hf`\*, :code:`llava-hf/llava-1.5-13b-hf`\*, etc.
-
* - :code:`MiniCPMForCausalLM`
- MiniCPM
- :code:`openbmb/MiniCPM-2B-sft-bf16`, :code:`openbmb/MiniCPM-2B-dpo-bf16`, etc.
56 changes: 56 additions & 0 deletions docs/source/models/vlm.rst
@@ -0,0 +1,56 @@
.. _vlm:

Using VLMs
==========

This document shows you how to run and serve Vision Language Models (VLMs) using vLLM.

Engine Arguments
----------------

The following :ref:`engine arguments <engine_args>` are specific to VLMs:

.. argparse::
:module: vllm.engine.arg_utils
:func: _vlm_engine_args_parser
:prog: -m vllm.entrypoints.openai.api_server
:nodefaultconst:

Offline Batched Inference
-------------------------

To initialize a VLM, the aforementioned arguments must be passed to the ``LLM`` class for instantiating the engine.

.. code-block:: python

llm = LLM(
model="llava-hf/llava-1.5-7b-hf",
image_input_type="pixel_values",
image_token_id=32000,
image_input_shape="1,3,336,336",
image_feature_size=576,
)

For now, we only support a single image per text prompt. To pass an image to the model, note the following in :class:`vllm.inputs.PromptStrictInputs`:

* ``prompt``: The prompt should have a number of ``<image>`` tokens equal to ``image_feature_size``.
* ``multi_modal_data``: This should be an instance of :class:`~vllm.multimodal.image.ImagePixelData` or :class:`~vllm.multimodal.image.ImageFeatureData`.

.. code-block:: python

prompt = "<image>" * 576 + (
"\nUSER: What is the content of this image?\nASSISTANT:")

# Load the image using PIL.Image
image = ...

outputs = llm.generate({
"prompt": prompt,
"multi_modal_data": ImagePixelData(image),
})

for o in outputs:
generated_text = o.outputs[0].text
print(generated_text)

A code example can be found in `examples/llava_example.py <https://github.com/vllm-project/vllm/blob/main/examples/llava_example.py>`_.
29 changes: 15 additions & 14 deletions examples/llava_example.py
@@ -3,33 +3,36 @@
import subprocess

import torch
from PIL import Image

from vllm import LLM
from vllm.sequence import MultiModalData
from vllm.multimodal.image import ImageFeatureData, ImagePixelData

# The assets are located at `s3://air-example-data-2/vllm_opensource_llava/`.
# You can use `.buildkite/download-images.sh` to download them


def run_llava_pixel_values():
def run_llava_pixel_values(*, disable_image_processor: bool = False):
llm = LLM(
model="llava-hf/llava-1.5-7b-hf",
image_input_type="pixel_values",
image_token_id=32000,
image_input_shape="1,3,336,336",
image_feature_size=576,
disable_image_processor=disable_image_processor,
)

prompt = "<image>" * 576 + (
"\nUSER: What is the content of this image?\nASSISTANT:")

# This should be provided by another online or offline component.
image = torch.load("images/stop_sign_pixel_values.pt")
if disable_image_processor:
image = torch.load("images/stop_sign_pixel_values.pt")
else:
image = Image.open("images/stop_sign.jpg")

outputs = llm.generate({
"prompt":
prompt,
"multi_modal_data":
MultiModalData(type=MultiModalData.Type.IMAGE, data=image),
"prompt": prompt,
"multi_modal_data": ImagePixelData(image),
})

for o in outputs:
@@ -49,15 +52,13 @@ def run_llava_image_features():
prompt = "<image>" * 576 + (
"\nUSER: What is the content of this image?\nASSISTANT:")

# This should be provided by another online or offline component.
image = torch.load("images/stop_sign_image_features.pt")
image: torch.Tensor = torch.load("images/stop_sign_image_features.pt")

outputs = llm.generate({
"prompt":
prompt,
"multi_modal_data":
MultiModalData(type=MultiModalData.Type.IMAGE, data=image),
"prompt": prompt,
"multi_modal_data": ImageFeatureData(image),
})

for o in outputs:
generated_text = o.outputs[0].text
print(generated_text)
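
The ``LLM`` construction inside ``run_llava_image_features`` is collapsed out of the hunk above; a minimal sketch of that variant is shown below, where ``image_input_shape`` is an assumption (576 CLIP patch features of width 1024) and the authoritative arguments live in ``examples/llava_example.py``.

    import torch

    from vllm import LLM
    from vllm.multimodal.image import ImageFeatureData

    # Sketch of the image-features path; image_input_shape below is assumed.
    llm = LLM(
        model="llava-hf/llava-1.5-7b-hf",
        image_input_type="image_features",
        image_token_id=32000,
        image_input_shape="1,576,1024",
        image_feature_size=576,
    )

    prompt = "<image>" * 576 + (
        "\nUSER: What is the content of this image?\nASSISTANT:")

    # Image features precomputed offline (see .buildkite/download-images.sh).
    image_features: torch.Tensor = torch.load("images/stop_sign_image_features.pt")

    outputs = llm.generate({
        "prompt": prompt,
        "multi_modal_data": ImageFeatureData(image_features),
    })
    print(outputs[0].outputs[0].text)
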
1 change: 1 addition & 0 deletions format.sh
@@ -101,6 +101,7 @@ mypy vllm/core --config-file pyproject.toml
mypy vllm/distributed --config-file pyproject.toml
mypy vllm/entrypoints --config-file pyproject.toml
mypy vllm/executor --config-file pyproject.toml
mypy vllm/multimodal --config-file pyproject.toml
mypy vllm/usage --config-file pyproject.toml
mypy vllm/*.py --config-file pyproject.toml
mypy vllm/transformers_utils --config-file pyproject.toml
1 change: 1 addition & 0 deletions requirements-common.txt
@@ -12,6 +12,7 @@ aiohttp
openai
uvicorn[standard]
pydantic >= 2.0 # Required for OpenAI server.
pillow # Required for image processing
prometheus_client >= 0.18.0
prometheus-fastapi-instrumentator >= 7.0.0
tiktoken >= 0.6.0 # Required for DBRX tokenizer
3 changes: 0 additions & 3 deletions requirements-dev.txt
@@ -33,8 +33,5 @@ sentence-transformers # required for embedding
# Benchmarking
aiohttp

# Multimodal
pillow

# quantization
bitsandbytes==0.42.0
45 changes: 24 additions & 21 deletions tests/conftest.py
@@ -15,7 +15,9 @@
from vllm.distributed import destroy_model_parallel
from vllm.inputs import TextPrompt
from vllm.logger import init_logger
from vllm.sequence import MultiModalData, SampleLogprobs
from vllm.multimodal import MultiModalData
from vllm.multimodal.image import ImageFeatureData, ImagePixelData
from vllm.sequence import SampleLogprobs

logger = init_logger(__name__)

@@ -24,6 +26,7 @@
_LONG_PROMPTS = [os.path.join(_TEST_DIR, "prompts", "summary.txt")]

# Multi modal related
# You can use `.buildkite/download-images.sh` to download the assets
_PIXEL_VALUES_FILES = [
os.path.join(_TEST_DIR, "images", filename) for filename in
["stop_sign_pixel_values.pt", "cherry_blossom_pixel_values.pt"]
@@ -89,17 +92,23 @@ def hf_images() -> List[Image.Image]:


@pytest.fixture()
def vllm_images(request) -> "torch.Tensor":
def vllm_images(request) -> List[MultiModalData]:
vision_language_config = request.getfixturevalue("model_and_config")[1]
all_images = []
if vision_language_config.image_input_type == (
VisionLanguageConfig.ImageInputType.IMAGE_FEATURES):
filenames = _IMAGE_FEATURES_FILES
return [
ImageFeatureData(torch.load(filename))
for filename in _IMAGE_FEATURES_FILES
]
else:
filenames = _PIXEL_VALUES_FILES
for filename in filenames:
all_images.append(torch.load(filename))
return torch.concat(all_images, dim=0)
return [
ImagePixelData(Image.open(filename)) for filename in _IMAGE_FILES
]


@pytest.fixture()
def vllm_image_tensors(request) -> List[torch.Tensor]:
return [torch.load(filename) for filename in _PIXEL_VALUES_FILES]


@pytest.fixture()
@@ -392,23 +401,17 @@ def generate(
self,
prompts: List[str],
sampling_params: SamplingParams,
images: Optional[torch.Tensor] = None,
images: Optional[List[MultiModalData]] = None,
) -> List[Tuple[List[List[int]], List[str]]]:
if images is not None:
assert len(prompts) == len(images)

prompt_inputs: List[TextPrompt] = []
for i, prompt in enumerate(prompts):
prompt = TextPrompt(prompt=prompt)
if images is not None:
prompt["multi_modal_data"] = MultiModalData(
type=MultiModalData.Type.IMAGE,
data=images[i:i + 1],
)

prompt_inputs.append(prompt)
inputs = [TextPrompt(prompt=prompt) for prompt in prompts]
if images is not None:
for i, image in enumerate(images):
inputs[i]["multi_modal_data"] = image

req_outputs = self.model.generate(prompt_inputs,
req_outputs = self.model.generate(inputs,
sampling_params=sampling_params)

outputs: List[Tuple[List[List[int]], List[str]]] = []
@@ -447,7 +450,7 @@ def generate_greedy(
self,
prompts: List[str],
max_tokens: int,
images: Optional[torch.Tensor] = None,
images: Optional[List[MultiModalData]] = None,
) -> List[Tuple[List[int], str]]:
greedy_params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
outputs = self.generate(prompts, greedy_params, images=images)