This repository was archived by the owner on Oct 11, 2024. It is now read-only.
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
--- a/docs/source/models/vlm.rst
+++ b/docs/source/models/vlm.rst
(+7 -4)
@@ -36,7 +36,6 @@ To initialize a VLM, the aforementioned arguments must be passed to the ``LLM``
 
     llm = LLM(
         model="llava-hf/llava-1.5-7b-hf",
-        image_input_type="pixel_values",
         image_token_id=32000,
         image_input_shape="1,3,336,336",
         image_feature_size=576,
@@ -49,7 +48,12 @@ To initialize a VLM, the aforementioned arguments must be passed to the ``LLM``
 To pass an image to the model, note the following in :class:`vllm.inputs.PromptStrictInputs`:
 
 * ``prompt``: The prompt should have a number of ``<image>`` tokens equal to ``image_feature_size``.
-* ``multi_modal_data``: This should be an instance of :class:`~vllm.multimodal.image.ImagePixelData` or :class:`~vllm.multimodal.image.ImageFeatureData`.
+* ``multi_modal_data``: This is a dictionary that follows the schema defined in :class:`vllm.multimodal.MultiModalDataDict`.
+
+.. note::
+
+    ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through
+    :class:`vllm.multimodal.MULTIMODAL_REGISTRY`.
 
 .. code-block:: python
 
@@ -61,7 +65,7 @@ To pass an image to the model, note the following in :class:`vllm.inputs.PromptS
 
     outputs = llm.generate({
         "prompt": prompt,
-        "multi_modal_data": ImagePixelData(image),
+        "multi_modal_data": {"image": image},
     })
 
     for o in outputs:
@@ -93,7 +97,6 @@ Below is an example on how to launch the same ``llava-hf/llava-1.5-7b-hf`` with
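
For readers skimming the diff, here is a minimal sketch of the input format the updated docs describe: a prompt containing ``image_feature_size`` copies of the ``<image>`` token, plus a plain dictionary for ``multi_modal_data``. The helper name ``build_vlm_inputs``, the ``USER:``/``ASSISTANT:`` prompt template, and the stand-in image object are assumptions made for illustration and are not part of this commit. The block deliberately avoids importing vLLM so it runs on its own; in real use the image would typically be a loaded image object and the returned dictionary would be passed to ``llm.generate``.

```python
# Values taken from the LLM(...) arguments shown in the diff.
IMAGE_TOKEN = "<image>"
IMAGE_FEATURE_SIZE = 576


def build_vlm_inputs(question, image, image_feature_size=IMAGE_FEATURE_SIZE):
    """Hypothetical helper: assemble the dict-style input described in the diff.

    The prompt carries one ``<image>`` token per image feature, and
    ``multi_modal_data`` is a plain dictionary keyed by modality name.
    """
    # Assumed prompt template; the commit does not show the real one.
    prompt = IMAGE_TOKEN * image_feature_size + f"\nUSER: {question}\nASSISTANT:"
    return {
        "prompt": prompt,
        # New style from this commit: a dict instead of ImagePixelData(image).
        "multi_modal_data": {"image": image},
    }


# Stand-in for a real image object (assumption, for a dependency-free example).
placeholder_image = object()
inputs = build_vlm_inputs("What is the content of this image?", placeholder_image)

# With vLLM installed and an ``llm`` constructed as in the first hunk,
# the call would look like: outputs = llm.generate(inputs)
```

Keeping the image under the ``"image"`` key mirrors the ``{"image": image}`` line added in the third hunk; other keys would require a plugin registered through ``MULTIMODAL_REGISTRY``, per the note added in the second hunk.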