[Core] Support image processor #4197

DarkLight1337 · 2024-04-19T08:45:24Z

I have implemented a plugin architecture (MultiModalPlugin) over MultiModalData that defines how each modality type should be preprocessed before being passed to the model as keyword arguments. This preserves the contract between the output of the HuggingFace processor and the input of the HuggingFace model. As long as those keyword arguments do not conflict with the ones vLLM already passes to the model, I think this is a good way to make the framework flexible enough to support other multi-modal architectures.
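The plugin idea above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not vLLM's actual code: the method names `get_data_type` and `process_input`, the `MultiModalRegistry` class, the `RESERVED_KWARGS` set and the toy audio modality are all hypothetical. It shows one plugin per data type returning keyword arguments for the model, plus a check that those arguments do not collide with ones the engine already supplies.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Type

# Hypothetical set of keyword arguments the engine itself passes to the
# model; plugin outputs must not reuse these names.
RESERVED_KWARGS = {"input_ids", "positions", "kv_caches", "attn_metadata"}


class MultiModalData:
    """Base class for a single multi-modal input."""


class MultiModalPlugin(ABC):
    """Maps one MultiModalData subclass to model keyword arguments."""

    @abstractmethod
    def get_data_type(self) -> Type[MultiModalData]: ...

    @abstractmethod
    def process_input(self, data: MultiModalData) -> Dict[str, Any]: ...


class MultiModalRegistry:
    def __init__(self) -> None:
        self._plugins: Dict[Type[MultiModalData], MultiModalPlugin] = {}

    def register_plugin(self, plugin: MultiModalPlugin) -> None:
        self._plugins[plugin.get_data_type()] = plugin

    def process_input(self, data: MultiModalData) -> Dict[str, Any]:
        # Walk the MRO so a plugin registered for a base class also
        # handles that class's subclasses.
        for data_type in type(data).__mro__:
            plugin = self._plugins.get(data_type)
            if plugin is None:
                continue
            kwargs = plugin.process_input(data)
            clash = RESERVED_KWARGS.intersection(kwargs)
            if clash:
                raise ValueError(f"kwargs clash with engine kwargs: {sorted(clash)}")
            return kwargs
        raise KeyError(f"no plugin registered for {type(data).__name__}")


# Toy modality, purely for illustration.
@dataclass
class ToyAudioData(MultiModalData):
    samples: List[float]


class ToyAudioPlugin(MultiModalPlugin):
    def get_data_type(self) -> Type[MultiModalData]:
        return ToyAudioData

    def process_input(self, data: MultiModalData) -> Dict[str, Any]:
        # The registry only routes ToyAudioData here.
        return {"audio_values": [2 * s for s in data.samples]}


registry = MultiModalRegistry()
registry.register_plugin(ToyAudioPlugin())
model_kwargs = registry.process_input(ToyAudioData([0.5, 1.0]))
```

Dispatching on the data type keeps the model's forward signature aligned with the HuggingFace one: whatever the plugin returns is forwarded unchanged as keyword arguments.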

FIX #4054 (the data is now automatically moved to the model's device)
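A rough sketch of moving processor output onto the model's device, assuming the output is a (possibly nested) dict of tensors. `FakeTensor` is a hypothetical stand-in so the example runs without PyTorch; with real tensors the `.to(device)` call would be `torch.Tensor.to`. The helper name `move_to_device` and the `image_sizes` key are illustrative only.

```python
from dataclasses import dataclass, replace
from typing import Any, Tuple


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""
    data: Tuple[float, ...]
    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        # Like torch.Tensor.to, return a new object instead of mutating.
        return replace(self, device=device)


def move_to_device(value: Any, device: str) -> Any:
    """Recursively move every tensor-like value (anything with a .to method)."""
    if isinstance(value, dict):
        return {key: move_to_device(item, device) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(move_to_device(item, device) for item in value)
    if hasattr(value, "to"):
        return value.to(device)
    return value  # ints, floats, strings, etc. pass through unchanged


processor_output = {
    "pixel_values": FakeTensor((0.1, 0.2)),
    "image_sizes": [(336, 336)],  # hypothetical extra key holding plain ints
}
model_kwargs = move_to_device(processor_output, "cuda:0")
```

Doing this once inside the framework means users no longer have to place their multi-modal inputs on the right device themselves.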

Related Contributions

This PR is part of #3978.

This PR also implements Proposals 1 and 3 of #4194.

Features

Refactor of MultiModalData
- Image input is now split into two subclasses:
  - ImageFeatureData represents the image features of LLaVA after being passed through the vision tower, but before the multi-modal projection is applied.
  - ImagePixelData represents the raw image (using the PIL.Image class). AutoImageProcessor from HuggingFace is loaded from config.json to preprocess input images before they are passed to the model as pixel_values. As with the tokenizer, you can override the default image processor and specify its version via EngineConfig; you can even disable image preprocessing altogether, which is useful if you want to pass in images that have already been preprocessed.
- The LLaVA implementation has been updated accordingly to accept the new inputs.
- A new documentation page for using VLMs can be found under dev/multimodal.
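The two image input types and the optional preprocessing step might look roughly like this. Apart from `ImagePixelData`, `ImageFeatureData` and `pixel_values`, every name here (`ImagePlugin`, the `image_features` keyword, `toy_processor`) is hypothetical, and `toy_processor` merely stands in for a HuggingFace `AutoImageProcessor`, whose output is a mapping containing `pixel_values`. Passing `None` as the processor models the option to disable preprocessing.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class ImagePixelData:
    """Raw image; the PR wraps a PIL.Image, here any object will do."""
    image: Any


@dataclass
class ImageFeatureData:
    """Vision-tower output, before the multi-modal projector is applied."""
    image_features: Any


# Assumed processor shape: image in, mapping containing "pixel_values" out.
ImageProcessor = Callable[[Any], Dict[str, Any]]


class ImagePlugin:
    def __init__(self, image_processor: Optional[ImageProcessor]) -> None:
        # None means preprocessing is disabled: images are assumed to be
        # preprocessed already and are forwarded as-is.
        self.image_processor = image_processor

    def process_input(self, data: Any) -> Dict[str, Any]:
        if isinstance(data, ImageFeatureData):
            # Features skip the vision tower entirely.
            return {"image_features": data.image_features}
        if isinstance(data, ImagePixelData):
            if self.image_processor is None:
                return {"pixel_values": data.image}
            return {"pixel_values": self.image_processor(data.image)["pixel_values"]}
        raise TypeError(f"unsupported image input: {type(data).__name__}")


def toy_processor(image: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for AutoImageProcessor: rescale 0-255 ints to [0, 1].
    return {"pixel_values": [px / 255 for px in image]}
```

With a real model one would instead load the processor with `AutoImageProcessor.from_pretrained(...)`; the sketch only shows where that call sits in the flow.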

Compatibility Changes

pillow is promoted from a dev dependency to a common dependency so that images can be processed.


DarkLight1337 · 2024-04-19T15:51:05Z

The LLaVA test passes on my end (with both outputs matching the HF output shown in CI). Does anyone have a clue what might cause it to fail in CI? Perhaps a case of floating-point error in GPU computation?
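One plausible explanation for a CI-only mismatch is that floating-point addition is not associative, so reductions performed in a different order (as can happen across GPU types, kernels or batch layouts) may give slightly different results, and near-tied logits can then yield different greedy tokens. This is only an illustration of the effect with ordinary Python floats; it does not reproduce the LLaVA test itself.

```python
import math
from functools import reduce
from operator import add

values = [0.1, 0.2, 0.3]

# Same numbers, two reduction orders. functools.reduce applies plain
# left-to-right addition; the built-in sum() is avoided because newer
# Python versions use compensated summation for floats.
forward = reduce(add, values)             # (0.1 + 0.2) + 0.3
backward = reduce(add, reversed(values))  # (0.3 + 0.2) + 0.1

print(forward == backward)              # False
print(math.isclose(forward, backward))  # True: equal within tolerance
```

This is why output comparisons against a reference implementation are often made with a tolerance rather than exact equality.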


ywang96 · 2024-05-31T07:46:09Z

Per offline discussion - waiting for #5118 to be merged first.

ywang96 · 2024-06-03T01:01:40Z

@DarkLight1337 Could you resolve the merge conflicts? Once that's done I think this PR is ready to merge.

ywang96

I did a final pass and left a note, but everything LGTM! Thank you for the hard work on this. @DarkLight1337

vllm/model_executor/models/llava.py

This was referenced Apr 19, 2024

[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API #3978

Closed

[Model] Initial support for LLaVA-NeXT #4199

Merged

[Frontend] Support GPT-4V Chat Completions API #4200

Closed

Support image processor

a26badd

- Also add docs for basic VLM usage

DarkLight1337 force-pushed the mm-data-processor branch from 4fc1801 to a26badd on April 19, 2024 10:02

DarkLight1337 mentioned this pull request Apr 19, 2024

[RFC]: Multi-modality Support Refactoring #4194

Open

55 tasks

Convert dtype in multi modal processing

1a0ecca

- Other data types may need to be of different dtype from that of the model
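The commit above notes that some inputs may need a dtype different from the model's. One reading, offered here as an assumption: cast only floating-point tensors to the model dtype and leave integer data (such as sizes or indices) alone. `FakeTensor`, `cast_floats_to_model_dtype` and the `image_sizes` key are hypothetical; `is_floating_point()` and `.to(dtype)` mirror methods that exist on `torch.Tensor`.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict

FLOAT_DTYPES = {"float16", "bfloat16", "float32"}


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor; only dtype bookkeeping is modelled."""
    dtype: str

    def is_floating_point(self) -> bool:
        return self.dtype in FLOAT_DTYPES

    def to(self, dtype: str) -> "FakeTensor":
        return replace(self, dtype=dtype)


def cast_floats_to_model_dtype(kwargs: Dict[str, Any], model_dtype: str) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for name, value in kwargs.items():
        if isinstance(value, FakeTensor) and value.is_floating_point():
            out[name] = value.to(model_dtype)
        else:
            # Integer tensors and non-tensor values keep their own dtype.
            out[name] = value
    return out


kwargs = {"pixel_values": FakeTensor("float32"), "image_sizes": FakeTensor("int64")}
converted = cast_floats_to_model_dtype(kwargs, "float16")
```

Casting every tensor blindly to a half-precision model dtype would silently corrupt integer metadata, which is the kind of problem this check avoids.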

DarkLight1337 added 4 commits April 22, 2024 03:56

Move MultiModalData to new subpackage multimodal

45b6756

Add multi-modal processor registry

6ed8397

Initialize the processor only once

8c48208

Merge branch 'upstream' into mm-data-processor

613ec1b

DarkLight1337 force-pushed the mm-data-processor branch 3 times, most recently from a92952e to 2d57f27 on April 22, 2024 11:40

Move processor to model runner

c48a7d4

DarkLight1337 force-pushed the mm-data-processor branch 7 times, most recently from acc378d to b60e5f8 on April 22, 2024 15:01

Refactor registry to plugin pattern in order to support specifying dummy data

3232231
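The commit above refactors the registry so that each model can specify dummy data, i.e. placeholder inputs available when no real request exists; using them for a memory-profiling run is an assumption here. A minimal sketch of a per-model registry with a text-only fallback follows. `DummyDataRegistry`, `register_dummy_data`, `ToyLlava`, `IMAGE_TOKEN_ID` and `IMAGE_FEATURE_SIZE` are hypothetical illustration names and values, not taken from the PR.

```python
from typing import Any, Callable, Dict, Type

DummyDataFactory = Callable[[int], Dict[str, Any]]

IMAGE_TOKEN_ID = 32000    # hypothetical placeholder token id
IMAGE_FEATURE_SIZE = 576  # hypothetical number of image tokens per image


class DummyDataRegistry:
    def __init__(self) -> None:
        self._factories: Dict[Type, DummyDataFactory] = {}

    def register_dummy_data(self, factory: DummyDataFactory):
        # Class decorator: attach a dummy-data factory to a model class.
        def wrapper(model_cls: Type) -> Type:
            self._factories[model_cls] = factory
            return model_cls
        return wrapper

    def dummy_data_for_profiling(self, model_cls: Type, seq_len: int) -> Dict[str, Any]:
        factory = self._factories.get(model_cls)
        if factory is None:
            # Text-only fallback: a sequence of placeholder token ids.
            return {"input_ids": [0] * seq_len}
        return factory(seq_len)


def toy_llava_dummy_data(seq_len: int) -> Dict[str, Any]:
    # Reserve room for the image tokens, pad the rest with token id 0.
    n_img = min(IMAGE_FEATURE_SIZE, seq_len)
    token_ids = [IMAGE_TOKEN_ID] * n_img + [0] * (seq_len - n_img)
    return {"input_ids": token_ids, "pixel_values": [[0.0] * 4]}  # toy image


registry = DummyDataRegistry()


@registry.register_dummy_data(toy_llava_dummy_data)
class ToyLlava:
    pass
```

Keeping the factory next to the model class lets each architecture describe the worst-case multi-modal input it expects, while text-only models need no registration at all.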

DarkLight1337 force-pushed the mm-data-processor branch from b60e5f8 to 3232231 on April 23, 2024 00:52

DarkLight1337 added 5 commits April 23, 2024 10:56

Merge branch 'upstream' into mm-data-processor

92a0283

Combine prompt inputs

5d42800

Fix a bunch of tests

5db2c5e

Fix LLaVA test

74c5905

Merge branch 'upstream' into llm-inputs

cd8917b

DarkLight1337 added 2 commits May 31, 2024 07:17

Trigger pipeline

d78d456

Trigger pipeline 2

243eb90

DarkLight1337 mentioned this pull request May 31, 2024

[Model] Add moondream vision language model #4228

Open

Fix formatting [skip ci]

501b11c

DarkLight1337 force-pushed the mm-data-processor branch from 6d00aed to 501b11c on June 1, 2024 02:45

Merge branch 'upstream' into mm-data-processor

3d20f6d

DarkLight1337 force-pushed the mm-data-processor branch from da681b5 to 3d20f6d on June 3, 2024 01:33

Merge branch 'upstream' into mm-data-processor

680cee9

ywang96 approved these changes Jun 3, 2024

vllm/model_executor/models/llava.py (conversation resolved)

ywang96 enabled auto-merge (squash) June 3, 2024 03:25

ywang96 mentioned this pull request Jun 3, 2024

[Model] Adding support for MiniCPM-V #4087

Merged

zhuohan123 disabled auto-merge June 3, 2024 05:56

zhuohan123 merged commit 7a64d24 into vllm-project:main Jun 3, 2024
63 of 65 checks passed

DarkLight1337 deleted the mm-data-processor branch June 3, 2024 05:58

This was referenced Jun 3, 2024

[Core] Registry for processing model inputs #5214

Merged

[CI/Build] Add inputs tests #5215

Merged

[Misc]: Can we remove vllm/entrypoints/api_server.py? #3852

Open

DarkLight1337 mentioned this pull request Jun 5, 2024

[Core] Dynamic image size support for VLMs #5276

Merged

blinkbear pushed a commit to blinkbear/vllm that referenced this pull request Jun 6, 2024

[Core] Support image processor (vllm-project#4197)

91e11aa

robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 11, 2024

[Core] Support image processor (vllm-project#4197)

c070e44

joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024

[Core] Support image processor (vllm-project#4197)

b1deaf3

DarkLight1337 mentioned this pull request Jun 25, 2024

[Feature]: Phi-3 vision -- allow multiple images as Microsoft shows can be done #5820

Closed

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024

[Core] Support image processor (vllm-project#4197)

9c8c987

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024

[Core] Support image processor (vllm-project#4197)

77c21c7

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

[Core] Support image processor (vllm-project#4197)

5965dba

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024

[Core] Support image processor (vllm-project#4197)

076ec38
