New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API #3978

Closed

DarkLight1337 wants to merge 60 commits into vllm-project:main from DarkLight1337:openai-vision-api

Commits on Apr 10, 2024

Add basic support for OpenAI image input API
```
- Refactor `OpenAIServingChat` and add function for loading image
- Move `pillow` dev dependency to common
- Add example chat template for LLaVA model
```
DarkLight1337 committed Apr 10, 2024
Configuration menu
View commit details

Copy full SHA for 874a581

Browse repository at this point
Copy the full SHA

874a581 View commit details

Browse the repository at this point in the history
Update documentation
```
- Add general guide for using VLMs
- Add LLavA to list of supported models
```
DarkLight1337 committed Apr 10, 2024
Configuration menu
View commit details

Copy full SHA for 607434e

Browse repository at this point
Copy the full SHA

607434e View commit details

Browse the repository at this point in the history
Add tests for OpenAI image input API and image loader
```
- Move `ServerRunner` to common file
```
DarkLight1337 committed Apr 10, 2024
Configuration menu
View commit details

Copy full SHA for aaa6bfe

Browse repository at this point
Copy the full SHA

aaa6bfe View commit details

Browse the repository at this point in the history

Commits on Apr 11, 2024

Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 11, 2024
Configuration menu
View commit details

Copy full SHA for 26e7b2a

Browse repository at this point
Copy the full SHA

26e7b2a View commit details

Browse the repository at this point in the history
Apply formatter

DarkLight1337 committed Apr 11, 2024
Configuration menu
View commit details

Copy full SHA for 44829b5

Browse repository at this point
Copy the full SHA

44829b5 View commit details

Browse the repository at this point in the history
Place image before text for llava-hf model

DarkLight1337 committed Apr 11, 2024
Configuration menu
View commit details

Copy full SHA for bccb367

Browse repository at this point
Copy the full SHA

bccb367 View commit details

Browse the repository at this point in the history
Internally enable customization of merging image with text prompt

DarkLight1337 committed Apr 11, 2024
Configuration menu
View commit details

Copy full SHA for b9302e8

Browse repository at this point
Copy the full SHA

b9302e8 View commit details

Browse the repository at this point in the history
Fix errors in CI/CD
```
- Incorrect loading of config (also rename `openai_api` to `image_openai`)
- Incorrect await of stream generator
```
DarkLight1337 committed Apr 11, 2024
Configuration menu
View commit details

Copy full SHA for a44d7d1

Browse repository at this point
Copy the full SHA

a44d7d1 View commit details

Browse the repository at this point in the history

Commits on Apr 12, 2024

Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 561ad49

Browse repository at this point
Copy the full SHA

561ad49 View commit details

Browse the repository at this point in the history
Fix some type errors along the way

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 4479605

Browse repository at this point
Copy the full SHA

4479605 View commit details

Browse the repository at this point in the history
Improve async behaviour of loading images
```
- Also, use the type definitions from `openai` directly
```
DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 20852d9

Browse repository at this point
Copy the full SHA

20852d9 View commit details

Browse the repository at this point in the history
Use discriminated union in prompt parsing

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for ce770f4

Browse repository at this point
Copy the full SHA

ce770f4 View commit details

Browse the repository at this point in the history
Fix some type errors along the way

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 6b016bc

Browse repository at this point
Copy the full SHA

6b016bc View commit details

Browse the repository at this point in the history
Some more fixes

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 7620354

Browse repository at this point
Copy the full SHA

7620354 View commit details

Browse the repository at this point in the history
Apply formatter

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 7c3e6d9

Browse repository at this point
Copy the full SHA

7c3e6d9 View commit details

Browse the repository at this point in the history
Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for e74b0a7

Browse repository at this point
Copy the full SHA

e74b0a7 View commit details

Browse the repository at this point in the history
Move openai to common requirements

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 9925dcb

Browse repository at this point
Copy the full SHA

9925dcb View commit details

Browse the repository at this point in the history
Fix typo in _parse_chat_message_image_input

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for ceb4e35

Browse repository at this point
Copy the full SHA

ceb4e35 View commit details

Browse the repository at this point in the history
Refactor prompt parsing so that it can be shared between Chat Complet…
```
…ions API and legacy Completions API
```
DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 7bdc84e

Browse repository at this point
Copy the full SHA

7bdc84e View commit details

Browse the repository at this point in the history
Make code more readable

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for a7d1098

Browse repository at this point
Copy the full SHA

a7d1098 View commit details

Browse the repository at this point in the history
Move assertion to a more appropriate place

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 8b9d636

Browse repository at this point
Copy the full SHA

8b9d636 View commit details

Browse the repository at this point in the history
Merge branch 'openai-typing' into openai-vision-api

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 9754142

Browse repository at this point
Copy the full SHA

9754142 View commit details

Browse the repository at this point in the history
Add code documentation

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for c48c13a

Browse repository at this point
Copy the full SHA

c48c13a View commit details

Browse the repository at this point in the history
Decompose _validate_prompt_and_tokenize

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 3530362

Browse repository at this point
Copy the full SHA

3530362 View commit details

Browse the repository at this point in the history
Fix missing import due to renaming

DarkLight1337 authored Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for b8feec9

Browse repository at this point
Copy the full SHA

b8feec9 View commit details

Browse the repository at this point in the history
Merge branch 'openai-typing' into openai-vision-api

DarkLight1337 committed Apr 12, 2024
Configuration menu
View commit details

Copy full SHA for 9cae113

Browse repository at this point
Copy the full SHA

9cae113 View commit details

Browse the repository at this point in the history

Commits on Apr 13, 2024

Merge branch 'upstream' into openai-typing

DarkLight1337 committed Apr 13, 2024
Configuration menu
View commit details

Copy full SHA for 89d9086

Browse repository at this point
Copy the full SHA

89d9086 View commit details

Browse the repository at this point in the history
Fix bug when parsing array of tokens

DarkLight1337 committed Apr 13, 2024
Configuration menu
View commit details

Copy full SHA for cc1a5b3

Browse repository at this point
Copy the full SHA

cc1a5b3 View commit details

Browse the repository at this point in the history
Add token array to batch completions testing

DarkLight1337 committed Apr 13, 2024
Configuration menu
View commit details

Copy full SHA for f9c1135

Browse repository at this point
Copy the full SHA

f9c1135 View commit details

Browse the repository at this point in the history

Commits on Apr 14, 2024

Merge branch 'openai-typing' into openai-vision-api

DarkLight1337 committed Apr 14, 2024
Configuration menu
View commit details

Copy full SHA for ecc2d50

Browse repository at this point
Copy the full SHA

ecc2d50 View commit details

Browse the repository at this point in the history
Replace legacy conint with Annotated field

DarkLight1337 committed Apr 14, 2024
Configuration menu
View commit details

Copy full SHA for f2e8180

Browse repository at this point
Copy the full SHA

f2e8180 View commit details

Browse the repository at this point in the history
Merge branch 'openai-typing' into openai-vision-api

DarkLight1337 committed Apr 14, 2024
Configuration menu
View commit details

Copy full SHA for ce04842

Browse repository at this point
Copy the full SHA

ce04842 View commit details

Browse the repository at this point in the history
Load image processor from HuggingFace
```
- Note that multi modal processing logic has been moved from `LLM` to `LLMEngine`
```
DarkLight1337 committed Apr 14, 2024
Configuration menu
View commit details

Copy full SHA for cdbf08a

Browse repository at this point
Copy the full SHA

cdbf08a View commit details

Browse the repository at this point in the history
Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 14, 2024
Configuration menu
View commit details

Copy full SHA for 9a336ec

Browse repository at this point
Copy the full SHA

9a336ec View commit details

Browse the repository at this point in the history
Allow disabling image processor
```
- Also fix missing arguments to config in `test_llava.py`
```
DarkLight1337 committed Apr 14, 2024
Configuration menu
View commit details

Copy full SHA for 5722dd8

Browse repository at this point
Copy the full SHA

5722dd8 View commit details

Browse the repository at this point in the history

Commits on Apr 15, 2024

Fix errors when running the example and tests

DarkLight1337 committed Apr 15, 2024
Configuration menu
View commit details

Copy full SHA for 6e1fa67

Browse repository at this point
Copy the full SHA

6e1fa67 View commit details

Browse the repository at this point in the history
Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 15, 2024
Configuration menu
View commit details

Copy full SHA for 7ce44da

Browse repository at this point
Copy the full SHA

7ce44da View commit details

Browse the repository at this point in the history

Commits on Apr 16, 2024

Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 16, 2024
Configuration menu
View commit details

Copy full SHA for 9804604

Browse repository at this point
Copy the full SHA

9804604 View commit details

Browse the repository at this point in the history
Add test for loading image processor by revision

DarkLight1337 committed Apr 16, 2024
Configuration menu
View commit details

Copy full SHA for 21434df

Browse repository at this point
Copy the full SHA

21434df View commit details

Browse the repository at this point in the history
Temporary patch for llava-1.5-13b to facilitate testing

DarkLight1337 committed Apr 16, 2024
Configuration menu
View commit details

Copy full SHA for a5907b0

Browse repository at this point
Copy the full SHA

a5907b0 View commit details

Browse the repository at this point in the history

Commits on Apr 17, 2024

Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 17, 2024
Configuration menu
View commit details

Copy full SHA for f08ff10

Browse repository at this point
Copy the full SHA

f08ff10 View commit details

Browse the repository at this point in the history
Fix issue with pickling config when serving LLaVA with multiple GPUs

DarkLight1337 committed Apr 17, 2024
Configuration menu
View commit details

Copy full SHA for c126646

Browse repository at this point
Copy the full SHA

c126646 View commit details

Browse the repository at this point in the history

Commits on Apr 18, 2024

Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 49ba216

Browse repository at this point
Copy the full SHA

49ba216 View commit details

Browse the repository at this point in the history
Add TODO to test

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 11e9921

Browse repository at this point
Copy the full SHA

11e9921 View commit details

Browse the repository at this point in the history
Try to avoid OOM by using --enforce-eager

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 7ae80a2

Browse repository at this point
Copy the full SHA

7ae80a2 View commit details

Browse the repository at this point in the history
Reduce number of models to test to avoid OOM

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 2610bea

Browse repository at this point
Copy the full SHA

2610bea View commit details

Browse the repository at this point in the history
Try testing 13b model only

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 5ad2b67

Browse repository at this point
Copy the full SHA

5ad2b67 View commit details

Browse the repository at this point in the history

Refactor image processing, MultiModalData and LLaVA model

- Remove channel conversion and resizing from OpenAI server preprocessing since the image processor in HuggingFace should be able to handle that
- `MultiModalData` is now an abstract class that outputs additional kwargs to be input into the model. This was intially done to support LLaVA-NeXT's `image_size` parameter but can be extended to other models as well.
- The application of image processor is now defined inside `MultiModalData` so that there is no need to extensively edit the engine to support other types of data
- New `MultiModalData` subclasses: `ImagePixelData` and `ImageFeatureData` to better differentiate the two cases of image input
- Refactored LLaVA-1.5 model to make it easier to inherit for defining LLaVA-NeXT model

DarkLight1337 committed Apr 18, 2024

696357b

Fix image processing not working directly, due to tensor being passed
```
- Now, `ImagePixelData` only accepts `PIL.Image` input
- Also move `torch` import out of `TYPE_CHECKING` as it is loaded anyways when importing `SamplingParams`
```
DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 483b190

Browse repository at this point
Copy the full SHA

483b190 View commit details

Browse the repository at this point in the history
Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 3e22017

Browse repository at this point
Copy the full SHA

3e22017 View commit details

Browse the repository at this point in the history
Revert to using 7b model in testing

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 0b6af35

Browse repository at this point
Copy the full SHA

0b6af35 View commit details

Browse the repository at this point in the history
Get LLaVA-Next to work with fixed-size images
```
- Note the patch in `ImagePixelData`. To fully leverage the potential of LLaVA-Next, we should allow image of any size, but the feature size would then be variable.
```
DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for e4c3502

Browse repository at this point
Copy the full SHA

e4c3502 View commit details

Browse the repository at this point in the history
Apply formatter and fix typo

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 21aaf3d

Browse repository at this point
Copy the full SHA

21aaf3d View commit details

Browse the repository at this point in the history
Fix input shape not being based on config value

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for ac95b79

Browse repository at this point
Copy the full SHA

ac95b79 View commit details

Browse the repository at this point in the history
Allow config to specify other image size for LLaVA-NeXT

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 9a9a4e7

Browse repository at this point
Copy the full SHA

9a9a4e7 View commit details

Browse the repository at this point in the history
Improve error message to show the expected image_feature_size

DarkLight1337 committed Apr 18, 2024
Configuration menu
View commit details

Copy full SHA for 176ad2c

Browse repository at this point
Copy the full SHA

176ad2c View commit details

Browse the repository at this point in the history

Commits on Apr 19, 2024

Fix dtype mismatch in multi_modal_kwargs

DarkLight1337 committed Apr 19, 2024
Configuration menu
View commit details

Copy full SHA for 91ea044

Browse repository at this point
Copy the full SHA

91ea044 View commit details

Browse the repository at this point in the history
Fix LLaVA example and test w.r.t. image processing refactor
```
- Note that we now load the images directly instead of from `.pt` files
```
DarkLight1337 committed Apr 19, 2024
Configuration menu
View commit details

Copy full SHA for cb19743

Browse repository at this point
Copy the full SHA

cb19743 View commit details

Browse the repository at this point in the history
Merge branch 'upstream' into openai-vision-api

DarkLight1337 committed Apr 19, 2024
Configuration menu
View commit details

Copy full SHA for 019f473

Browse repository at this point
Copy the full SHA

019f473 View commit details

Browse the repository at this point in the history
Fix circular import and set return type
```
- These changes are propagated to the child PRs
```
DarkLight1337 committed Apr 19, 2024
Configuration menu
View commit details

Copy full SHA for f882d99

Browse repository at this point
Copy the full SHA

f882d99 View commit details

Browse the repository at this point in the history

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API #3978

[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API #3978

Commits on Apr 10, 2024

Commits on Apr 11, 2024

Commits on Apr 12, 2024

Commits on Apr 13, 2024

Commits on Apr 14, 2024

Commits on Apr 15, 2024

Commits on Apr 16, 2024

Commits on Apr 17, 2024

Commits on Apr 18, 2024

Commits on Apr 19, 2024