
[Core] Add engine option to return only deltas or final output #7381

Merged · 22 commits · Sep 12, 2024

Commits on Aug 13, 2024

  1. [Core] Add engine option to return only deltas or final output (1eb9991)

    The LLMEngine and AsyncLLMEngine APIs currently return/stream cumulative outputs for all sequences at every step.

    This is more data than needed for LLM.generate or the OpenAI server APIs:
    - For LLM.generate and non-streaming APIs, only the final output is needed
    - For streaming APIs, only deltas are required

    This PR adds an `output_kind` parameter to SamplingParams, an enum valued CUMULATIVE, DELTA, or FINAL_ONLY.

    This reduces the number of objects that must be constructed at each step, and the amount of data serialized for return to the newly-decoupled front-end API process.

    njhill committed Aug 13, 2024
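The three output modes can be illustrated with a minimal, self-contained sketch. This is not the vLLM implementation; the `OutputKind` enum and `stream_outputs` helper below are hypothetical stand-ins that only mirror the semantics the PR describes for the `output_kind` parameter:

```python
from enum import Enum, auto


class OutputKind(Enum):
    """Stand-in for the enum the PR adds to SamplingParams."""
    CUMULATIVE = auto()  # every step returns the full text generated so far
    DELTA = auto()       # every step returns only the newly generated piece
    FINAL_ONLY = auto()  # nothing is returned until generation finishes


def stream_outputs(tokens, kind):
    """Yield per-step outputs for a sequence of decoded tokens under each mode."""
    text = ""
    for i, tok in enumerate(tokens):
        text += tok
        if kind is OutputKind.CUMULATIVE:
            yield text
        elif kind is OutputKind.DELTA:
            yield tok
        elif i == len(tokens) - 1:  # FINAL_ONLY: emit once, at the end
            yield text
```

For the token stream `["Hel", "lo", "!"]`, CUMULATIVE yields `"Hel"`, `"Hello"`, `"Hello!"`; DELTA yields the three pieces as-is; FINAL_ONLY yields only `"Hello!"`. The savings the PR targets come from the last two modes: DELTA avoids re-sending (and re-building) the ever-growing cumulative output on every step, and FINAL_ONLY skips per-step outputs entirely.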
  2. Fixes (9bc3fdd)

    njhill committed Aug 13, 2024
  3. Fix ignored sequence case (ef2e59f)

    njhill committed Aug 13, 2024
  4. dc1f3f2
  5. 9d35a00

Commits on Aug 14, 2024

  1. b7ff44e

Commits on Aug 15, 2024

  1. Make tests more robust (34df9bd)

    njhill committed Aug 15, 2024
  2. a68506f

Commits on Aug 27, 2024

  1. Merge remote-tracking branch 'origin/main' into reduce-output (cfe7118)

    # Conflicts:
    #	tests/entrypoints/openai/test_chat.py
    #	vllm/engine/llm_engine.py
    #	vllm/entrypoints/llm.py
    #	vllm/entrypoints/openai/protocol.py
    #	vllm/entrypoints/openai/serving_completion.py
    #	vllm/sampling_params.py

    njhill committed Aug 27, 2024
  2. Post-merge wip (45fd069)

    njhill committed Aug 27, 2024

Commits on Sep 8, 2024

  1. Merge remote-tracking branch 'origin/main' into reduce-output (3f21ad6)

    # Conflicts:
    #	vllm/engine/llm_engine.py
    #	vllm/entrypoints/openai/serving_chat.py

    njhill committed Sep 8, 2024
  2. Merge remote-tracking branch 'origin/main' into reduce-output (d59ffd1)

    # Conflicts:
    #	vllm/engine/llm_engine.py

    njhill committed Sep 8, 2024

Commits on Sep 10, 2024

  1. d2f36dd
  2. 2843365
  3. Merge remote-tracking branch 'origin/main' into reduce-output (2736ab1)

    # Conflicts:
    #	vllm/entrypoints/llm.py

    njhill committed Sep 10, 2024
  4. Address Alex's comments, fix include_prompt logic (a045dff)

    Also avoid appending delta token ids to sequences in cases where they aren't needed.

    njhill committed Sep 10, 2024

Commits on Sep 11, 2024

  1. Merge remote-tracking branch 'origin/main' into reduce-output (58f6112)

    # Conflicts:
    #	vllm/sequence.py

    njhill committed Sep 11, 2024
  2. Add tests (e7a2b55)

    njhill committed Sep 11, 2024

Commits on Sep 12, 2024

  1. Some rework/simplification (6b1f355)

    njhill committed Sep 12, 2024
  2. 3233a92
  3. Merge remote-tracking branch 'origin/main' into reduce-output (f351ed2)

    # Conflicts:
    #	tests/async_engine/test_async_llm_engine.py

    njhill committed Sep 12, 2024
  4. 75814bd