[Core] Add engine option to return only deltas or final output #7381
Commits on Aug 13, 2024
- 1eb9991: [Core] Add engine option to return only deltas or final output

  The LLMEngine and AsyncLLMEngine APIs currently return/stream cumulative outputs for all sequences at every step. This is more data than LLM.generate or the OpenAI server APIs need:
  - For LLM.generate and other non-streaming APIs, only the final output is needed.
  - For streaming APIs, only the deltas are required.

  This PR adds an `output_kind` parameter to SamplingParams, an enum with values CUMULATIVE, DELTA, or FINAL_ONLY. It reduces the number of objects that must be constructed at each step, and the amount of data serialized to return results to the newly decoupled front-end API process.
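As a rough illustration of the three output modes, the following is a minimal sketch of the idea only, not vLLM's actual implementation; the names `SequenceState` and `make_output` are invented here for the example:

```python
from dataclasses import dataclass
from enum import Enum


class RequestOutputKind(Enum):
    CUMULATIVE = 0   # full text so far at every step (previous behavior)
    DELTA = 1        # only the text produced since the last step
    FINAL_ONLY = 2   # emit nothing until the sequence finishes


@dataclass
class SequenceState:
    text: str = ""
    last_sent: int = 0   # number of characters already streamed as deltas


def make_output(state: SequenceState, new_text: str, finished: bool,
                kind: RequestOutputKind):
    """Return the string to emit for this step, or None to emit nothing."""
    state.text += new_text
    if kind is RequestOutputKind.CUMULATIVE:
        return state.text
    if kind is RequestOutputKind.DELTA:
        delta = state.text[state.last_sent:]
        state.last_sent = len(state.text)
        return delta or None
    # FINAL_ONLY: suppress intermediate steps entirely
    return state.text if finished else None
```

In DELTA mode each step carries only the new fragment, and in FINAL_ONLY mode intermediate steps emit nothing at all, which is where the reduction in per-step objects and serialized data comes from.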
- 9bc3fdd
- ef2e59f
- dc1f3f2
- 9d35a00
Commits on Aug 14, 2024
- b7ff44e
Commits on Aug 15, 2024
-
Configuration menu - View commit details
-
Copy full SHA for 34df9bd - Browse repository at this point
Copy the full SHA 34df9bdView commit details -
- a68506f
Commits on Aug 27, 2024
- cfe7118: Merge remote-tracking branch 'origin/main' into reduce-output

  # Conflicts:
  #   tests/entrypoints/openai/test_chat.py
  #   vllm/engine/llm_engine.py
  #   vllm/entrypoints/llm.py
  #   vllm/entrypoints/openai/protocol.py
  #   vllm/entrypoints/openai/serving_completion.py
  #   vllm/sampling_params.py
- 45fd069
Commits on Sep 8, 2024
- 3f21ad6: Merge remote-tracking branch 'origin/main' into reduce-output

  # Conflicts:
  #   vllm/engine/llm_engine.py
  #   vllm/entrypoints/openai/serving_chat.py
- d59ffd1: Merge remote-tracking branch 'origin/main' into reduce-output

  # Conflicts:
  #   vllm/engine/llm_engine.py
Commits on Sep 10, 2024
- d2f36dd
- 2843365
- 2736ab1: Merge remote-tracking branch 'origin/main' into reduce-output

  # Conflicts:
  #   vllm/entrypoints/llm.py
- a045dff: Address Alex's comments, fix include_prompt logic

  Also avoids appending delta token ids to sequences in cases where they aren't needed.
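The point about skipping delta token id bookkeeping can be sketched roughly as follows; the helper names `append_step` and `take_delta` are hypothetical, and the sketch simply assumes a per-step buffer is only maintained when the request actually asked for delta output:

```python
def append_step(seq: dict, new_ids: list, wants_delta: bool) -> None:
    """Append this step's token ids; buffer a per-step copy only if needed."""
    seq["token_ids"].extend(new_ids)       # the full sequence is always kept
    if wants_delta:
        seq["delta_ids"].extend(new_ids)   # extra bookkeeping for DELTA requests only


def take_delta(seq: dict) -> list:
    """Return and clear the token ids accumulated since the last emission."""
    out, seq["delta_ids"] = seq["delta_ids"], []
    return out
```

Requests that don't stream deltas then skip the second append entirely, so no per-step buffer is built up for them.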
Commits on Sep 11, 2024
- 58f6112: Merge remote-tracking branch 'origin/main' into reduce-output

  # Conflicts:
  #   vllm/sequence.py
- e7a2b55
Commits on Sep 12, 2024
- 6b1f355
- 3233a92
- f351ed2: Merge remote-tracking branch 'origin/main' into reduce-output

  # Conflicts:
  #   tests/async_engine/test_async_llm_engine.py
- 75814bd