
[Core] Add engine option to return only deltas or final output #7381

Merged · 22 commits · Sep 12, 2024

Commits on Aug 13, 2024

  1. [Core] Add engine option to return only deltas or final output (1eb9991)

    The LLMEngine and AsyncLLMEngine APIs currently return/stream cumulative outputs for all sequences at every step.

    This is more data than needed for LLM.generate or the OpenAI server APIs:
    - For LLM.generate and non-streaming APIs, only the final output is needed
    - For streaming APIs, only deltas are required

    This PR adds an `output_kind` parameter to SamplingParams, an enum valued CUMULATIVE, DELTA, or FINAL_ONLY.

    This reduces the number of objects that must be constructed at each step, and the amount of data serialized for return to the newly-decoupled front-end API process.

    njhill committed Aug 13, 2024
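The three output modes can be illustrated with a minimal, self-contained sketch. This is not the vLLM implementation; the `OutputKind` enum and `stream_outputs` helper below are hypothetical stand-ins that only mirror the semantics the PR describes for the `output_kind` parameter:

```python
from enum import Enum, auto


class OutputKind(Enum):
    """Stand-in for the enum the PR adds to SamplingParams."""
    CUMULATIVE = auto()  # every step returns the full text generated so far
    DELTA = auto()       # every step returns only the newly generated piece
    FINAL_ONLY = auto()  # nothing is returned until generation finishes


def stream_outputs(tokens, kind):
    """Yield per-step outputs for a sequence of decoded tokens under each mode."""
    text = ""
    for i, tok in enumerate(tokens):
        text += tok
        if kind is OutputKind.CUMULATIVE:
            yield text
        elif kind is OutputKind.DELTA:
            yield tok
        elif i == len(tokens) - 1:  # FINAL_ONLY: emit once, at the end
            yield text
```

For the token stream `["Hel", "lo", "!"]`, CUMULATIVE yields `"Hel"`, `"Hello"`, `"Hello!"`; DELTA yields the three pieces as-is; FINAL_ONLY yields only `"Hello!"`. The savings the PR targets come from the last two modes: DELTA avoids re-sending (and re-building) the ever-growing cumulative output on every step, and FINAL_ONLY skips per-step outputs entirely.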
  2. Fixes (9bc3fdd)

    njhill committed Aug 13, 2024
  3. Fix ignored sequence case (ef2e59f)

    njhill committed Aug 13, 2024
  4. dc1f3f2
  5. 9d35a00

Commits on Aug 14, 2024

  1. b7ff44e

Commits on Aug 15, 2024

  1. Make tests more robust (34df9bd)

    njhill committed Aug 15, 2024
  2. a68506f

Commits on Aug 27, 2024

  1. Merge remote-tracking branch 'origin/main' into reduce-output (cfe7118)

    # Conflicts:
    #	tests/entrypoints/openai/test_chat.py
    #	vllm/engine/llm_engine.py
    #	vllm/entrypoints/llm.py
    #	vllm/entrypoints/openai/protocol.py
    #	vllm/entrypoints/openai/serving_completion.py
    #	vllm/sampling_params.py

    njhill committed Aug 27, 2024
  2. Post-merge wip (45fd069)

    njhill committed Aug 27, 2024

Commits on Sep 8, 2024

  1. Merge remote-tracking branch 'origin/main' into reduce-output (3f21ad6)

    # Conflicts:
    #	vllm/engine/llm_engine.py
    #	vllm/entrypoints/openai/serving_chat.py

    njhill committed Sep 8, 2024
  2. Merge remote-tracking branch 'origin/main' into reduce-output (d59ffd1)

    # Conflicts:
    #	vllm/engine/llm_engine.py

    njhill committed Sep 8, 2024

Commits on Sep 10, 2024

  1. d2f36dd
  2. 2843365
  3. Merge remote-tracking branch 'origin/main' into reduce-output (2736ab1)

    # Conflicts:
    #	vllm/entrypoints/llm.py

    njhill committed Sep 10, 2024
  4. Address Alex's comments, fix include_prompt logic (a045dff)

    Also avoid appending delta token ids to sequences in cases where they aren't needed.

    njhill committed Sep 10, 2024

Commits on Sep 11, 2024

  1. Merge remote-tracking branch 'origin/main' into reduce-output (58f6112)

    # Conflicts:
    #	vllm/sequence.py

    njhill committed Sep 11, 2024
  2. Add tests (e7a2b55)

    njhill committed Sep 11, 2024

Commits on Sep 12, 2024

  1. Some rework/simplification (6b1f355)

    njhill committed Sep 12, 2024
  2. 3233a92
  3. Merge remote-tracking branch 'origin/main' into reduce-output (f351ed2)

    # Conflicts:
    #	tests/async_engine/test_async_llm_engine.py

    njhill committed Sep 12, 2024
  4. 75814bd