
Temporary PR to add to the next PR #4

Closed · wants to merge 1 commit

Conversation

@gsolard (Collaborator) commented on Jul 3, 2024

No description provided.

@maxDavid40 (Collaborator) commented:
New behavior in vLLM (0.5.0.post1) on the completions endpoint (the same applies to chat/completions)!

Usage info is no longer returned by default; you need to add the stream_options field to the request body: "stream_options": {"include_usage": true}
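
For illustration, here is a minimal sketch of a streaming completions request with usage reporting enabled; the server URL and model name are placeholders, not values from this repo:

    import json
    import requests

    # Placeholder endpoint and model; adjust to your vLLM deployment.
    payload = {
        "model": "my-model",
        "prompt": "Hello",
        "stream": True,
        # New in vLLM 0.5.0.post1: request usage stats in the stream.
        "stream_options": {"include_usage": True},
    }
    with requests.post("http://localhost:8000/v1/completions",
                       json=payload, stream=True) as response:
        for raw in response.iter_lines():
            # OpenAI-compatible servers stream SSE lines of the form
            # "data: {...}", terminated by "data: [DONE]".
            if raw and raw.startswith(b"data: ") and raw != b"data: [DONE]":
                json_chunk = json.loads(raw[len(b"data: "):])
                print(json_chunk)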

We also need to modify benchmark_llm_serving.query_profiles.query_functions.query_functions accordingly:

    # With "include_usage" enabled, a final usage-only chunk arrives whose
    # "choices" list is empty, so guard before indexing.
    if len(json_chunk['choices']) > 0:
        data = json_chunk['choices'][0]['text']
        output.generated_text += data
    # The last chunk carries the usage stats.
    if "usage" in json_chunk and json_chunk['usage'] is not None:
        output.prompt_length = json_chunk['usage']['prompt_tokens']

⚠️ Be careful: the stream_options attribute is not implemented in 0.4.3 in openAi_server.protocols, so the response will be a 422 if you add the stream_options key to the body.
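
To support both versions from a single client, one option is to include the key only when the target server is recent enough. A sketch, assuming the client can learn the server's vLLM version (the VLLM_VERSION string below is a hypothetical placeholder):

    from packaging.version import Version

    # Hypothetical version string; in practice, discover it from the
    # server (e.g. a version endpoint) or your deployment config.
    VLLM_VERSION = "0.5.0.post1"

    payload = {"model": "my-model", "prompt": "Hello", "stream": True}
    if Version(VLLM_VERSION) >= Version("0.5.0"):
        # stream_options exists in the OpenAI protocol from 0.5.0 on.
        payload["stream_options"] = {"include_usage": True}
    # On 0.4.3 the key is omitted, avoiding the 422 described above.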

@gsolard closed this on Aug 1, 2024
@gsolard (Collaborator, Author) commented on Aug 1, 2024

Changes done in PR #5
