Fix the openai benchmarking requests to work with latest OpenAI apis #2992
Conversation
Force-pushed from cfd2c42 to 03e2f6f.
Thanks for the contribution @wangchen615! Instead of modifying the existing async_request_openai_completions, could you add a separate request function for the v1/chat/completions endpoint?
Ideally we would like to keep separate request functions for each endpoint since the payload and the result parsing can be different.
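For context, a minimal sketch of what such a separate request function might look like, assuming aiohttp, a streaming response, and the OpenAI chat payload shape; the function name and signature are illustrative, not the PR's actual code:

```python
import json

import aiohttp


async def async_request_openai_chat_completions(api_url: str, model: str,
                                                prompt: str) -> str:
    """Sketch of a dedicated request function for /v1/chat/completions."""
    payload = {
        "model": model,
        # The chat endpoint takes a list of messages rather than a raw prompt.
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    generated_text = ""
    async with aiohttp.ClientSession() as session:
        async with session.post(api_url, json=payload) as response:
            # The streaming API returns server-sent events, one JSON chunk
            # per "data: ..." line.
            async for line in response.content:
                chunk = line.decode("utf-8").strip()
                if not chunk.startswith("data: "):
                    continue
                chunk = chunk[len("data: "):]
                if chunk == "[DONE]":
                    break
                data = json.loads(chunk)
                # Chat chunks carry the new text under delta.content,
                # unlike /v1/completions, which uses choices[0].text.
                generated_text += data["choices"][0]["delta"].get("content") or ""
    return generated_text
```

Keeping one function per endpoint means each payload and result-parsing path stays simple, rather than branching inside a shared function.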
Force-pushed from 739d39e to 3925602.
Commits: keep the openai backend url as /v1/completions and add openai-chat backend url as /v1/chat/completions; yapf format; add newline.
@wangchen615 Thank you for working on this! I will test it as well and get back to you.
Thanks for adding this @wangchen615!
I've tested the new request function against the vLLM OpenAI API server via the chat completions endpoint, and it worked well.
Namespace(backend='openai-chat', base_url='http://mixtral-8-vllm.local', best_of=1, dataset='ShareGPT_V3_unfiltered_cleaned_split.json', disable_tqdm=False, endpoint='/v1/chat/completions', host='localhost', model='mistralai/Mixtral-8x7B-Instruct-v0.1', num_prompts=100, port=8000, request_rate=1.0, save_result=False, seed=0, tokenizer=None, trust_remote_code=False, use_beam_search=False, version='0.3.3')
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1.46k/1.46k [00:00<00:00, 330kB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 1.74MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:00<00:00, 5.01MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 72.0/72.0 [00:00<00:00, 63.0kB/s]
Traffic request rate: 1.0
  0%| | 0/100 [00:00<?, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:46<00:00, 1.07s/it]
Successful requests: 100
Benchmark duration: 106.682357 s
Total input tokens: 23521
Total generated tokens: 19726
Request throughput: 0.94 requests/s
Input token throughput: 220.48 tokens/s
Output token throughput: 184.90 tokens/s
Mean TTFT: 32.00 ms
Median TTFT: 18.36 ms
P99 TTFT: 93.73 ms
Mean TPOT: 41.45 ms
Median TPOT: 41.33 ms
P99 TPOT: 49.54 ms
Just left a nit on the assert error message but everything else looks good to me!
cc @simon-mo if you can approve and merge this.
Committed your suggestion. Thanks @ywang96
Merged: …llm-project#2992) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Resolves #2940
To test the OpenAI benchmarking script, you can run:
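(A plausible invocation, reconstructed from the Namespace output in the review above; the script path, base URL, and dataset path are assumptions, not the command from the original description.)

```bash
python benchmarks/benchmark_serving.py \
    --backend openai-chat \
    --base-url http://mixtral-8-vllm.local \
    --endpoint /v1/chat/completions \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --dataset ShareGPT_V3_unfiltered_cleaned_split.json \
    --num-prompts 100 \
    --request-rate 1.0 \
    --seed 0
```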
The previous version had issues appending text when using the streaming API. In addition, it did not support the /v1/chat/completions API.
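To make both points concrete, a small hedged sketch of the parsing involved (the chunk literal is made up for illustration; that the old code overwrote rather than appended is inferred from the description above):

```python
import json

# One decoded chunk from the streaming response (the SSE "data:" payload).
chunk = '{"choices": [{"delta": {"content": "Hello"}}]}'
data = json.loads(chunk)

generated_text = ""
# /v1/completions streams each increment in choices[0].text; assigning it
# (generated_text = ...) instead of appending drops the earlier chunks.
# /v1/chat/completions carries the increment under choices[0].delta.content:
generated_text += data["choices"][0]["delta"].get("content") or ""
print(generated_text)  # -> "Hello"
```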