Fix the OpenAI benchmarking requests to work with the latest OpenAI APIs #2992

Merged
2 commits merged into vllm-project:main on Mar 4, 2024

Conversation

wangchen615 (Contributor)

Resolve #2940

To test the OpenAI benchmarking script, you can run:

python benchmark_serving.py --backend openai --base-url https://api.openai.com --endpoint /v1/chat/completions --num-prompts 1 --model gpt-3.5-turbo --tokenizer openai-community/gpt2 --dataset ShareGPT_V3_unfiltered_cleaned_split.json

The previous version had issues appending text when using the streaming API, and it did not support the /v1/chat/completions API.
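
For context, below is a minimal sketch of how streamed /v1/chat/completions chunks can be accumulated with aiohttp. This is illustrative only and not the code added by this PR; the function name and payload fields are assumptions based on the OpenAI streaming format, where each server-sent "data: ..." line carries the next piece of text in choices[0]["delta"]["content"], and it assumes the API key is provided via the OPENAI_API_KEY environment variable.

# Illustrative sketch, not the implementation from this PR.
import json
import os
import aiohttp

async def stream_chat_completion(base_url: str, model: str, prompt: str,
                                 max_tokens: int = 128) -> str:
    # Assumed payload shape for a streaming chat completion request.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": True,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
    }
    generated = ""
    async with aiohttp.ClientSession() as session:
        async with session.post(base_url + "/v1/chat/completions",
                                json=payload, headers=headers) as response:
            async for raw_line in response.content:
                line = raw_line.decode("utf-8").strip()
                if not line.startswith("data: "):
                    continue
                chunk = line[len("data: "):]
                if chunk == "[DONE]":
                    break
                delta = json.loads(chunk)["choices"][0].get("delta", {})
                # Each chunk carries only the newly generated piece, so the
                # pieces must be appended rather than overwritten.
                generated += delta.get("content") or ""
    return generated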

@wangchen615 force-pushed the main branch 2 times, most recently from cfd2c42 to 03e2f6f on February 22, 2024 17:26
@ywang96 (Member) left a comment


Thanks for the contribution @wangchen615! Instead of modifying the existing async_request_openai_completions, could you add a separate request function for the v1/chat/completions endpoint?

Ideally we would like to keep separate request functions for each endpoint since the payload and the result parsing can be different.
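
To illustrate the suggested layout, a rough sketch follows; the function names and the dispatch dictionary are assumptions for illustration, not necessarily the identifiers used in the repository.

# Sketch of one request function per endpoint, selected by backend name.
from typing import Awaitable, Callable, Dict

async def async_request_openai_completions(request: dict) -> str:
    # Would build a /v1/completions payload ("prompt", "max_tokens", ...)
    # and parse choices[0]["text"] from each streamed chunk.
    raise NotImplementedError

async def async_request_openai_chat_completions(request: dict) -> str:
    # Would build a /v1/chat/completions payload ("messages", ...) and
    # parse choices[0]["delta"]["content"] from each streamed chunk.
    raise NotImplementedError

# benchmark_serving.py can then pick the coroutine from the --backend flag,
# so payload construction and result parsing never branch on the endpoint.
ASYNC_REQUEST_FUNCS: Dict[str, Callable[[dict], Awaitable[str]]] = {
    "openai": async_request_openai_completions,
    "openai-chat": async_request_openai_chat_completions,
}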

@wangchen615 force-pushed the main branch 2 times, most recently from 739d39e to 3925602 on February 29, 2024 23:22
keep the openai backend url as /v1/completions and add openai-chat backend url as /v1/chat/completions

yapf format

add newline
@ywang96 (Member) commented Mar 1, 2024

@wangchen615 Thank you for working on this! I will test it as well and get back to you.

@ywang96 (Member) left a comment


Thanks for adding this @wangchen615!

I've tested the new request function against the vLLM OpenAI API server via the chat completions endpoint, and it worked well.

Namespace(backend='openai-chat', base_url='http://mixtral-8-vllm.local', best_of=1, dataset='ShareGPT_V3_unfiltered_cleaned_split.json', disable_tqdm=False, endpoint='/v1/chat/completions', host='localhost', model='mistralai/Mixtral-8x7B-Instruct-v0.1', num_prompts=100, port=8000, request_rate=1.0, save_result=False, seed=0, tokenizer=None, trust_remote_code=False, use_beam_search=False, version='0.3.3')
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1.46k/1.46k [00:00<00:00, 330kB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 1.74MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:00<00:00, 5.01MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 72.0/72.0 [00:00<00:00, 63.0kB/s]
Traffic request rate: 1.0
  0%|                                                                                                                                         | 0/100 [00:00<?, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:46<00:00,  1.07s/it]
Successful requests: 100
Benchmark duration: 106.682357 s
Total input tokens: 23521
Total generated tokens: 19726
Request throughput: 0.94 requests/s
Input token throughput: 220.48 tokens/s
Output token throughput: 184.90 tokens/s
Mean TTFT: 32.00 ms
Median TTFT: 18.36 ms
P99 TTFT: 93.73 ms
Mean TPOT: 41.45 ms
Median TPOT: 41.33 ms
P99 TPOT: 49.54 ms
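
(As a quick consistency check on these numbers: 100 requests / 106.68 s ≈ 0.94 requests/s, 23521 / 106.68 ≈ 220.5 input tokens/s, and 19726 / 106.68 ≈ 184.9 output tokens/s, which matches the reported throughput. TTFT is time to first token; TPOT is time per output token.)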

Just left a nit on the assert error message but everything else looks good to me!

cc @simon-mo if you can approve and merge this.

Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
@wangchen615 (Contributor, Author)

Committed your suggestion. Thanks @ywang96

@simon-mo simon-mo merged commit 9a4548b into vllm-project:main Mar 4, 2024
22 checks passed
dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request Mar 26, 2024
…llm-project#2992)

Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Successfully merging this pull request may close these issues:
Benchmarking script for openai chat completion api are not supported (#2940)