Fix the openai benchmarking requests to work with latest OpenAI apis #2992
Conversation
Force-pushed from cfd2c42 to 03e2f6f.
Thanks for the contribution @wangchen615! Instead of modifying the existing async_request_openai_completions, could you add a separate request function for the v1/chat/completions endpoint?
Ideally we would like to keep separate request functions for each endpoint since the payload and the result parsing can be different.
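For context, a minimal sketch of what such a separate request function might look like, assuming aiohttp, a streaming response, and the OpenAI chat payload shape; the function name and signature are illustrative, not the PR's actual code:

```python
import json

import aiohttp


async def async_request_openai_chat_completions(api_url: str, model: str,
                                                prompt: str) -> str:
    """Sketch of a dedicated request function for /v1/chat/completions."""
    payload = {
        "model": model,
        # The chat endpoint takes a list of messages rather than a raw prompt.
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    generated_text = ""
    async with aiohttp.ClientSession() as session:
        async with session.post(api_url, json=payload) as response:
            # The streaming API returns server-sent events, one JSON chunk
            # per "data: ..." line.
            async for line in response.content:
                chunk = line.decode("utf-8").strip()
                if not chunk.startswith("data: "):
                    continue
                chunk = chunk[len("data: "):]
                if chunk == "[DONE]":
                    break
                data = json.loads(chunk)
                # Chat chunks carry the new text under delta.content,
                # unlike /v1/completions, which uses choices[0].text.
                generated_text += data["choices"][0]["delta"].get("content") or ""
    return generated_text
```

Keeping one function per endpoint means each payload and result-parsing path stays simple, rather than branching inside a shared function.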
Force-pushed from 739d39e to 3925602.
Commits: keep the openai backend url as /v1/completions and add openai-chat backend url as /v1/chat/completions; yapf format; add newline.
@wangchen615 Thank you for working on this! I will test it as well and get back to you.
Thanks for adding this @wangchen615!
I've tested the new request function against the vLLM OpenAI API server via the chat completions endpoint, and it worked well.
Namespace(backend='openai-chat', base_url='http://mixtral-8-vllm.local', best_of=1, dataset='ShareGPT_V3_unfiltered_cleaned_split.json', disable_tqdm=False, endpoint='/v1/chat/completions', host='localhost', model='mistralai/Mixtral-8x7B-Instruct-v0.1', num_prompts=100, port=8000, request_rate=1.0, save_result=False, seed=0, tokenizer=None, trust_remote_code=False, use_beam_search=False, version='0.3.3')
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1.46k/1.46k [00:00<00:00, 330kB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 1.74MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:00<00:00, 5.01MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 72.0/72.0 [00:00<00:00, 63.0kB/s]
Traffic request rate: 1.0
  0%| | 0/100 [00:00<?, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:46<00:00, 1.07s/it]
Successful requests: 100
Benchmark duration: 106.682357 s
Total input tokens: 23521
Total generated tokens: 19726
Request throughput: 0.94 requests/s
Input token throughput: 220.48 tokens/s
Output token throughput: 184.90 tokens/s
Mean TTFT: 32.00 ms
Median TTFT: 18.36 ms
P99 TTFT: 93.73 ms
Mean TPOT: 41.45 ms
Median TPOT: 41.33 ms
P99 TPOT: 49.54 ms
Just left a nit on the assert error message but everything else looks good to me!
cc @simon-mo if you can approve and merge this.
Committed your suggestion. Thanks @ywang96
Merged: …llm-project#2992) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Resolves #2940
To test the OpenAI benchmarking script, you can run:
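(A plausible invocation, reconstructed from the Namespace output in the review above; the script path, base URL, and dataset path are assumptions, not the command from the original description.)

```bash
python benchmarks/benchmark_serving.py \
    --backend openai-chat \
    --base-url http://mixtral-8-vllm.local \
    --endpoint /v1/chat/completions \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --dataset ShareGPT_V3_unfiltered_cleaned_split.json \
    --num-prompts 100 \
    --request-rate 1.0 \
    --seed 0
```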
The previous version had issues appending text when using the streaming API. In addition, it did not support the /v1/chat/completions API.
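To make both points concrete, a small hedged sketch of the parsing involved (the chunk literal is made up for illustration; that the old code overwrote rather than appended is inferred from the description above):

```python
import json

# One decoded chunk from the streaming response (the SSE "data:" payload).
chunk = '{"choices": [{"delta": {"content": "Hello"}}]}'
data = json.loads(chunk)

generated_text = ""
# /v1/completions streams each increment in choices[0].text; assigning it
# (generated_text = ...) instead of appending drops the earlier chunks.
# /v1/chat/completions carries the increment under choices[0].delta.content:
generated_text += data["choices"][0]["delta"].get("content") or ""
print(generated_text)  # -> "Hello"
```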