
Turn shareGPT data into a standard benchmark #45

Closed
zhuohan123 opened this issue Apr 22, 2023 · 0 comments · Fixed by #145
@zhuohan123 (Member) commented:
  1. Extract the lengths of the conversation rounds, and ideally make that data directly available from GitHub.
  2. The current L-shape evaluation, which binary-searches for throughput, is hard to run and does not scale. We should find an easier way to benchmark performance.
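For step 1, a minimal sketch of extracting per-round lengths from a ShareGPT-style dump. The field names (`conversations`, `from`, `value`) are an assumption based on common ShareGPT exports, not taken from the vLLM code:

```python
import json

# Hypothetical ShareGPT-style record: each item holds a "conversations"
# list of turns with "from" (role) and "value" (text). The schema is an
# assumption, not the actual vLLM benchmark format.
sample = json.loads("""
[
  {"conversations": [
    {"from": "human", "value": "Hello there"},
    {"from": "gpt", "value": "Hi! How can I help you today?"}
  ]}
]
""")

def round_lengths(dataset):
    """Return, per conversation, the character length of each turn."""
    return [
        [len(turn["value"]) for turn in conv["conversations"]]
        for conv in dataset
    ]

print(round_lengths(sample))  # [[11, 29]]
```

In a real benchmark these character counts would typically be replaced by tokenizer lengths, since request cost is driven by token counts rather than raw text size.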
fxmarty pushed a commit to fxmarty/vllm-public that referenced this issue Jun 12, 2024
yukavio pushed a commit to yukavio/vllm that referenced this issue Jul 3, 2024
…t#45)

Tested by checking the help message of the OpenAI API server:
```
python -m vllm.entrypoints.openai.api_server --help
```

Before:
```
  --sparsity {sparse_w16a16,None}, -s {sparse_w16a16,None}
                        Method used to compress sparse weights. If None, we first check the `sparsity_config`
                        attribute in the model config file. If that is None we assume the model weights are dense
```

After:
```
  --sparsity {None,sparse_w16a16,semi_structured_sparse_w16a16}, -s {None,sparse_w16a16,semi_structured_sparse_w16a16}
                        Method used to compress sparse weights. If None, we first check the `sparsity_config`
                        attribute in the model config file. If that is None we assume the model weights are dense
```
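The change above amounts to adding a new value to the argument's `choices`. A minimal argparse sketch (not the actual vLLM parser) showing that extending `choices` is enough for the new option to appear in `--help`:

```python
import argparse

# Minimal sketch, assuming a plain argparse setup: adding a value to
# `choices` makes argparse both accept it and list it in --help.
parser = argparse.ArgumentParser(prog="api_server")
parser.add_argument(
    "--sparsity", "-s",
    choices=[None, "sparse_w16a16", "semi_structured_sparse_w16a16"],
    default=None,
    help="Method used to compress sparse weights.",
)

args = parser.parse_args(["-s", "semi_structured_sparse_w16a16"])
print(args.sparsity)  # prints "semi_structured_sparse_w16a16"
```

Note that `None` in `choices` only affects the help text and the default; a literal `None` cannot be passed from the command line, since CLI values arrive as strings.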
@alixiaodi alixiaodi mentioned this issue Aug 2, 2024