
[Misc][Benchmarking] Enable benchmarks to create request from file #3530

Closed

Conversation

@ElizaWszola (Contributor) commented Mar 20, 2024

Enable benchmarks that read a text file and repeatedly send its contents as requests to the server. This way of benchmarking makes it possible to obtain a best-case / upper-bound scenario for request caching.

This PR was split from #3431 and used to create Sonnet dataset results.

To create requests from a text file, use the arguments --dataset path/to/dataset --request-from-text. This makes the benchmark repeatedly send the text file to the server, resulting in a best-case caching scenario. Example:

python benchmarks/benchmark_serving.py  --model huggyllama/llama-7b --dataset benchmarks/data/sonnet.txt --request-rate 2.5 --num-prompts 1000 --backend openai --endpoint /v1/completions --request-from-text
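For reference, the repeated-request mode described above could be sketched as follows. The function name, the (prompt, prompt_len) tuple shape, and the whitespace-based length estimate are illustrative assumptions, not the PR's actual code:

```python
from typing import List, Tuple


def sample_requests_from_text(path: str, num_prompts: int) -> List[Tuple[str, int]]:
    """Read one text file and return num_prompts identical (prompt, length) pairs.

    Sending the same prompt repeatedly makes every request after the first a
    potential cache hit, so the run measures the best case for request caching.
    """
    with open(path, encoding="utf-8") as f:
        prompt = f.read()
    # Rough token estimate; a real benchmark would use the model's tokenizer.
    prompt_len = len(prompt.split())
    return [(prompt, prompt_len)] * num_prompts
```

Because every entry is identical, the server sees the same prefix on each request, which is exactly the upper bound the PR description mentions.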

Comment on lines +182 to +183
per_token_latencies.append(
(outputs[i].latency - outputs[i].ttft) / output_len)
@ElizaWszola (Contributor Author)

Does this change make sense?

@ywang96 (Member) Mar 20, 2024

Yep! In #3277 what I did was

if output_len > 1:
    tpots.append((outputs[i].latency - outputs[i].ttft) / (output_len - 1))
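The `output_len - 1` divisor reflects that TTFT already covers the first generated token, so only the remaining tokens amortize the rest of the end-to-end latency. A standalone sketch of that calculation (the function name and sample values are made up for illustration):

```python
def time_per_output_token(latency: float, ttft: float, output_len: int) -> float:
    """TPOT: latency after the first token, averaged over the remaining tokens.

    TTFT accounts for the first token, leaving output_len - 1 tokens to
    share the rest of the request latency.
    """
    if output_len <= 1:
        # Matches the `if output_len > 1` guard: TPOT is undefined here.
        raise ValueError("TPOT is undefined for a single-token output")
    return (latency - ttft) / (output_len - 1)
```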

@ElizaWszola (Contributor Author)

I've also been thinking of making --dataset with --request-from-text point to a directory from which files would be randomly picked and sent as requests, but I don't know if there is a need for this kind of solution. The Sonnet test could then be reproduced by creating a directory with a single sonnet.txt file in it.

@ywang96 (Member) commented Mar 20, 2024

Hey @ElizaWszola! Thank you for the PR; we actually share a lot of similar ideas! Some changes in this PR (the more accurate TPOT calculation, the --dataset arg, etc.) have already been addressed in an earlier PR of mine, #3277 - could you please take a look at it so we can consolidate?

I also personally really like the idea of the data registry from #3431, but we can probably discuss it later when we really need it.

Comment on lines +123 to +127
system_message = {
"content": "You are a chatbot with the explicit goal of "
"helping the user as best as possible",
"role": "system",
}
Member

FYI, this actually won't work with Mixtral since it doesn't have a "system" role.
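One common workaround for models without a "system" role (not part of this PR; the helper name is hypothetical) is to fold the system prompt into the first user turn:

```python
def merge_system_into_user(messages):
    """Merge a leading system message into the first user message.

    Useful for models like Mixtral whose chat template rejects the
    "system" role. Returns a new message list; the input is not mutated.
    """
    if not messages or messages[0].get("role") != "system":
        return list(messages)
    system, rest = messages[0], messages[1:]
    if rest and rest[0].get("role") == "user":
        merged = dict(rest[0])
        merged["content"] = system["content"] + "\n\n" + merged["content"]
        return [merged] + rest[1:]
    # No user turn to merge into: demote the system message to a user turn.
    return [{"role": "user", "content": system["content"]}] + rest
```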

break

# Sample num_requests from the list.
assert dataset_args.num_samples <= len(dataset)
Member

In this case it won't be possible to run certain long-duration or high request-rate sessions if the dataset is too small, but we probably don't want to modify the dataset itself either.

Perhaps we should do sampling with replacement instead? (i.e. random.choices())
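Sampling with replacement would let num_samples exceed the dataset size, which the assert currently forbids. A minimal sketch of the difference (the function and parameter names are illustrative):

```python
import random


def sample_prompts(dataset, num_samples, with_replacement=True):
    """Sample prompts from a dataset.

    With replacement (random.choices), num_samples may exceed len(dataset),
    so small datasets can still drive long or high-rate sessions. Without
    replacement (random.sample), the size limit still applies.
    """
    if with_replacement:
        return random.choices(dataset, k=num_samples)
    if num_samples > len(dataset):
        raise ValueError("num_samples exceeds dataset size")
    return random.sample(dataset, num_samples)
```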

@ElizaWszola (Contributor Author)

@ywang96 Thanks for pointing this out! The benchmarking changes did travel from your PR through the ordered-dict evictor testing to this one, and I somehow missed it. I think it makes more sense to close this PR and work on yours - what do you think?

@ywang96 (Member) commented Mar 21, 2024

@ywang96 Thanks for pointing this out! The benchmarking changes did travel from your PR to the ordered dict evictor testing to this one and I somehow missed it. I think it makes more sense to close this PR and work on yours, what do you think?

Yep of course! Feel free to review my PR, thanks!

@ElizaWszola (Contributor Author) commented Mar 21, 2024

@ywang96 Alright, I'm closing this now. Sorry for the confusion on my side!
