[Kernel] Enhance MoE benchmarking & tuning script #4921
Conversation
@pcmoritz The PR is not ready yet. I will ping you once it's ready.
Sounds great, thank you :)
@pcmoritz This PR is ready now. Sorry for the delay.
One small gotcha I ran into while trying this out: currently FP8 can't be benchmarked with an FP16 checkpoint; it errors out without a change like the following, which casts the randomly initialized weights to FP8:

```diff
diff --git a/benchmarks/kernels/benchmark_moe.py b/benchmarks/kernels/benchmark_moe.py
index 6796ea401..3f3005e20 100644
--- a/benchmarks/kernels/benchmark_moe.py
+++ b/benchmarks/kernels/benchmark_moe.py
@@ -46,6 +46,8 @@ def benchmark_config(
         w2_scale = torch.randn(num_experts, dtype=torch.float32)
         a1_scale = torch.randn(1, dtype=torch.float32)
         a2_scale = torch.randn(1, dtype=torch.float32)
+        w1 = w1.to(torch.float8_e4m3fn)
+        w2 = w2.to(torch.float8_e4m3fn)
     input_gating = torch.empty(num_tokens, num_experts, dtype=torch.float32)
```

It would be good to support this, since FP8 checkpoints are not widely available yet, and for vLLM FP8 we support running FP16 checkpoints in FP8 :)
@pcmoritz I addressed your comments. PTAL.
Thanks! I've been using the new script to do some tuning for FP8 and it works like a charm, thanks a lot for improving it. I'll open a PR with the new configs shortly, after I have tested them!

Btw, in order to get progress bars, I've been using this modification:

```python
from ray.experimental.tqdm_ray import tqdm
```

and then, where we iterate over the configs:

```python
for config in tqdm(search_space):
```

This prints progress bars without messing up stdout, and it works like this: https://docs.ray.io/en/latest/ray-observability/user-guides/configure-logging.html#distributed-progress-bars-tqdm

Feel free to add it (don't worry that it is currently in the experimental namespace; I think it is one of the APIs that should be stabilized, and I'll look into that).
@pcmoritz Ray tqdm is really cool! I actually wanted to have exactly the same feature. Happy to add that!
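The pattern discussed above can be sketched end to end. This is a minimal, hedged version: `search_space` and `tune` are hypothetical stand-ins (not names from the PR), and a no-op fallback is used when Ray is not installed so the sketch runs anywhere:

```python
# Sketch of the ray.experimental.tqdm_ray pattern described above.
# tqdm_ray prints progress bars without corrupting stdout when many Ray
# workers log at once; fall back to a no-op wrapper if Ray is absent.
try:
    from ray.experimental.tqdm_ray import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        return iterable  # fallback: iterate with no progress bar

def tune(search_space):
    """Iterate over candidate configs; `search_space` is a hypothetical list."""
    results = []
    for config in tqdm(search_space):
        results.append(config)  # real code would benchmark each config here
    return results
```

The try/except keeps the script usable on machines without Ray, while still getting the worker-safe bars when it is available.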
Tune Qwen2-57B-A14B configs based on #4921 (#5497)

Throughput benchmark command:

```
python benchmarks/benchmark_throughput.py --model=Qwen/Qwen2-57B-A14B-Instruct --input-len 1000 --output-len 50 -tp 2
```

A100 GPU benchmark:

| | no config | w/ PR |
| --- | --- | --- |
| tp=2 | 10.53 requests/s, 11058.17 tokens/s | 12.47 requests/s, 13088.57 tokens/s |
| tp=4 | 17.77 requests/s, 18662.95 tokens/s | 20.20 requests/s, 21212.32 tokens/s |
This PR enhances the MoE tuning & benchmarking script, which is a bit hacky at the moment. It also enables using multiple GPUs for benchmarking via Ray.
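As a rough illustration of the multi-GPU idea: the PR itself uses Ray, but this sketch substitutes a thread pool from the standard library so it runs anywhere, and `benchmark_config` here is a hypothetical stand-in for the real kernel benchmark, not the PR's function:

```python
# Hedged sketch: fan candidate configs out across several workers (one per
# GPU) and keep the fastest one. Ray would replace the thread pool in the
# real script; the "latency" below is a placeholder, not a real measurement.
from concurrent.futures import ThreadPoolExecutor

def benchmark_config(gpu_id: int, config: dict) -> float:
    # Hypothetical stand-in: real code would pin the worker to `gpu_id`
    # (e.g. via torch.cuda.set_device) and time the fused-MoE kernel.
    return float(config["BLOCK_SIZE_M"])  # placeholder "latency"

def tune(search_space: list, num_gpus: int = 2) -> dict:
    # Round-robin configs across `num_gpus` parallel workers, then keep
    # the config with the lowest measured latency.
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        latencies = list(pool.map(
            lambda ic: benchmark_config(ic[0] % num_gpus, ic[1]),
            enumerate(search_space),
        ))
    best = min(range(len(search_space)), key=latencies.__getitem__)
    return search_space[best]
```

The round-robin assignment is the simplest scheduling choice; a work-stealing queue would balance better when configs have very different runtimes.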