Expert Parallelism (EP) Support for DeepSeek V3/R1 #3602

Open
wants to merge 33 commits into
base: main

sleepcoo
Contributor

@sleepcoo sleepcoo commented Feb 16, 2025

Motivation

Add Expert Parallelism (EP) support for DeepSeek V3/R1.

Modifications

  • The group GEMM operator now supports FP8.
  • DeepSeek V3 parameter loading is supported.
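
For readers unfamiliar with the terms, the idea behind expert parallelism and a grouped GEMM can be sketched in plain Python. This is only an illustrative toy under stated assumptions, not the code in this PR: the function names (`experts_for_rank`, `grouped_gemm`, `ep_forward`) are hypothetical, routing is simplified to one expert per token, ranks are simulated with a loop instead of real devices and collectives, and FP8 quantization is left out entirely.

```python
from collections import defaultdict


def experts_for_rank(num_experts, ep_size, rank):
    """Contiguous block of expert ids owned by `rank` (assumes an even split)."""
    assert num_experts % ep_size == 0, "sketch assumes experts divide evenly"
    per_rank = num_experts // ep_size
    return range(rank * per_rank, (rank + 1) * per_rank)


def matvec(weight, x):
    """Multiply an (out_dim x in_dim) weight, given as a list of rows, by x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def grouped_gemm(tokens, expert_ids, weights, local_experts):
    """Group tokens by expert and apply each local expert to its own group.

    Returns {token_index: output} only for tokens routed to a local expert.
    """
    groups = defaultdict(list)
    for idx, expert in enumerate(expert_ids):
        if expert in local_experts:
            groups[expert].append(idx)

    out = {}
    for expert, indices in groups.items():
        for idx in indices:
            out[idx] = matvec(weights[expert], tokens[idx])
    return out


def ep_forward(tokens, expert_ids, weights, ep_size):
    """Simulate EP: every rank handles only its experts, results are merged."""
    merged = {}
    for rank in range(ep_size):
        local = experts_for_rank(len(weights), ep_size, rank)
        merged.update(grouped_gemm(tokens, expert_ids, weights, local))
    return [merged[i] for i in range(len(tokens))]


if __name__ == "__main__":
    # Four toy experts; expert e scales its input by (e + 1).
    weights = [[[e + 1, 0], [0, e + 1]] for e in range(4)]
    tokens = [[1, 2], [3, 4], [5, 6]]
    expert_ids = [0, 3, 1]  # hypothetical top-1 routing decisions
    print(ep_forward(tokens, expert_ids, weights, ep_size=2))
```

In a real deployment each rank would hold only its own experts' weights in memory, which is the main difference from tensor parallelism, where every rank holds a slice of every expert.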

Performance

Throughput improved by approximately 5% (about 4.7% for both input and output tokens) with EP=8 compared with TP=8 on a single 8×H200 machine.

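| H200*8 | Input token throughput (tok/s) | Output token throughput (tok/s) |
|--------|--------------------------------|---------------------------------|
| EP=8   | 677.82                         | 1468.48                         |
| TP=8   | 647.60                         | 1403.02                         |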
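
As a quick sanity check on the "approximately 5%" figure, the relative gain can be recomputed from the numbers reported above. The helper name `relative_gain` is just for illustration; the throughput values are copied from the table.

```python
def relative_gain(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Throughput in tok/s, as reported for the 8x H200 run.
results = {
    "EP=8": {"input": 677.82, "output": 1468.48},
    "TP=8": {"input": 647.60, "output": 1403.02},
}

gains = {
    metric: relative_gain(results["EP=8"][metric], results["TP=8"][metric])
    for metric in ("input", "output")
}

if __name__ == "__main__":
    for metric, gain in gains.items():
        print(f"{metric} token throughput gain: {gain:.1f}%")
```

Both metrics come out at roughly 4.7%, consistent with the rounded 5% stated in the description.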

Test commands

Baseline server (TP=8):
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code

Server with EP enabled:
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --enable-ep-moe

Benchmark client (run against either server):
python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 600 --random-input 180 --random-output 400 --request-rate 40

@ispobock ispobock changed the title from "Expert Parallelism (EP) Support for DeepSeek V2" to "Expert Parallelism (EP) Support for DeepSeek V3" Feb 17, 2025
laixinn and others added 8 commits February 18, 2025 14:23
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
@sleepcoo sleepcoo marked this pull request as ready for review February 19, 2025 11:54
@zhyncs zhyncs mentioned this pull request Feb 19, 2025
18 tasks
@sleepcoo sleepcoo changed the title from "Expert Parallelism (EP) Support for DeepSeek V3" to "Expert Parallelism (EP) Support for DeepSeek V3/R1" Feb 20, 2025
@xinji1

xinji1 commented Feb 25, 2025

Just tested this PR on a single MI300X and found a regression (output throughput / e2e latency / TTFT). Has anybody tested it in an AMD environment?

@zhyncs
Member

zhyncs commented Feb 25, 2025

> Just tested this PR on a single MI300X and found a regression (output throughput / e2e latency / TTFT). Has anybody tested it in an AMD environment?

It is currently not enabled by default.
