
[Track] DeepSeek V3/R1 nextn progress #3472

Open
10 of 13 tasks
zhyncs opened this issue Feb 10, 2025 · 5 comments

zhyncs (Member) commented Feb 10, 2025

Triton Backend

@ispobock @pankajroark

FlashInfer Backend

@zhyncs @yzh119

  • compatible with MLA disabled

  • support FlashInfer nightly MLA ragged prefill and CUDA Core MLA decoding

  • support FlashInfer v0.2.0.post3 MLA ragged, paged prefill and decoding (@zhyncs @yzh119)

  • nextn parts can be shared with Triton Backend

EAGLE 2

@zhyncs @Ying1123
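
All of the backends above implement the same draft-and-verify pattern that nextn/MTP and EAGLE-style speculative decoding share. The following is a minimal, self-contained toy sketch of that loop with greedy acceptance; the `target_next`/`draft_next` callables are stand-ins invented for illustration and are not SGLang APIs, and a real engine scores all draft positions in one batched forward pass, which is where the speedup comes from.

```python
# Toy draft-and-verify loop (greedy acceptance). Names and toy "models" are
# illustrative placeholders, not SGLang internals.
from typing import Callable, List


def speculative_decode(
    target_next: Callable[[List[int]], int],   # target model: context -> next token (greedy)
    draft_next: Callable[[List[int]], int],    # cheap draft head (e.g. an MTP/nextn module)
    prompt: List[int],
    max_new_tokens: int = 16,
    num_draft_tokens: int = 2,                 # nextn typically drafts only 1-2 tokens per step
) -> List[int]:
    tokens = list(prompt)
    generated = 0
    while generated < max_new_tokens:
        # 1) Draft: cheaply propose a short continuation.
        draft, ctx = [], list(tokens)
        for _ in range(num_draft_tokens):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)

        # 2) Verify: compare each draft token with the target model's greedy choice.
        new_tokens: List[int] = []
        for t in draft:
            expected = target_next(tokens + new_tokens)
            if expected == t:
                new_tokens.append(t)            # accepted
            else:
                new_tokens.append(expected)     # rejected: keep the target's token instead
                break
        else:
            # Every draft token accepted; the verify pass also yields one bonus token.
            new_tokens.append(target_next(tokens + new_tokens))

        tokens.extend(new_tokens)
        generated += len(new_tokens)

    return tokens[: len(prompt) + max_new_tokens]


if __name__ == "__main__":
    # Tiny deterministic "models": the target counts up by one; the draft usually agrees.
    target = lambda ctx: (ctx[-1] + 1) % 50
    draft = lambda ctx: (ctx[-1] + 1) % 50 if ctx[-1] % 7 else (ctx[-1] + 2) % 50
    # With greedy acceptance the output matches plain target decoding: [0, 1, ..., 10].
    print(speculative_decode(target, draft, prompt=[0], max_new_tokens=10))
```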

zhyncs (Member, Author) commented Feb 17, 2025

References:
  • MTP support: #3582
  • v0.4.3.post1 release: #3638

SGLang supports MTP (nextn) in the Triton backend, achieving a speed of 77 tokens/s, twice as fast as other OSS LLM engines.
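
A minimal sketch of how one might check single-request generation speed against a running server (also relevant to the throughput question below). It assumes SGLang's default port 30000 and its OpenAI-compatible /v1/chat/completions endpoint; the "default" model name and the prompt are placeholders, so adjust them for your deployment.

```python
# Rough single-request decode-throughput check against a running SGLang server.
# Wall-clock time includes prefill, so use a long max_tokens to approximate
# steady-state decode speed.
import time
import requests

URL = "http://localhost:30000/v1/chat/completions"  # assumed default SGLang port

payload = {
    "model": "default",  # placeholder; may need to match the served model path
    "messages": [{"role": "user", "content": "Write a short essay about speculative decoding."}],
    "max_tokens": 512,
    "temperature": 0.0,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()
elapsed = time.perf_counter() - start

usage = resp.json().get("usage", {})
completion_tokens = usage.get("completion_tokens", 0)
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s (incl. prefill)")
```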

panpan0000 (Contributor) commented:

Woo, thank you @zhyncs.
I just tried the new image lmsysorg/sglang:v0.4.3.post2-cu125, but the performance seems similar to 0.4.2 (on 16 x H20):
with running-req = 1, the gen throughput (token/s) is no higher than before.

What did I miss?

lambert0312 commented:
I see "compatible with radix cache and chunked prefill" in the task list. How is that going?
Long-context scenarios need this feature. @zhyncs

yukavio (Collaborator) commented Feb 21, 2025

The current EAGLE implementation has two issues:

  1. It does not support chunked prefill.
  2. The draft model follows the same distributed strategy as the target model.

Does the community have any plans to address these two issues?

zhyncs (Member, Author) commented Feb 21, 2025

@yukavio Chunked prefill support is on the way (cc @merrymercy).

zhyncs unpinned this issue on Feb 21, 2025