fix draft cuda graph capture failure #3431

Merged 1 commit into main from zhyncs/type on Feb 9, 2025


zhyncs
Member

@zhyncs zhyncs commented Feb 9, 2025

Motivation

Fix the following error raised during CUDA graph capture for the draft model: "Capture cuda graph failed: mat1 and mat2 must have the same dtype, but got Float and Half".

python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3-8B-Instruct  \
--speculative-algo EAGLE --speculative-draft lmzheng/sglang-EAGLE-LLaMA3-Instruct-8B  \
--speculative-num-steps 5 --speculative-eagle-topk 8 --speculative-num-draft-tokens 64 \
--disable-radix-cache --mem-fraction 0.7 --cuda-graph-max-bs 32 --dtype bfloat16

https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/config.json#L23
https://huggingface.co/lmzheng/sglang-EAGLE-LLaMA3-Instruct-8B/blob/main/config.json#L19

We should explicitly specify --dtype when the target model and the draft model use different dtypes (for example, bfloat16 and float16).
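As a rough illustration of the check described above, the sketch below compares the "torch_dtype" field of two Hugging Face style config dictionaries and reports whether an explicit --dtype would be advisable. The helper name needs_explicit_dtype and the sample dictionaries are hypothetical and not part of sglang; the sample values simply mirror the bfloat16/float16 example mentioned above.

```python
from typing import Any, Dict


def needs_explicit_dtype(target_cfg: Dict[str, Any], draft_cfg: Dict[str, Any]) -> bool:
    """Return True when the two configs declare different "torch_dtype" values.

    Hypothetical helper for illustration only. A missing "torch_dtype" key is
    read as None, so a config without the field compares unequal to one that
    declares a dtype.
    """
    return target_cfg.get("torch_dtype") != draft_cfg.get("torch_dtype")


# Illustrative values only (assumed, not read from the linked config files).
target_config = {"torch_dtype": "bfloat16"}
draft_config = {"torch_dtype": "float16"}

if needs_explicit_dtype(target_config, draft_config):
    print("dtypes differ: pass --dtype explicitly when launching the server")
```

In practice the two dictionaries would be loaded from each model's config.json (for example with json.load); the comparison itself stays the same.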

Modifications

Checklist

  • Format your code according to the Code Formatting with Pre-Commit.
  • Add unit tests as outlined in the Running Unit Tests.
  • Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

@zhyncs zhyncs merged commit 1646149 into main Feb 9, 2025
2 of 8 checks passed
@zhyncs zhyncs deleted the zhyncs/type branch February 9, 2025 15:16
Member Author

zhyncs commented Feb 9, 2025

fix #3395

chongli-uw pushed a commit to chongli-uw/sglang that referenced this pull request Feb 15, 2025