[Bug]: Error: Failed to initialize the TMA descriptor 700 for LLaMa 3.1 405B on 8*H100 -- prefill error? #6870
Comments
Thanks for reporting this. We have resolved the issue with: This will be in the next release of vLLM (ideally this week); you can use the nightlies to unblock yourself for now.
@pseudotensor Just to confirm: did building from source (main) work for you? I'm running into the same error at runtime with pretty much the same setup as yours.
Yes, I built a Docker image from source about 4 days ago. It seems I used 3eeb148.
We are seeing the same error when using the Llama3.1-70B-Instruct model. Am I correct in assuming that the fix will also cover the 70B model?
@robertgshaw2-neuralmagic We are encountering the same issue when serving Llama-3.1-70B-Instruct-FP8 on 2xH100. I can reproduce it consistently once the number of concurrent requests reaches 256, with all engine arguments at their defaults except tensor parallel size, which is set to 2 (a rough sketch of this setup is included below). Do you think this could be an edge case that remains even after the fix for 405B?
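A minimal offline sketch of that setup, assuming an FP8 70B checkpoint and the vLLM Python API rather than the OpenAI-compatible server the commenter is actually running; the model ID and batch size are illustrative assumptions, not the exact reproduction:

```python
# Hedged sketch: load an FP8 70B model with tensor_parallel_size=2 and
# submit a large batch so that many sequences run concurrently,
# roughly mirroring 256 simultaneous server requests.
from vllm import LLM, SamplingParams

llm = LLM(
    model="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8",  # assumed FP8 checkpoint
    tensor_parallel_size=2,  # matches the 2xH100 setup described above
)

prompts = [f"Write a short note about request {i}." for i in range(256)]
sampling_params = SamplingParams(max_tokens=128, temperature=0.8)

# vLLM schedules the whole batch internally, so this drives high
# intra-engine concurrency similar to the reported repro conditions.
outputs = llm.generate(prompts, sampling_params)
```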
We are also hitting this issue with Qwen-32B-Instruct-FP8: `Error: Failed to initialize the TMA descriptor 700`.
What version of vLLM are you running?
As for the vLLM version, we are using 0.6.2 and 0.6.3, and we see the same issue with both. Thanks.
Can you share reproduction instructions?
Hi, thanks. Please see the attached file for the error log.
@robertgshaw2-neuralmagic This is the command that I use to spin up the vLLM server:
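The commenter's actual command is not visible here. As a rough, hypothetical illustration of driving such a server at the concurrency level mentioned earlier in the thread (the base URL, served model name, and request count are assumptions, not the commenter's setup), one could generate concurrent load with the OpenAI Python client:

```python
# Hedged sketch of a load generator against an OpenAI-compatible vLLM server.
# Endpoint, model name, and request count are illustrative assumptions.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request(i: int) -> None:
    await client.chat.completions.create(
        model="Qwen-32B-Instruct-FP8",  # adjust to the actual served model name
        messages=[{"role": "user", "content": f"Summarize request {i}."}],
        max_tokens=128,
    )

async def main() -> None:
    # Fire 256 requests concurrently, matching the concurrency level
    # at which the TMA descriptor error was reported above.
    await asyncio.gather(*(one_request(i) for i in range(256)))

asyncio.run(main())
```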
Your current environment
latest docker image
🐛 Describe the bug
Complete logs: llama31-405b.log.zip
e.g.