[Bug]: RuntimeError: CUDA error: an illegal memory access was encountered #8025
Comments
Could you share the access pattern you are using? E.g. the client script that generates the issue? This would really help us reproduce and solve it.
I use the benchmark scripts in v0.4.0, and set request_rate=6 for this deployment. Do you mean the tokens I sent to the model?
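For reference, a run along these lines is presumably what the commenter means. This is a sketch based on vLLM's `benchmarks/benchmark_serving.py`; the model name and dataset path are placeholders, and exact flag names may differ between versions.

```shell
# Sketch: drive the server at a fixed request rate using vLLM's serving benchmark.
# <model> and the dataset path are placeholders; verify flags against your vLLM version.
python benchmarks/benchmark_serving.py \
    --backend vllm \
    --model <model> \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
    --request-rate 6
```

With `--request-rate 6`, the benchmark issues requests at roughly six per second rather than all at once, which matches the "request_rate=6" setting mentioned above.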
I get a similar one. Part of the traceback:
`await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)`
Same error in 0.6.1
+1 0.6.2, GPU: L4
+1 0.6.2, GPU: A800, model: awq
+1 2x4090 on awq
+1 8xA100, Azure, 0.6.3.post1. Mistral Nemo, Codestral, Small 22B
Seems fixed in later versions, from v0.6.3 onward.
Still facing this issue with Llama 3.3 70B |
Your current environment
vLLM image: v0.5.4
hardware: RTX 4090
gpu driver: 550.78
model: qwen1.5-14b-chat-awq
launch cmd: --enable-prefix-caching
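A launch command matching this environment might look like the following. This is a sketch, not the reporter's exact command: the entrypoint and flag names follow vLLM's documented OpenAI-compatible server CLI, and the Hugging Face model ID is an assumption inferred from "qwen1.5-14b-chat-awq".

```shell
# Sketch of the reported setup: AWQ-quantized Qwen1.5-14B-Chat with prefix caching on.
# The model ID is assumed; check flag availability against the v0.5.4 CLI.
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen1.5-14B-Chat-AWQ \
    --quantization awq \
    --enable-prefix-caching
```

Prefix caching reuses KV-cache blocks across requests that share a common prompt prefix; several comments above suggest the crash correlates with this flag being enabled.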
🐛 Describe the bug