[Bug]: distributed_executor_backend=mp does not work with GPTQ tp>1 #6004
Comments
It would be great if we could better control when CUDA is initialized - not sure whether that's feasible, though. Despite @youkaichao's fix to the distributed tests, it's quite a headache to ensure that CUDA is not accidentally initialized during those tests.
For sure - I'm going to look into this when I get some time. It's specific to GPTQ (it does not happen for fp8 quantization). I think the source is that we check the CUDA device capability when deciding if we can convert GPTQ --> Marlin, and this happens "too early" in the lifecycle (see the sketch below). I'll look into a workaround when I get some time; I have a couple of PRs I want to wrap up before I look into this.
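A minimal sketch (not vLLM code) of the failure mode being described, assuming the compatibility check uses a torch.cuda call in the parent process before the fork-based workers start:

```python
# Minimal sketch of the suspected failure mode (assumption: the GPTQ -> Marlin
# compatibility check calls torch.cuda.get_device_capability in the parent).
import multiprocessing as mp

import torch


def worker():
    # Raises "RuntimeError: Cannot re-initialize CUDA in forked subprocess ..."
    # because the parent process already initialized CUDA before forking.
    torch.zeros(1, device="cuda")


if __name__ == "__main__":
    # Analogous to checking compute capability "too early" in the lifecycle:
    # this call initializes CUDA in the parent (in the torch versions discussed here).
    torch.cuda.get_device_capability(0)

    ctx = mp.get_context("fork")
    p = ctx.Process(target=worker)
    p.start()
    p.join()
```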
met the same issue - happy to poke around / peek if folks are busy with higher priority tasks |
Feel free. I think this function is the culprit - it initializes torch:
It is called by this function:
Which is called by this function:
I think solutions are:
We can use pynvml to check the compute capability, without calling torch CUDA APIs (which would initialize CUDA).
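A small sketch of that approach, assuming pynvml is installed; the helper name get_compute_capability is illustrative, not vLLM's actual API:

```python
# Sketch: query compute capability via NVML so CUDA is never initialized
# in the parent process. get_compute_capability is an illustrative helper,
# not vLLM's actual implementation.
import pynvml


def get_compute_capability(device_index: int = 0) -> tuple[int, int]:
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        return major, minor
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(get_compute_capability(0))  # e.g. (8, 0) on an A100
```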
Your current environment
🐛 Describe the bug
distributed_executor_backend="mp" is now enabled by default for vLLM. However, this feature is currently incompatible with some GPTQ quantization for tp>1 due to the order in which torch is initialized. We get the classic:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Setting distributed_executor_backend="ray" works for GPTQ.

Loading a GPTQ model with tp>1 and the default "mp" backend fails with the RuntimeError above:
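The original command and model were not captured here, so this is a hedged reconstruction; the GPTQ checkpoint name is illustrative only.

```python
# Hedged repro sketch - the GPTQ model name below is illustrative, not the one
# from the original report.
from vllm import LLM

# Fails with "Cannot re-initialize CUDA in forked subprocess" when the default
# multiprocessing ("mp") executor is used with tensor parallelism:
llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-GPTQ",  # any GPTQ-quantized checkpoint
    quantization="gptq",
    tensor_parallel_size=2,
    distributed_executor_backend="mp",
)
```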
Workarounds for now: use distributed_executor_backend="ray" instead of the default distributed_executor_backend="mp" (example below).
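A sketch of the workaround under the same assumptions as the repro above (illustrative model name):

```python
# Workaround sketch: switch to the Ray executor so workers are not forked from
# a process that has already initialized CUDA. Model name is illustrative.
from vllm import LLM

llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-GPTQ",
    quantization="gptq",
    tensor_parallel_size=2,
    distributed_executor_backend="ray",
)
```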