
[serve.llm][bugfix] Fix the wrong device_capability issue in vllm on quantized models. #51007

Merged

Conversation

kouroshHakha
Contributor

Fixes a bug where the result of `get_device_capability()` was cached based on the wrong environment variables. vLLM assumes that `CUDA_VISIBLE_DEVICES` already has its final value at import time, and caches attributes such as the CUDA device capability accordingly.

If Ray actors later serialize the modules, the cache apparently gets serialized along with the stale values. This PR clears the cache before creating the engine so the values are recomputed from the correct environment variables.
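The failure mode can be sketched with plain `functools.lru_cache`; the function name and the cached value below are illustrative stand-ins, not vLLM's actual API:

```python
import functools
import os

# Hypothetical stand-in for vLLM's cached device query. The cached
# value is derived from whatever the env var holds at first call.
@functools.lru_cache(maxsize=None)
def get_device_capability() -> str:
    return os.environ.get("CUDA_VISIBLE_DEVICES", "unset")

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
stale_after = get_device_capability()      # caches "0" at "import time"

# A Ray actor later receives its real device assignment...
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
stale_after = get_device_capability()      # still "0": the cache wins

# The fix: clear the cache before creating the engine so the value
# is recomputed from the current environment.
get_device_capability.cache_clear()
fresh = get_device_capability()            # now "0,1"
```

Clearing the cache just before engine creation is the least invasive fix, since it leaves vLLM's import-time behavior untouched for all other callers.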

…text for compatibility of the devices

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha kouroshHakha requested a review from a team as a code owner March 1, 2025 00:15
Contributor

@GeneDer GeneDer left a comment


Great work!

@GeneDer GeneDer added the go add ONLY when ready to merge, run all tests label Mar 1, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
…kha/ray into kh/fix-serve-quantized

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha
Contributor Author

The doc issue seems unrelated. Merging ...

@kouroshHakha kouroshHakha merged commit bacd30d into ray-project:master Mar 3, 2025
4 of 5 checks passed
Michaelhess17 pushed a commit to Michaelhess17/ray that referenced this pull request Mar 3, 2025
…quantized models. (ray-project#51007)

xsuler pushed a commit to antgroup/ant-ray that referenced this pull request Mar 4, 2025
…quantized models. (ray-project#51007)
