[serve.llm][bugfix] Fix the wrong device_capability issue in vllm on quantized models. #51007

kouroshHakha · 2025-03-01T00:15:13Z

Fixes a bug where `get_device_capability()` was cached based on the wrong environment variables. `vllm` assumes that `CUDA_VISIBLE_DEVICES` already has its final value at import time, and it caches some attributes, such as the CUDA device capability.

If Ray actors later serialize these modules, the cache apparently gets serialized along with them and carries the wrong values. This PR clears the cache before creating the engine so the values are recomputed from the correct environment variables.
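To illustrate the failure mode and the fix described above, here is a minimal, self-contained sketch. It does not use `vllm` or Ray: `get_device_capability` below is a hypothetical stand-in that assumes the real function is memoized with something like `functools.lru_cache`, and `clear_capability_cache` and `demo` are illustrative names, not part of this PR.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def get_device_capability() -> str:
    """Hypothetical stand-in: derive a value from CUDA_VISIBLE_DEVICES and cache it."""
    return "capability-for-" + os.environ.get("CUDA_VISIBLE_DEVICES", "")


def clear_capability_cache() -> None:
    """Drop the memoized value so the next call re-reads the environment."""
    # Guarded lookup: still safe if the function is ever not lru_cache-wrapped.
    clear = getattr(get_device_capability, "cache_clear", None)
    if callable(clear):
        clear()


def demo() -> tuple:
    """Return (first, stale, fresh) values to show the effect of clearing the cache."""
    previous = os.environ.get("CUDA_VISIBLE_DEVICES")
    clear_capability_cache()
    try:
        os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # value seen at import time
        first = get_device_capability()

        os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # value assigned to the worker later
        stale = get_device_capability()  # still the cached, wrong value

        clear_capability_cache()  # analogous to clearing before creating the engine
        fresh = get_device_capability()  # recomputed from the current environment
        return first, stale, fresh
    finally:
        # Restore the caller's environment and leave no cached value behind.
        if previous is None:
            os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            os.environ["CUDA_VISIBLE_DEVICES"] = previous
        clear_capability_cache()
```

Calling `demo()` returns the value computed first, the stale value served from the cache after the environment changed, and the value recomputed after the cache was cleared.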


GeneDer

Great work!


kouroshHakha · 2025-03-03T16:38:24Z

The doc issue seems unrelated. Merging ...


Added explicit cache clearing to avoid accidentally reusing wrong con…

0329549

…text for compatibility of the devices Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

kouroshHakha requested a review from a team as a code owner March 1, 2025 00:15

GeneDer approved these changes Mar 1, 2025


GeneDer added the `go` label (add ONLY when ready to merge, run all tests) Mar 1, 2025

wip

672d193

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

comaniac approved these changes Mar 1, 2025


kouroshHakha added 3 commits February 28, 2025 17:31

wip

e5511a9

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

851758a

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Merge branch 'kh/fix-serve-quantized' of https://github.com/kouroshHa…

45a20bc

…kha/ray into kh/fix-serve-quantized Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

kouroshHakha merged commit bacd30d into ray-project:master Mar 3, 2025
4 of 5 checks passed
