vLLM's default multiprocessing method is incompatible with ROCm and Gaudi #2439

Open
tiran opened this issue Oct 11, 2024 · 6 comments
Labels: bug (Something isn't working), jira (This triggers jira sync), vllm (vLLM specific issues)

Comments

@tiran
Contributor

tiran commented Oct 11, 2024

Describe the bug
vLLM defaults to VLLM_WORKER_MULTIPROC_METHOD=fork (see https://docs.vllm.ai/en/v0.6.1/serving/env_vars.html). The fork start method is incompatible with ROCm and Gaudi.

To Reproduce

  1. Configure InstructLab to use more than one GPU
  2. Run ilab model serve on a system with more than one AMD GPU

Expected behavior
vLLM works

Screenshots

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Additional context
I recommend switching to "spawn". Python itself is moving away from fork as the default start method across platforms. The fork method has known issues; for example, it can lead to deadlocks when a process mixes threads and fork.
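For illustration, here is a minimal sketch of the failure mode, assuming PyTorch on a machine with CUDA or ROCm devices (this is not vLLM's actual worker code): once the parent process has initialized the accelerator runtime, a forked child cannot use it again, while spawned workers start from a clean interpreter.

```python
import multiprocessing as mp

import torch


def worker(rank: int) -> None:
    # With the "fork" start method this raises:
    #   RuntimeError: Cannot re-initialize CUDA in forked subprocess. ...
    # because the parent process has already initialized the runtime below.
    torch.cuda.set_device(rank)
    print(f"worker {rank} is using", torch.cuda.get_device_name(rank))


if __name__ == "__main__":
    torch.cuda.init()  # parent touches the CUDA/ROCm runtime, as vLLM's engine does
    ctx = mp.get_context("spawn")  # "fork" here reproduces the error above
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(torch.cuda.device_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```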

I switched InstructLab to spawn a while ago because fork was causing trouble on Gaudi, see #956. InstructLab should set VLLM_WORKER_MULTIPROC_METHOD=spawn by default.
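A minimal sketch of what that default could look like on the InstructLab side (the launch_vllm helper below is hypothetical, not the actual InstructLab code path): default VLLM_WORKER_MULTIPROC_METHOD to spawn in the child environment without overriding an explicit user setting.

```python
import os
import subprocess


def launch_vllm(serve_args: list[str]) -> subprocess.Popen:
    """Launch `vllm serve`, defaulting the worker start method to spawn."""
    env = os.environ.copy()
    # fork breaks on ROCm and Gaudi because the parent has already initialized
    # the accelerator runtime; spawn gives each worker a fresh interpreter.
    env.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")
    return subprocess.Popen(["vllm", "serve", *serve_args], env=env)
```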

@tiran tiran added the bug (Something isn't working), vllm (vLLM specific issues), and jira (This triggers jira sync) labels on Oct 11, 2024
@ktam3

ktam3 commented Oct 16, 2024

Additional comment by Russell

Indeed, setting the environment variable is the right thing to do in the short term, as it will work with the currently shipped version of vLLM.

FYI, for the future: spawn will be used automatically when you run vllm serve as of v0.6.3 (vllm-project/vllm#8823).

@nathan-weinberg
Member

@n1hility given @russellb's comment above, will this ticket be covered by the planned vLLM bump?

@nathan-weinberg nathan-weinberg added this to the 0.21.0 milestone Nov 1, 2024
@n1hility
Member

IMO we should fix this in the ODH branches for the vLLM versions we are pulling in, and also in any container definitions. I don't have a problem with adding some code to InstructLab as a redundancy to handle plugging in different versions of vLLM.

@nathan-weinberg nathan-weinberg removed this from the 0.21.0 milestone Nov 13, 2024
@nathan-weinberg nathan-weinberg added this to the 0.22.0 milestone Nov 27, 2024
@nathan-weinberg
Member

@n1hility does ODH vLLM 0.6.2 have this fix or do we need to wait for the next bump?

@nathan-weinberg nathan-weinberg modified the milestones: 0.22.0, 0.23.0 Dec 13, 2024
@n1hility
Member

Looks like we need to wait for another bump. The branches were created in ODH for Intel and AMD, but they are not in use yet, and we still need a patch here.

@nathan-weinberg nathan-weinberg removed this from the 0.23.0 milestone Jan 28, 2025
@nathan-weinberg
Member

@n1hility @tiran do y'all know whether ODH vLLM 0.6.4post1 (the vLLM version we are currently using) or 0.6.6post1 (the next version we plan to bump to) still has this issue, or can it be closed out?

cc @fabiendupont
