question: Is it possible to avoid ray in single machine multiple GPUs serving? #391

Closed
gaocegege opened this issue Jul 7, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@gaocegege

I'm uncertain whether it's feasible to bypass Ray when serving on a single machine with multiple GPUs. Ray introduces additional maintenance costs in this use case.

@irasin
Contributor

irasin commented Jul 7, 2023

As things stand, Ray is needed if you want to use tensor parallelism with multiple GPUs, since each Worker instance has to live in its own process rather than in a thread. For that purpose, though, we could simply replace Ray with multiprocessing.

I haven't seen any other reason in the code why we need Ray. Maybe there is something else, for example memory issues, object sharing, or similar.
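
To make the "one process per Worker" idea concrete, here is a minimal sketch using only Python's standard `multiprocessing` module. It is not vLLM's implementation: `worker_main`, `run_workers`, and the echo-style "work" are hypothetical names made up for illustration, and pinning a GPU through `CUDA_VISIBLE_DEVICES` is an assumption about how a worker might be bound to a device. No GPU is touched, so it runs anywhere the `fork` start method is available.

```python
import multiprocessing as mp
import os


def worker_main(rank, conn):
    # Hypothetical worker: bind this process to one GPU via an env var,
    # then serve requests until a shutdown signal (None) arrives.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
    while True:
        msg = conn.recv()
        if msg is None:
            break
        # A real worker would run its model shard here; this one just
        # doubles the payload so the round trip is observable.
        conn.send((rank, os.environ["CUDA_VISIBLE_DEVICES"], msg * 2))
    conn.close()


def run_workers(num_gpus, payload=21, start_method="fork"):
    # "fork" keeps the sketch self-contained. With real CUDA work, "spawn"
    # is usually required, because CUDA cannot be re-initialized in a
    # forked child process.
    ctx = mp.get_context(start_method)
    workers = []
    for rank in range(num_gpus):
        parent_conn, child_conn = ctx.Pipe()
        proc = ctx.Process(target=worker_main, args=(rank, child_conn))
        proc.start()
        workers.append((proc, parent_conn))

    # Send the same request to every worker, then collect replies in rank order.
    for _, conn in workers:
        conn.send(payload)
    results = [conn.recv() for _, conn in workers]

    # Ask each worker to exit and wait for it.
    for proc, conn in workers:
        conn.send(None)
        proc.join()
    return results


if __name__ == "__main__":
    print(run_workers(2))
```

Each worker is a separate OS process with its own environment, which is the property the comment above says threads cannot provide. What this sketch does not cover is anything Ray might additionally be used for, such as object sharing or multi-node placement.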

@WoosukKwon WoosukKwon added the enhancement New feature or request label Jul 14, 2023
@hmellor
Collaborator

hmellor commented Mar 6, 2024

Closing because it appears Ray is only used if:
