[TPU] Use Ray for default distributed backend #8389

Merged: 2 commits merged on Sep 12, 2024
Changes from all commits
7 changes: 7 additions & 0 deletions vllm/config.py
@@ -869,6 +869,13 @@ def __init__(
                    f"distributed executor backend "
                    f"'{self.distributed_executor_backend}'.")

        if current_platform.is_tpu() and self.world_size > 1:
            if self.distributed_executor_backend is None:
                self.distributed_executor_backend = "ray"
Member:
I mean, I think this line alone should be enough to change the default backend to Ray in the TPU case.

Collaborator Author @WoosukKwon (Sep 12, 2024):

Oh, the error is for those who explicitly use distributed_executor_backend="mp".

Member:

Why do we need to raise an error if users explicitly specify the MP backend?

Collaborator Author (@WoosukKwon):

The MP backend is not supported for TPUs at the moment. Without this line, the user would get the following error:

"/vllm/engine/llm_engine.py", line 505, in _get_executor_cls
    assert distributed_executor_backend is None
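For context, a minimal, self-contained sketch of that failure mode (not the actual vLLM source; the placeholder executor classes and the simplified _get_executor_cls signature are assumptions for illustration only):

class RayTPUExecutor:      # placeholder for the Ray-based TPU executor
    pass

class TPUExecutor:         # placeholder for the single-host TPU executor
    pass

def _get_executor_cls(device_type, distributed_executor_backend):
    # Simplified selection logic: the TPU branch only knows about Ray,
    # and anything else is expected to be None.
    if device_type == "tpu":
        if distributed_executor_backend == "ray":
            return RayTPUExecutor
        # An explicit "mp" (or any other value) lands here and fails:
        assert distributed_executor_backend is None
        return TPUExecutor
    raise NotImplementedError(device_type)

_get_executor_cls("tpu", "ray")   # OK: Ray-based executor
_get_executor_cls("tpu", None)    # OK: single-host TPU executor
_get_executor_cls("tpu", "mp")    # AssertionError, as in the traceback above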

Member:

I'm confused by "Actually, the MP backend would also work for TPUs".

So the MP backend for TPU is actually not implemented yet.

Collaborator Author (@WoosukKwon):

Yes. We don't have an executor for TPU + MP.

Member:

cc @njhill in case there is any misunderstanding: it is because we currently only support the Ray backend on TPU.

            if self.distributed_executor_backend != "ray":
                raise ValueError(
                    "TPU backend only supports Ray for distributed inference.")
WoosukKwon marked this conversation as resolved.

        if self.distributed_executor_backend is None and self.world_size > 1:
            # We use multiprocessing by default if world_size fits on the
            # current node and we aren't in a ray placement group.
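Putting the two checks together, the net effect of this diff can be restated as a small standalone sketch (resolve_backend is a hypothetical helper name used here for illustration; the real logic lives inline in the constructor shown above):

from typing import Optional

def resolve_backend(is_tpu: bool, world_size: int,
                    backend: Optional[str]) -> Optional[str]:
    # Mirrors the TPU branch added in this diff.
    if is_tpu and world_size > 1:
        if backend is None:
            backend = "ray"   # default to Ray on a multi-chip TPU host
        if backend != "ray":
            raise ValueError(
                "TPU backend only supports Ray for distributed inference.")
    return backend

assert resolve_backend(is_tpu=True, world_size=4, backend=None) == "ray"
assert resolve_backend(is_tpu=True, world_size=1, backend=None) is None   # single chip: left untouched
try:
    resolve_backend(is_tpu=True, world_size=4, backend="mp")
except ValueError as e:
    print(e)   # TPU backend only supports Ray for distributed inference.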