[RFC]: Make device agnostic for diverse hardware support #9268

wangshuai09 · 2024-10-11T02:37:19Z

Motivation.

vLLM has already been adapted to many hardware devices, such as GPU, TPU, and XPU. However, adapting these backends requires implementing separate Worker/Executor/Model Runner frameworks for each, which leads to code redundancy and maintenance difficulties.
In fact, this hardware-specific framework code can be abstracted at the device layer to form a unified framework. Only one set of code would then need to be maintained, and each backend would only need to implement the device layer interfaces plus any device-specific logic where necessary.
I have also noticed that some new features are added only to the GPU-related code. That code is often applicable to other hardware as well, but it is difficult for other backends to notice and follow these updates.

Proposed Change.

This RFC is intended to establish a unified framework.
Integrating each hardware-specific framework into a common one may be difficult, but it makes sense to work in this direction. The diagram below represents a proposed solution:

Taking the Executor as an example: for third-party hardware devices built on the PyTorch ecosystem, the basic torch interfaces are already well adapted. Once device-specific hard-coding such as torch.cuda and torch.xpu is abstracted away, the GPU Executor could serve as the common Executor for all third-party devices.

Following #6080, different hardware backends can put their own device-specific code in a NewBackendPlatform, so that the framework becomes device-agnostic through current_platform. For example, calls to torch.cuda.synchronize could be replaced with current_platform.synchronize.
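A minimal sketch of the forwarding idea, assuming a hypothetical `NamespacePlatform` class that wraps a torch-style device namespace. The class name, the `run_step` helper, and the fake namespace are illustrative only and are not vLLM's actual API. A stand-in namespace is used so the sketch runs without torch or an accelerator; in a real setup the wrapped object would be something like torch.cuda or torch.xpu.

```python
from types import SimpleNamespace


class NamespacePlatform:
    """Hypothetical platform that forwards calls to a torch-style device namespace."""

    def __init__(self, device_type, namespace):
        self.device_type = device_type
        self._ns = namespace  # e.g. torch.cuda or torch.xpu in a real setup

    def synchronize(self):
        return self._ns.synchronize()

    def device_count(self):
        return self._ns.device_count()

    def empty_cache(self):
        return self._ns.empty_cache()


# Stand-in for a device namespace so the sketch runs without any accelerator.
calls = []
fake_ns = SimpleNamespace(
    synchronize=lambda: calls.append("synchronize"),
    device_count=lambda: 2,
    empty_cache=lambda: calls.append("empty_cache"),
)

current_platform = NamespacePlatform("fake", fake_ns)


def run_step(platform):
    """Framework-side code: no torch.cuda / torch.xpu reference appears here."""
    platform.synchronize()
    return platform.device_count()


n = run_step(current_platform)
```

With this shape, framework code such as `run_step` depends only on the platform object, so adding a backend means supplying a new platform rather than editing the framework.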

Feedback Period.

Realizing this idea will touch many files, so the work is currently broken down into the following steps:

BackendPlatform
- Neuron
- Openvino
Backend Type Check
- is_cpu -> current_platform.is_cpu
- is_xpu -> current_platform.is_xpu
- is_openvino -> current_platform.is_openvino
- is_neuron -> current_platform.is_neuron
Backend Related Functions
- seed_everything -> current_platform.seed_everything
- is_pin_memory_available -> current_platform.is_pin_memory_available
- DeviceMemoryProfiler -> current_platform.memory_profiler
- wrap_device -> current_platform.wrap_device
Backend Related Hard Coding
- torch.xxx.get_device_name -> current_platform.get_device
- torch.xxx.Event -> current_platform.Event
- torch.xxx.synchronize -> current_platform.synchronize
- torch.xxx.Stream -> current_platform.Stream
- torch.xxx.stream -> current_platform.stream
- torch.xxx.empty_cache -> current_platform.empty_cache
- torch.xxx.device_count -> current_platform.device_count
- torch.xxx.memory_allocated -> current_platform.memory_allocated
- torch.xxx.set_device -> current_platform.set_device
- torch.xxx.current_device -> current_platform.current_device
- torch.xxx.get_device_capability -> current_platform.get_device_capability
Try to unify the hardware frameworks; the CPU-related framework may be hard to integrate.
- gpu(neuron,openvino,tpu,xpu,..)_executor -> common_backend_executor
- gpu(neuron,openvino,tpu,xpu,..)_worker -> common_backend_worker
- gpu(neuron,openvino,tpu,xpu,..)_model_runner -> common_backend_model_runner
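The steps above can be sketched together as follows. This is a toy illustration under stated assumptions, not the actual vLLM implementation: `PlatformEnum`, `Platform`, `CpuPlatform`, `XpuPlatform`, and `CommonBackendWorker` are names chosen for this example, and the pin-memory behaviour of each backend is assumed purely for demonstration.

```python
import enum
import random


class PlatformEnum(enum.Enum):
    CUDA = enum.auto()
    CPU = enum.auto()
    XPU = enum.auto()
    NEURON = enum.auto()
    OPENVINO = enum.auto()


class Platform:
    """Hypothetical device layer: type checks plus overridable hooks."""

    kind: PlatformEnum

    def is_cpu(self) -> bool:
        return self.kind is PlatformEnum.CPU

    def is_xpu(self) -> bool:
        return self.kind is PlatformEnum.XPU

    def is_neuron(self) -> bool:
        return self.kind is PlatformEnum.NEURON

    def is_openvino(self) -> bool:
        return self.kind is PlatformEnum.OPENVINO

    def seed_everything(self, seed: int) -> None:
        # A real backend would also seed its device RNG here.
        random.seed(seed)

    def is_pin_memory_available(self) -> bool:
        return True


class CpuPlatform(Platform):
    kind = PlatformEnum.CPU

    def is_pin_memory_available(self) -> bool:
        # Assumption for this sketch: no pinned memory without an accelerator.
        return False


class XpuPlatform(Platform):
    kind = PlatformEnum.XPU


class CommonBackendWorker:
    """One worker for every backend; device differences live in the platform."""

    def __init__(self, platform: Platform, seed: int = 0):
        self.platform = platform
        self.seed = seed
        self.pin_memory = False

    def init_device(self) -> None:
        self.platform.seed_everything(self.seed)
        self.pin_memory = self.platform.is_pin_memory_available()


cpu_worker = CommonBackendWorker(CpuPlatform())
xpu_worker = CommonBackendWorker(XpuPlatform())
cpu_worker.init_device()
xpu_worker.init_device()
```

The worker contains no per-backend branches; a backend that needs different behaviour overrides a hook on its platform class instead.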

There are surely omissions here, and difficulties that will only surface during implementation; this list will be kept updated.

CC List.

@youkaichao @WoosukKwon

Any Other Things.

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.


youkaichao · 2024-10-11T06:30:05Z

we can do it step by step.

is_cpu -> current_platform.is_cpu
is_xpu -> current_platform.is_xpu
is_openvino -> current_platform.is_openvino
is_neuron -> current_platform.is_neuron

this can be the first step, and should be easy to do.

the rest might need some case-by-case discussion.
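One way this first step could be staged, as a sketch only: the thread does not specify a migration mechanism, so the deprecation shim below is an assumption, and `_Platform` is a hypothetical stand-in for whatever object is exposed as current_platform.

```python
import warnings


class _Platform:
    """Hypothetical stand-in for the object exposed as current_platform."""

    def __init__(self, name):
        self._name = name

    def is_cpu(self):
        return self._name == "cpu"


current_platform = _Platform("cpu")


def is_cpu():
    """Legacy helper kept as a thin shim while call sites are migrated."""
    warnings.warn(
        "is_cpu() is deprecated; use current_platform.is_cpu()",
        DeprecationWarning,
        stacklevel=2,
    )
    return current_platform.is_cpu()


# Old call sites keep working but emit a warning that points at the caller.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = is_cpu()
```

Keeping the old helper as a forwarding shim lets call sites be moved over in small pull requests without breaking existing imports.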

wangshuai09 added the RFC label Oct 11, 2024

youkaichao self-assigned this Oct 11, 2024

wangshuai09 mentioned this issue Oct 21, 2024

[Hardware][CPU] using current_platform.is_cpu #9536

Open
