[Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch #1480

liangan1 · 2024-09-20T07:58:38Z

PyTorch has supported the XPU device since the 2.4 release, and XPU is also supported in OpenAI Triton, so it should work with the Triton attention backend in SGLang. In this PR, we add the 'xpu' device to SGLang.
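For readers who want to check whether their PyTorch build exposes the XPU backend before passing --device xpu, here is a minimal sketch. The helper name pick_device is hypothetical and not part of this PR or of SGLang; it relies only on the per-backend is_available() check (torch.xpu.is_available() in PyTorch 2.4 and later) and falls back to 'cpu' if PyTorch or the backend is missing.

```python
import importlib.util


def pick_device(preferred: str = "xpu") -> str:
    """Return `preferred` if PyTorch reports that backend as available, else 'cpu'.

    Hypothetical helper for illustration only; not SGLang code.
    """
    # Fall back cleanly when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # Backends such as torch.xpu and torch.cuda are submodules with is_available().
    backend = getattr(torch, preferred, None)
    is_available = getattr(backend, "is_available", None)
    if callable(is_available) and is_available():
        return preferred
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device('xpu')}")
```

If this prints 'cpu' on a machine with an Intel GPU, the installed PyTorch build most likely lacks XPU support, which is the environment issue the BKC item in the TODO list is meant to address.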

Status

Llama-2-7b works for the latency benchmark:
VLLM_TEST_COMPILE_NO_CUSTOM_OPS=1 python -m sglang.bench_latency --model-path ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/ --device xpu

TODO:

Functionality

Add the BKC for preparing XPU-enabled PyTorch, and resolve the software compatibility issues with vLLM when using XPU-enabled PyTorch.
Enable other benchmarks.
Add UTs for the XPU device.

Performance

Customized ops support.

zhyncs · 2024-09-21T06:10:13Z

python/pyproject.toml

@@ -22,8 +22,8 @@ dependencies = [
 [project.optional-dependencies]
 srt = ["aiohttp", "decord", "fastapi", "hf_transfer", "huggingface_hub", "interegular",
       "packaging", "pillow", "psutil", "pydantic", "python-multipart",
-       "torch", "torchao", "uvicorn", "uvloop", "zmq",
-       "vllm==0.5.5", "outlines>=0.0.44"]


Why was the vllm dependency removed?

Enable XPU device

db59d67

liangan1 marked this pull request as draft September 20, 2024 07:58

liangan1 changed the title ~~Enable XPU device~~ [Hardware|Feature] Enable XPU device Sep 20, 2024

liangan1 changed the title ~~[Hardware|Feature] Enable XPU device~~ [Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch Sep 20, 2024

zhyncs reviewed Sep 21, 2024
