Feature Description
Hi community, following discussion #3965, we plan to contribute a native SYCL backend to llama.cpp.
Motivation
Intel Arc series GPUs provide substantial VRAM capacity and bandwidth, which the current OpenCL backend cannot fully utilize, especially for LLM inference. We expect a significant performance improvement from a native SYCL backend.
Possible Implementation
Native Kernels
We will implement the key GGML operators in SYCL, similar to the approach used for the Metal and Vulkan backends. The steps are outlined below; a minimal sketch of the data-transfer and kernel pattern follows the list:
new backend integration; host-to-device (h2d) and device-to-host (d2h) transfers
oneMKL (DPC++)-based FP32 & FP16 GEMM
native SYCL kernels for de-quantization
native SYCL kernels for other operators
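For illustration only, here is a minimal, self-contained sketch (not llama.cpp code) of the first and last steps: allocating device memory, an h2d copy, a native SYCL kernel, and a d2h copy. The SiLU operator and all names here are chosen just for the example; in the actual backend, FP32/FP16 GEMM would instead be dispatched to oneMKL.

```cpp
// Sketch, assuming a SYCL 2020 compiler (e.g., Intel DPC++) and a visible GPU.
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    sycl::queue q{sycl::gpu_selector_v};   // throws if no GPU device is available

    const size_t n = 1024;
    std::vector<float> host_src(n, 1.0f), host_dst(n);

    // Device allocations (USM) and h2d transfer
    float *dev_src = sycl::malloc_device<float>(n, q);
    float *dev_dst = sycl::malloc_device<float>(n, q);
    q.memcpy(dev_src, host_src.data(), n * sizeof(float)).wait();

    // Native SYCL kernel: element-wise SiLU, x * sigmoid(x), as an example GGML-style op
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        float x = dev_src[i];
        dev_dst[i] = x / (1.0f + sycl::exp(-x));
    }).wait();

    // d2h transfer of the result
    q.memcpy(host_dst.data(), dev_dst, n * sizeof(float)).wait();

    sycl::free(dev_src, q);
    sycl::free(dev_dst, q);
    return 0;
}
```

The real backend would follow the same pattern per tensor operation, but with the buffers and queues managed by the GGML backend interface rather than in main().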
Note:
Since llama.cpp is evolving rapidly and new features will probably be supported through CUDA first, we plan to use SYCLomatic to help migrate code from CUDA to SYCL.
As a next stage we plan to introduce a template-based library such as XeTLA, as mentioned in #3965; in this proposal we will focus on native SYCL support.
Summary
We have started working on native SYCL kernels and on enabling a SYCL backend in llama.cpp for Intel GPUs. Please feel free to drop a note. Thanks.