Running with SYCL #6606
Since IPEX currently doesn't work outside a narrow band of OS and system dependencies (here it crashes at the `import accelerate` step in the log), I'd like to run GGUF models with SYCL instead.
The way I compile native llama.cpp for SYCL is as follows:
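The command block below is a sketch based on the standard upstream SYCL recipe, assuming the oneAPI toolchain lives under `/opt/intel/oneapi` and a recent llama.cpp where the option is named `GGML_SYCL`; the exact flags originally posted may have differed.

```bash
# make the oneAPI compilers (icx/icpx) and runtime visible in this shell
source /opt/intel/oneapi/setvars.sh

# configure with the SYCL backend enabled, then build
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```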
That builds and runs fine (the speed increase is noticeable: about 2x for token generation, 20x for prompt processing), so I've attempted to install the webui in CPU mode and then recompile llama-cpp-python with SYCL, like so:
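Again a sketch rather than the exact command used: llama-cpp-python forwards CMake options through the `CMAKE_ARGS` environment variable, and `--no-cache-dir` matters here because pip will otherwise happily reuse a previously built CPU-only wheel.

```bash
source /opt/intel/oneapi/setvars.sh

# rebuild the wheel from source with the SYCL backend enabled
CMAKE_ARGS="-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" \
  pip install llama-cpp-python --force-reinstall --no-cache-dir --verbose
```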
That seems to build the wheel fine, if it's even taking the CMake flags into account, which I mildly suspect it isn't; I've specified them in every shape and form I can think of, so I'm not sure what else to try.
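One quick way to tell whether the flags actually made it into the wheel is to ask the installed binding directly; `llama_supports_gpu_offload` is part of llama-cpp-python's low-level bindings, and this is a sanity check on the build rather than proof of a working SYCL device.

```bash
# False here means the wheel was built CPU-only and the CMake flags were ignored
python -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"
```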
I've then removed the `--cpu` flag from `CMD_FLAGS.txt`, unchecked "cpu" in the model settings, and offloaded layers to the GPU, but it still loads the model in CPU mode for some reason. At least, I'm not seeing any SYCL printouts from llama.cpp, and it runs as slowly as it does on the CPU. Is there anything else I'm missing that's forcing it to fall back to CPU mode?
As for system info: I'm running Ubuntu Server 24.04 with oneAPI 2025.0.1, on an x86_64 Core Ultra 5 125H with its Arc iGPU.
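One thing worth ruling out on a setup like that: the SYCL backend can only see the iGPU if the oneAPI runtime is loaded in the environment the webui is actually launched from, which is easy to verify before starting the server (`sycl-ls` ships with oneAPI).

```bash
source /opt/intel/oneapi/setvars.sh

# the Arc iGPU should show up as a Level Zero GPU device;
# if the list is empty, llama.cpp has no SYCL device to offload to
sycl-ls
```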
Ah, never mind. My mistake was copying the repo without realizing that some virtualenv paths are absolute, so my changes were split across two installs. After upgrading numba to the latest version, it seems to work.
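For anyone hitting the same thing: a copied virtualenv keeps absolute paths baked into its scripts, so pip and python can silently resolve back to the old location. A quick way to check which install you are really modifying (`venv/` below is a placeholder for the actual environment directory):

```bash
# the shebang shows which interpreter this pip actually runs under
head -n 1 venv/bin/pip

# which environment is the active python really using?
python -c "import sys; print(sys.prefix)"

# where is the package installed that the webui will import?
pip show llama-cpp-python | grep -i '^location'
```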