Running with SYCL #6606
Since IPEX currently doesn't work outside a narrow band of OS and system dependencies (here it crashes at the `import accelerate` step in the log), I'd like to run GGUF models with SYCL instead.
The way I compile native llama.cpp for SYCL is as follows:
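The command block below is a sketch based on the standard upstream SYCL recipe, assuming the oneAPI toolchain lives under `/opt/intel/oneapi` and a recent llama.cpp where the option is named `GGML_SYCL`; the exact flags originally posted may have differed.

```bash
# make the oneAPI compilers (icx/icpx) and runtime visible in this shell
source /opt/intel/oneapi/setvars.sh

# configure with the SYCL backend enabled, then build
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```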
That builds and runs fine (the speed increase is noticeable: about 2x for token generation, 20x for prompt processing), so I've attempted to install the webui in CPU mode and then recompile llama-cpp-python with SYCL, like so:
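Again a sketch rather than the exact command used: llama-cpp-python forwards CMake options through the `CMAKE_ARGS` environment variable, and `--no-cache-dir` matters here because pip will otherwise happily reuse a previously built CPU-only wheel.

```bash
source /opt/intel/oneapi/setvars.sh

# rebuild the wheel from source with the SYCL backend enabled
CMAKE_ARGS="-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" \
  pip install llama-cpp-python --force-reinstall --no-cache-dir --verbose
```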
That seems to build the wheel fine, if it's even taking the CMake flags into account, which I mildly suspect it isn't; I've specified them in every shape and form I can think of, so I'm not sure what else to try.
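One quick way to tell whether the flags actually made it into the wheel is to ask the installed binding directly; `llama_supports_gpu_offload` is part of llama-cpp-python's low-level bindings, and this is a sanity check on the build rather than proof of a working SYCL device.

```bash
# False here means the wheel was built CPU-only and the CMake flags were ignored
python -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"
```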
I've then removed the `--cpu` flag from `CMD_FLAGS.txt`, unchecked "cpu" in the model settings, and offloaded layers to the GPU, but it still loads the model in CPU mode for some reason. At least, I'm not seeing any SYCL printouts from llama.cpp, and it runs as slowly as it does on the CPU. Is there anything else I'm missing that's forcing it to fall back to CPU mode?
As for system info: I'm running Ubuntu Server 24.04 with oneAPI 2025.0.1, on an x86_64 Core Ultra 5 125H with its Arc iGPU.
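One thing worth ruling out on a setup like that: the SYCL backend can only see the iGPU if the oneAPI runtime is loaded in the environment the webui is actually launched from, which is easy to verify before starting the server (`sycl-ls` ships with oneAPI).

```bash
source /opt/intel/oneapi/setvars.sh

# the Arc iGPU should show up as a Level Zero GPU device;
# if the list is empty, llama.cpp has no SYCL device to offload to
sycl-ls
```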
Ah, never mind. My mistake was copying the repo without realizing that some virtualenv paths are absolute, so my changes were split across two installs. After upgrading numba to the latest version, it seems to work.
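For anyone hitting the same thing: a copied virtualenv keeps absolute paths baked into its scripts, so pip and python can silently resolve back to the old location. A quick way to check which install you are really modifying (`venv/` below is a placeholder for the actual environment directory):

```bash
# the shebang shows which interpreter this pip actually runs under
head -n 1 venv/bin/pip

# which environment is the active python really using?
python -c "import sys; print(sys.prefix)"

# where is the package installed that the webui will import?
pip show llama-cpp-python | grep -i '^location'
```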