Feature: Integrate with unified SYCL backend for Intel GPUs #2690
Conversation
This looks interesting, but I need some more context and numbers. What hardware is this useful for?
Discrete Intel GPUs (Intel Arc and the professional variants). Very interested to see performance numbers with this vs CLBlast.
@ggerganov yes, this is for Intel dGPUs (Max and Flex), including Arc GPUs, which rely on the SYCL backend. OpenCL is already supported, but this PR is raised for performance and better optimization (from Intel LLVM). I am currently testing its performance and making it stable, and will flag it for review once it is properly stable.
I'm also interested in this feature. @abhilash1910 are you actively working on it, or is it available for grabs?
@unbrice yes, it is under development, but if you are able to compile it, great. There are some configs and tasks still pending to be added.
I have put together a repo that shows an example of building llama.cpp with OpenCL and running it on an Intel A770 via Docker. The Dockerfile and all associated scripts showing how the container is built, run, and tested are included. There is an example log file that shows more of the console logs from the Docker container, including the response to the curl command in the test.sh file. https://github.com/itlackey/llama.cpp-opencl I have the A770 and a 4060 Ti 16GB running in the same machine. Below are examples of output when running the same model on either card. The 4060 is 10x faster than the Arc. This is not the case when running things like Intel Extension for PyTorch. These cards should perform very similarly when running optimally. This leads me to believe that the OpenCL support in llama.cpp is not using the card to its fullest potential. Hopefully adding SYCL (or Vulkan) support would bring the Arc up to speed. Hopefully this is helpful. A770 logs: 4060 Ti logs: llama_print_timings: load time = 808.00 ms
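For reference, a minimal sketch of how an OpenCL/CLBlast build of llama.cpp was typically configured at the time. This is not taken from the linked repo; the LLAMA_CLBLAST option, binary name, and model path below are assumptions for illustration only.

```bash
# Generic sketch of an OpenCL/CLBlast build of llama.cpp.
# Flag names, binary name, and model path are illustrative assumptions.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir -p build && cd build
cmake .. -DLLAMA_CLBLAST=ON
cmake --build . --config Release

# Offload layers to the GPU at run time (-ngl = number of GPU layers).
./bin/main -m /models/llama-2-7b.Q4_0.gguf -ngl 33 -p "Hello"
```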
I'm starting my SYCL research and development here and it's looking like a decent-sized effort. Macs might not play well with this.
Btw, the existing OpenCL implementation offloads only the matrix multiplications to the GPU; the rest of the ops still run on the CPU, and there is overhead from constantly moving the activations back and forth between host and device memory. Ideally, the entire graph computation should be offloaded, similar to the CUDA and Metal backends.
@abhilash1910 Do you need any support with adding any remaining configurations, or is it complete?
I have no idea, but it's been working perfectly for me with Llama and Mistral models. While I don't think there are shaders for all the ops yet, Vulkan uses 100% of my GPU (unlike OpenCL) and it runs 2x faster.
@ggerganov could you help trigger CI? Thanks
🤞
I've been trying to get this branch to build to play around with my A770, but so far have had no luck. What environment/dependencies does one need to build this? I've tried various oneAPI containers but none seem to be able to find SYCL during cmake configuration. Edit: I realize now you were talking about the Vulkan fork, not this one, sorry.
Some small comments regarding the README and examples
Some comments from trying to compile this application with the open-source DPCPP release
This still doesn't compile.
Yes. I'm fixing this issue.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@ggerganov
@ggerganov
Likely will merge later today or tomorrow.
Thanks for all your hard work, guys! I've been able to easily compile and run it using the intel/hpckit Docker image without any problem. When running inside a container, you can pass the GPU through to the container with this argument: I'm using an iGPU (Intel(R) Iris(R) Xe Graphics) and am able to utilize 100% of its power, though unfortunately the performance is not better than just using the CPU. I definitely should get myself an external GPU. I'll update the guide for compiling & running with Docker in the future.
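The exact argument from the comment above is not preserved in this thread; as a hedged sketch, a common way to expose an Intel GPU to a container is to pass through the /dev/dri render node. The image tag and mount paths below are assumptions.

```bash
# Sketch: expose the Intel GPU's render node (/dev/dri) to the container.
# The image follows the intel/hpckit image mentioned above (published as
# intel/oneapi-hpckit on Docker Hub); tag and mount paths are illustrative.
docker run -it --rm \
    --device /dev/dri \
    -v "$PWD:/workspace" \
    intel/oneapi-hpckit:latest \
    /bin/bash
```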
Thanks for your Docker update! Intel iGPUs come with varying numbers of EUs; in general, an iGPU includes 32 EUs, so it's slow. If you try it on the iGPU of Meteor Lake (the new Intel Core iGPU), or an Intel Arc/Flex/Max dGPU, the performance is good.
@abhilash1910
Do you have any suggestions?
Yes @sorasoras, Windows build support is next in our development plan. We are working to provide the build option.
The Windows build is in its final stage. We will create a PR soon.
Cool, can't wait to test this against the Vulkan build.
The Windows build PR is created: #5208. Please join the review.
Is this supposed to work with laptop/low-end iGPUs? I was getting some acceleration with OpenBLAS but wanted to give this a shot locally, and it fails with:
Does that mean that the op is not supported by the onboard GPU? If so, I'd be happy to add it to the known issues in the docs.
@mudler it seems there is an issue with your oneAPI installation: no GPU device is detected. Can you run the device listing and check? For example, if the iGPU is detected, there will be separate iGPU (Intel(R) Graphics [0x7d55]) and CPU entries, and the SYCL backend can work on the iGPU.
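The exact command referenced above is not preserved in this thread; as a hedged sketch, oneAPI ships a sycl-ls utility that lists the devices the SYCL runtime can see. The output below is illustrative, not captured from this conversation.

```bash
# List the devices visible to the SYCL runtime after sourcing the oneAPI environment.
source /opt/intel/oneapi/setvars.sh
sycl-ls
# Illustrative output when an iGPU is detected alongside the CPU:
#   [opencl:cpu:0] Intel(R) OpenCL, <CPU model> ...
#   [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x7d55] ...
```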
Mmm, alright, I see; here I just have:
so maybe something is wrong with my setup (even if I see all the drivers loaded 🙄). Anyway, thanks for double-checking! Maybe we can add a mention in the docs that a GPU device should be listed (with the
* first update for migration * update init_cublas * add debug functio, commit all help code * step 1 * step 2 * step3 add fp16, slower 31->28 * add GGML_LIST_DEVICE function * step 5 format device and print * step6, enhance error check, remove CUDA macro, enhance device id to fix none-zero id issue * support main device is non-zero * step7 add debug for code path, rm log * step 8, rename all macro & func from cuda by sycl * fix error of select non-zero device, format device list * ren ggml-sycl.hpp -> ggml-sycl.h * clear CMAKE to rm unused lib and options * correct queue: rm dtct:get_queue * add print tensor function to debug * fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481 * summary dpct definition in one header file to replace folder:dpct * refactor device log * mv dpct definition from folder dpct to ggml-sycl.h * update readme, refactor build script * fix build with sycl * set nthread=1 when sycl, increase performance * add run script, comment debug code * add ls-sycl-device tool * add ls-sycl-device, rm unused files * rm rear space * dos2unix * Update README_sycl.md * fix return type * remove sycl version from include path * restore rm code to fix hang issue * add syc and link for sycl readme * rm original sycl code before refactor * fix code err * add know issue for pvc hang issue * enable SYCL_F16 support * align pr4766 * check for sycl blas, better performance * cleanup 1 * remove extra endif * add build&run script, clean CMakefile, update guide by review comments * rename macro to intel hardware * editor config format * format fixes * format fixes * editor format fix * Remove unused headers * skip build sycl tool for other code path * replace tab by space * fix blas matmul function * fix mac build * restore hip dependency * fix conflict * ren as review comments * mv internal function to .cpp file * export funciton print_sycl_devices(), mv class dpct definition to source file * update CI/action for sycl code, fix CI error of repeat/dup * fix action ID format issue * rm unused strategy * enable llama_f16 in ci * fix conflict * fix build break on MacOS, due to CI of MacOS depend on external ggml, instead of internal ggml * fix ci cases for unsupported data type * revert unrelated changed in cuda cmake remove useless nommq fix typo of GGML_USE_CLBLAS_SYCL * revert hip cmake changes * fix indent * add prefix in func name * revert no mmq * rm cpu blas duplicate * fix no_new_line * fix src1->type==F16 bug. * pass batch offset for F16 src1 * fix batch error * fix wrong code * revert sycl checking in test-sampling * pass void as arguments of ggml_backend_sycl_print_sycl_devices * remove extra blank line in test-sampling * revert setting n_threads in sycl * implement std::isinf for icpx with fast math. 
* Update ci/run.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/sycl/run-llama2.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/sycl/run-llama2.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update CMakeLists.txt Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update CMakeLists.txt Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update CMakeLists.txt Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update CMakeLists.txt Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add copyright and MIT license declare * update the cmd example --------- Co-authored-by: jianyuzh <jianyu.zhang@intel.com> Co-authored-by: luoyu-intel <yu.luo@intel.com> Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
For better Arc support, is there a way we can have layers offloaded to the GPU in chunks? You see, Intel has made it so that it doesn't allow moving chunks greater than 4 GB in size at any one time.
Motivation:
Thanks for creating llama.cpp. There has been quite an effort to integrate the OpenCL runtime for AVX instruction sets.
However, for running on Intel graphics cards, an additional SYCL runtime needs to be ported over the OpenCL runtime.
This is a feature-enabling PR, now in its final stages, with the expectation of community feedback on performance and improvements.
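For context, a minimal sketch of configuring a SYCL build with the oneAPI toolchain, assuming the LLAMA_SYCL option and the icx/icpx DPC++ compilers described in the README added by this PR; treat the exact names and paths as assumptions.

```bash
# Sketch: configure and build llama.cpp with the SYCL backend using oneAPI's
# DPC++ compilers (icx/icpx). Option and path names are assumptions and may
# differ from the final README in this PR.
source /opt/intel/oneapi/setvars.sh
mkdir -p build && cd build
cmake .. -DLLAMA_SYCL=ON \
         -DCMAKE_C_COMPILER=icx \
         -DCMAKE_CXX_COMPILER=icpx
cmake --build . --config Release
```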
Co-authored by @NeoZhangJianyu, @airMeng, and @luoyu-intel, with thanks to @AidanBeltonS (Codeplay) for suggestions and recommendations. Thanks to everyone who helped improve and shape this PR through feedback and performance work.
Thanks to @jacob1218 for running initial benchmarks:
Since the development is based on the SYCLomatic runtime, which is evolving with the latest upgrades, feedback, suggestions, and comments are welcome.
Tagging @ggerganov.