support SYCL backend windows build #5208

Merged 27 commits on Jan 31, 2024.

Commits:
838f8ea support SYCL backend windows build (NeoZhangJianyu, Jan 29, 2024)
d05845a add windows build in CI (NeoZhangJianyu, Jan 29, 2024)
fd02bdd add for win build CI (NeoZhangJianyu, Jan 29, 2024)
455d17d correct install oneMKL (NeoZhangJianyu, Jan 29, 2024)
21fbdd7 fix install issue (NeoZhangJianyu, Jan 29, 2024)
1e8c420 fix ci (NeoZhangJianyu, Jan 29, 2024)
9c6b646 fix install cmd (NeoZhangJianyu, Jan 29, 2024)
aef97b5 fix install cmd (NeoZhangJianyu, Jan 29, 2024)
7be0c36 fix install cmd (NeoZhangJianyu, Jan 29, 2024)
8f29f04 fix install cmd (NeoZhangJianyu, Jan 29, 2024)
76ddf85 fix install cmd (NeoZhangJianyu, Jan 29, 2024)
3a9480e fix win build (NeoZhangJianyu, Jan 30, 2024)
68fd9f4 fix win build (NeoZhangJianyu, Jan 30, 2024)
05da43b fix win build (NeoZhangJianyu, Jan 30, 2024)
07fb462 restore other CI part (NeoZhangJianyu, Jan 30, 2024)
61379dd restore as base (NeoZhangJianyu, Jan 30, 2024)
ed62b08 rm no new line (NeoZhangJianyu, Jan 30, 2024)
cb9c35a fix no new line issue, add -j (NeoZhangJianyu, Jan 30, 2024)
0ca32f7 fix grammer issue (NeoZhangJianyu, Jan 30, 2024)
3d46754 Merge branch 'master' into sycl_win_build (NeoZhangJianyu, Jan 30, 2024)
3e5b2eb allow to trigger manually, fix format issue (NeoZhangJianyu, Jan 30, 2024)
2f1262f Merge branch 'sycl_win_build' of https://github.com/NeoZhangJianyu/ll… (NeoZhangJianyu, Jan 30, 2024)
969f257 fix format (abhilash1910, Jan 30, 2024)
5f1d91f add newline (abhilash1910, Jan 30, 2024)
a8f5822 fix format (abhilash1910, Jan 30, 2024)
47cba0d fix format (abhilash1910, Jan 30, 2024)
b722926 fix format issuse (NeoZhangJianyu, Jan 31, 2024)
25 changes: 25 additions & 0 deletions .github/workflows/build.yml
@@ -565,6 +565,31 @@ jobs:
      path: |
        cudart-llama-bin-win-cu${{ matrix.cuda }}-x64.zip

  windows-latest-cmake-sycl:
    runs-on: windows-latest
    defaults:
      run:
        shell: bash

    env:
      WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/62641e01-1e8d-4ace-91d6-ae03f7f8a71f/w_BaseKit_p_2024.0.0.49563_offline.exe
      WINDOWS_DPCPP_MKL: intel.oneapi.win.cpp-dpcpp-common:intel.oneapi.win.mkl.devel

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Install
        run: scripts/install-oneapi.bat $WINDOWS_BASEKIT_URL $WINDOWS_DPCPP_MKL

      - name: Build
        id: cmake_build
        run: examples/sycl/win-build-sycl.bat

  ios-xcode-build:
    runs-on: macos-latest

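For local testing, a rough CMD equivalent of the job's Install and Build steps might look like the sketch below; it reuses the installer URL and component list from the job's `env` block and assumes it is run from the repository root:

```
:: Sketch: local CMD equivalent of the Install and Build CI steps above.
set WINDOWS_BASEKIT_URL=https://registrationcenter-download.intel.com/akdlm/IRC_NAS/62641e01-1e8d-4ace-91d6-ae03f7f8a71f/w_BaseKit_p_2024.0.0.49563_offline.exe
set WINDOWS_DPCPP_MKL=intel.oneapi.win.cpp-dpcpp-common:intel.oneapi.win.mkl.devel
call scripts\install-oneapi.bat %WINDOWS_BASEKIT_URL% %WINDOWS_DPCPP_MKL%
call examples\sycl\win-build-sycl.bat
```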
6 changes: 6 additions & 0 deletions .github/workflows/editorconfig.yml
@@ -1,6 +1,12 @@
name: EditorConfig Checker

on:
  workflow_dispatch: # allows manual triggering
    inputs:
      create_release:
        description: 'Create new release'
        required: true
        type: boolean
  push:
    branches:
      - master
1 change: 1 addition & 0 deletions .gitignore
@@ -89,3 +89,4 @@ examples/jeopardy/results.txt

poetry.lock
poetry.toml
nppBackup
6 changes: 5 additions & 1 deletion CMakeLists.txt
@@ -507,7 +507,11 @@ if (LLAMA_SYCL)
set(GGML_HEADERS_SYCL ggml.h ggml-sycl.h)
set(GGML_SOURCES_SYCL ggml-sycl.cpp)

-set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} sycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
+if (WIN32)
+    set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} -fsycl sycl7 OpenCL mkl_sycl_blas_dll.lib mkl_intel_ilp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib)
+else()
+    set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} -fsycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
+endif()
endif()

if (LLAMA_KOMPUTE)
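
The `WIN32` branch above links the MKL DLL import libraries and `sycl7` instead of the Linux SONAMEs. A minimal sketch of the configure/build commands that exercise it, mirroring `examples/sycl/win-build-sycl.bat` from this PR (run from an oneAPI command prompt):

```
:: Sketch: Windows configure/build that takes the WIN32 branch above.
mkdir build
cd build
cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release
make -j
```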
194 changes: 184 additions & 10 deletions README_sycl.md → README-sycl.md
@@ -8,10 +8,14 @@

[Linux](#linux)

[Windows](#windows)

[Environment Variable](#environment-variable)

[Known Issue](#known-issue)

[Q&A](#q&a)

[Todo](#todo)

## Background
@@ -33,7 +37,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
|OS|Status|Verified|
|-|-|-|
|Linux|Support|Ubuntu 22.04|
-|Windows|Ongoing| |
+|Windows|Support|Windows 11|


## Intel GPU
@@ -42,7 +46,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
|-|-|-|
|Intel Data Center Max Series| Support| Max 1550|
|Intel Data Center Flex Series| Support| Flex 170|
-|Intel Arc Series| Support| Arc 770|
+|Intel Arc Series| Support| Arc 770, 730M|
|Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake|
|Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7|

@@ -131,6 +135,7 @@ cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
#build all binary
cmake --build . --config Release -v

cd ..
```

or
@@ -195,7 +200,7 @@ GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building
or run by script:

```
-./examples/sycl/run_llama2.sh
+./examples/sycl/run-llama2.sh
```

Note:
@@ -205,11 +210,175 @@

5. Check the device ID in the output

-Like
+Like:
```
Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
```

## Windows

### Setup Environment

1. Install the Intel GPU driver.

Please install the Intel GPU driver following the official guide: [Install GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).

2. Install the Intel® oneAPI Base Toolkit.

a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).

We recommend installing to the default folder: **C:\Program Files (x86)\Intel\oneAPI**.

The following guide uses the default folder in its examples. If you install to a different folder, adjust the paths below accordingly.

b. Enable the oneAPI running environment:

- In Windows Search, type 'oneAPI', then open "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022".

- Or, in an existing CMD window, run:
```
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
```

c. Check the GPU.

In the oneAPI command line, run:

```
sycl-ls
```

There should be one or more level-zero devices listed, such as **[ext_oneapi_level_zero:gpu:0]**.

Output (example):
```
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [31.0.101.5186]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]

```
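
To combine steps b and c, a small sketch (assuming the default oneAPI install path) initializes the environment and filters `sycl-ls` for the level-zero entries:

```
:: Sketch: initialize oneAPI, then show only the level-zero devices.
@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
sycl-ls | findstr /i "level_zero"
```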

3. Install cmake & make

a. Download & install cmake for windows: https://cmake.org/download/

b. Download & install make for windows provided by mingw-w64: https://www.mingw-w64.org/downloads/


### Build locally:

In the oneAPI command line window:

```
mkdir build
cd build
@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force

:: for FP16
:: faster for long-prompt inference
:: cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DLLAMA_SYCL_F16=ON

:: for FP32
cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release


:: build example/main only
:: make main

:: build all binary
make -j
cd ..
```

or

```
.\examples\sycl\win-build-sycl.bat
```

Note:

- By default, all binaries are built, which takes longer. To save time, build only **example/main**.

### Run

1. Put the model file into the **models** folder.

2. Enable the oneAPI running environment:

- In Windows Search, type 'oneAPI', then open "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022".

- Or, in an existing CMD window, run:
```
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
```

3. List the device IDs.

Run either tool without parameters:

```
build\bin\ls-sycl-device.exe

or

build\bin\main.exe
```

Check the IDs in the startup log, for example:

```
found 4 SYCL devices:
Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136

```

|Attribute|Note|
|-|-|
|compute capability 1.3|Level Zero runtime (recommended)|
|compute capability 3.0|OpenCL runtime; slower than Level Zero in most cases|

4. Set the device ID and run llama.cpp.

Set the device ID to 0 via **set GGML_SYCL_DEVICE=0**:

```
set GGML_SYCL_DEVICE=0
build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0
```
or run by script:

```
.\examples\sycl\win-run-llama2.bat
```

Note:

- By default, mmap is used to read the model file. In some cases it causes the program to hang; pass **--no-mmap** to disable mmap() and avoid the issue, as shown below.
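
For example, the run from step 4 with mmap disabled:

```
set GGML_SYCL_DEVICE=0
build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 --no-mmap
```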


5. Check the device ID in the output

Like:
```
Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
```

## Environment Variable

@@ -220,7 +389,7 @@
|LLAMA_SYCL|ON (mandatory)|Enable build with SYCL code path. <br>For FP32/FP16, LLAMA_SYCL=ON is mandatory.|
|LLAMA_SYCL_F16|ON (optional)|Enable FP16 build with SYCL code path. Faster for long-prompt inference. <br>Leave unset for FP32.|
|CMAKE_C_COMPILER|icx|Use icx compiler for SYCL code path|
-|CMAKE_CXX_COMPILER|icpx|Use icpx for SYCL code path|
+|CMAKE_CXX_COMPILER|icpx (Linux), icx (Windows)|Use icpx/icx for SYCL code path|

#### Running

@@ -232,18 +401,23 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device

## Known Issue

- Hang during startup

  llama.cpp uses mmap by default to read the model file and copy it to the GPU. On some systems, the memcpy misbehaves and blocks.

  Solution: add **--no-mmap**.

## Q&A

- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.

  The oneAPI running environment was not enabled.

  Install the oneAPI Base Toolkit and enable the environment with `source /opt/intel/oneapi/setvars.sh`.

- In Windows: no result, and no error.

  The oneAPI running environment was not enabled. Enable it as described in the [Windows](#windows) section before running.

## Todo

4 changes: 3 additions & 1 deletion README.md
@@ -10,6 +10,8 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++

### Hot topics

- ⚠️ Incoming backends: https://github.com/ggerganov/llama.cpp/discussions/5138
- [SYCL backend](README-sycl.md) is ready (1/28/2024); it supports Linux/Windows on Intel GPUs (iGPU, and Arc/Flex/Max series)
- New SOTA quantized models, including pure 2-bits: https://huggingface.co/ikawrakow
- Collecting Apple Silicon performance stats:
- M-series: https://github.com/ggerganov/llama.cpp/discussions/4167
@@ -604,7 +606,7 @@ Building the program with BLAS support may lead to some performance improvements

llama.cpp based on SYCL is used to support Intel GPU (Data Center Max series, Flex series, Arc series, Built-in GPU and iGPU).

-For detailed info, please refer to [llama.cpp for SYCL](README_sycl.md).
+For detailed info, please refer to [llama.cpp for SYCL](README-sycl.md).


### Prepare Data & Run
23 changes: 23 additions & 0 deletions examples/sycl/win-build-sycl.bat
@@ -0,0 +1,23 @@

:: MIT license
:: Copyright (C) 2024 Intel Corporation
:: SPDX-License-Identifier: MIT

mkdir build
cd build
@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force

:: for FP16
:: faster for long-prompt inference
:: cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DLLAMA_SYCL_F16=ON

:: for FP32
cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release


:: build example/main only
:: make main

:: build all binary
make -j
cd ..
13 changes: 13 additions & 0 deletions examples/sycl/win-run-llama2.bat
@@ -0,0 +1,13 @@
:: MIT license
:: Copyright (C) 2024 Intel Corporation
:: SPDX-License-Identifier: MIT

set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force


set GGML_SYCL_DEVICE=0
rem set GGML_SYCL_DEBUG=1
.\build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p %INPUT2% -n 400 -e -ngl 33 -s 0
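
A usage sketch (from an oneAPI command prompt at the repository root, with the model file from the Run section in place):

```
:: Launch the helper script; it selects device 0 and runs main.exe.
.\examples\sycl\win-run-llama2.bat
```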

